systematic review method example

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

Knowledge Base

Methodology

Systematic Review | Definition, Example, & Guide

Systematic Review | Definition, Example & Guide

Published on June 15, 2022 by Shaun Turney . Revised on November 20, 2023.

A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer.

They answered the question “What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?”

In this context, a probiotic is a health product that contains live microorganisms and is taken by mouth. Eczema is a common skin condition that causes red, itchy skin.

What is a systematic review, systematic review vs. meta-analysis, systematic review vs. literature review, systematic review vs. scoping review, when to conduct a systematic review, pros and cons of systematic reviews, step-by-step example of a systematic review, other interesting articles, frequently asked questions about systematic reviews.

A review is an overview of the research that’s already been completed on a topic.

What makes a systematic review different from other types of reviews is that the research methods are designed to reduce bias . The methods are repeatable, and the approach is formal and systematic:

Formulate a research question
Develop a protocol
Search for all relevant studies
Apply the selection criteria
Extract the data
Synthesize the data
Write and publish a report

Although multiple sets of guidelines exist, the Cochrane Handbook for Systematic Reviews is among the most widely used. It provides detailed guidelines on how to complete each step of the systematic review process.

Systematic reviews are most commonly used in medical and public health research, but they can also be found in other disciplines.

Systematic reviews typically answer their research question by synthesizing all available evidence and evaluating the quality of the evidence. Synthesizing means bringing together different information to tell a single, cohesive story. The synthesis can be narrative ( qualitative ), quantitative , or both.

Here's why students love Scribbr's proofreading services

Discover proofreading & editing

Systematic reviews often quantitatively synthesize the evidence using a meta-analysis . A meta-analysis is a statistical analysis, not a type of review.

A meta-analysis is a technique to synthesize results from multiple studies. It’s a statistical analysis that combines the results of two or more studies, usually to estimate an effect size .

A literature review is a type of review that uses a less systematic and formal approach than a systematic review. Typically, an expert in a topic will qualitatively summarize and evaluate previous work, without using a formal, explicit method.

Although literature reviews are often less time-consuming and can be insightful or helpful, they have a higher risk of bias and are less transparent than systematic reviews.

Similar to a systematic review, a scoping review is a type of review that tries to minimize bias by using transparent and repeatable methods.

However, a scoping review isn’t a type of systematic review. The most important difference is the goal: rather than answering a specific question, a scoping review explores a topic. The researcher tries to identify the main concepts, theories, and evidence, as well as gaps in the current research.

Sometimes scoping reviews are an exploratory preparation step for a systematic review, and sometimes they are a standalone project.

Receive feedback on language, structure, and formatting

Professional editors proofread and edit your paper by focusing on:

Academic style
Vague sentences
Style consistency

See an example

A systematic review is a good choice of review if you want to answer a question about the effectiveness of an intervention , such as a medical treatment.

To conduct a systematic review, you’ll need the following:

A precise question , usually about the effectiveness of an intervention. The question needs to be about a topic that’s previously been studied by multiple researchers. If there’s no previous research, there’s nothing to review.
If you’re doing a systematic review on your own (e.g., for a research paper or thesis ), you should take appropriate measures to ensure the validity and reliability of your research.
Access to databases and journal archives. Often, your educational institution provides you with access.
Time. A professional systematic review is a time-consuming process: it will take the lead author about six months of full-time work. If you’re a student, you should narrow the scope of your systematic review and stick to a tight schedule.
Bibliographic, word-processing, spreadsheet, and statistical software . For example, you could use EndNote, Microsoft Word, Excel, and SPSS.

A systematic review has many pros .

They minimize research bias by considering all available evidence and evaluating each study for bias.
Their methods are transparent , so they can be scrutinized by others.
They’re thorough : they summarize all available evidence.
They can be replicated and updated by others.

Systematic reviews also have a few cons .

They’re time-consuming .
They’re narrow in scope : they only answer the precise research question.

The 7 steps for conducting a systematic review are explained with an example.

Step 1: Formulate a research question

Formulating the research question is probably the most important step of a systematic review. A clear research question will:

Allow you to more effectively communicate your research to other researchers and practitioners
Guide your decisions as you plan and conduct your systematic review

A good research question for a systematic review has four components, which you can remember with the acronym PICO :

Population(s) or problem(s)
Intervention(s)
Comparison(s)

You can rearrange these four components to write your research question:

What is the effectiveness of I versus C for O in P ?

Sometimes, you may want to include a fifth component, the type of study design . In this case, the acronym is PICOT .

Type of study design(s)
The population of patients with eczema
The intervention of probiotics
In comparison to no treatment, placebo , or non-probiotic treatment
The outcome of changes in participant-, parent-, and doctor-rated symptoms of eczema and quality of life
Randomized control trials, a type of study design

Their research question was:

What is the effectiveness of probiotics versus no treatment, a placebo, or a non-probiotic treatment for reducing eczema symptoms and improving quality of life in patients with eczema?

Step 2: Develop a protocol

A protocol is a document that contains your research plan for the systematic review. This is an important step because having a plan allows you to work more efficiently and reduces bias.

Your protocol should include the following components:

Background information : Provide the context of the research question, including why it’s important.
Research objective (s) : Rephrase your research question as an objective.
Selection criteria: State how you’ll decide which studies to include or exclude from your review.
Search strategy: Discuss your plan for finding studies.
Analysis: Explain what information you’ll collect from the studies and how you’ll synthesize the data.

If you’re a professional seeking to publish your review, it’s a good idea to bring together an advisory committee . This is a group of about six people who have experience in the topic you’re researching. They can help you make decisions about your protocol.

It’s highly recommended to register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov .

Step 3: Search for all relevant studies

Searching for relevant studies is the most time-consuming step of a systematic review.

To reduce bias, it’s important to search for relevant studies very thoroughly. Your strategy will depend on your field and your research question, but sources generally fall into these four categories:

Databases: Search multiple databases of peer-reviewed literature, such as PubMed or Scopus . Think carefully about how to phrase your search terms and include multiple synonyms of each word. Use Boolean operators if relevant.
Handsearching: In addition to searching the primary sources using databases, you’ll also need to search manually. One strategy is to scan relevant journals or conference proceedings. Another strategy is to scan the reference lists of relevant studies.
Gray literature: Gray literature includes documents produced by governments, universities, and other institutions that aren’t published by traditional publishers. Graduate student theses are an important type of gray literature, which you can search using the Networked Digital Library of Theses and Dissertations (NDLTD) . In medicine, clinical trial registries are another important type of gray literature.
Experts: Contact experts in the field to ask if they have unpublished studies that should be included in your review.

At this stage of your review, you won’t read the articles yet. Simply save any potentially relevant citations using bibliographic software, such as Scribbr’s APA or MLA Generator .

Databases: EMBASE, PsycINFO, AMED, LILACS, and ISI Web of Science
Handsearch: Conference proceedings and reference lists of articles
Gray literature: The Cochrane Library, the metaRegister of Controlled Trials, and the Ongoing Skin Trials Register
Experts: Authors of unpublished registered trials, pharmaceutical companies, and manufacturers of probiotics

Step 4: Apply the selection criteria

Applying the selection criteria is a three-person job. Two of you will independently read the studies and decide which to include in your review based on the selection criteria you established in your protocol . The third person’s job is to break any ties.

To increase inter-rater reliability , ensure that everyone thoroughly understands the selection criteria before you begin.

If you’re writing a systematic review as a student for an assignment, you might not have a team. In this case, you’ll have to apply the selection criteria on your own; you can mention this as a limitation in your paper’s discussion.

You should apply the selection criteria in two phases:

Based on the titles and abstracts : Decide whether each article potentially meets the selection criteria based on the information provided in the abstracts.
Based on the full texts: Download the articles that weren’t excluded during the first phase. If an article isn’t available online or through your library, you may need to contact the authors to ask for a copy. Read the articles and decide which articles meet the selection criteria.

It’s very important to keep a meticulous record of why you included or excluded each article. When the selection process is complete, you can summarize what you did using a PRISMA flow diagram .

Next, Boyle and colleagues found the full texts for each of the remaining studies. Boyle and Tang read through the articles to decide if any more studies needed to be excluded based on the selection criteria.

When Boyle and Tang disagreed about whether a study should be excluded, they discussed it with Varigos until the three researchers came to an agreement.

Step 5: Extract the data

Extracting the data means collecting information from the selected studies in a systematic way. There are two types of information you need to collect from each study:

Information about the study’s methods and results . The exact information will depend on your research question, but it might include the year, study design , sample size, context, research findings , and conclusions. If any data are missing, you’ll need to contact the study’s authors.
Your judgment of the quality of the evidence, including risk of bias .

You should collect this information using forms. You can find sample forms in The Registry of Methods and Tools for Evidence-Informed Decision Making and the Grading of Recommendations, Assessment, Development and Evaluations Working Group .

Extracting the data is also a three-person job. Two people should do this step independently, and the third person will resolve any disagreements.

They also collected data about possible sources of bias, such as how the study participants were randomized into the control and treatment groups.

Step 6: Synthesize the data

Synthesizing the data means bringing together the information you collected into a single, cohesive story. There are two main approaches to synthesizing the data:

Narrative ( qualitative ): Summarize the information in words. You’ll need to discuss the studies and assess their overall quality.
Quantitative : Use statistical methods to summarize and compare data from different studies. The most common quantitative approach is a meta-analysis , which allows you to combine results from multiple studies into a summary result.

Generally, you should use both approaches together whenever possible. If you don’t have enough data, or the data from different studies aren’t comparable, then you can take just a narrative approach. However, you should justify why a quantitative approach wasn’t possible.

Boyle and colleagues also divided the studies into subgroups, such as studies about babies, children, and adults, and analyzed the effect sizes within each group.

Step 7: Write and publish a report

The purpose of writing a systematic review article is to share the answer to your research question and explain how you arrived at this answer.

Your article should include the following sections:

Abstract : A summary of the review
Introduction : Including the rationale and objectives
Methods : Including the selection criteria, search method, data extraction method, and synthesis method
Results : Including results of the search and selection process, study characteristics, risk of bias in the studies, and synthesis results
Discussion : Including interpretation of the results and limitations of the review
Conclusion : The answer to your research question and implications for practice, policy, or research

To verify that your report includes everything it needs, you can use the PRISMA checklist .

Once your report is written, you can publish it in a systematic review database, such as the Cochrane Database of Systematic Reviews , and/or in a peer-reviewed journal.

In their report, Boyle and colleagues concluded that probiotics cannot be recommended for reducing eczema symptoms or improving quality of life in patients with eczema. Note Generative AI tools like ChatGPT can be useful at various stages of the writing and research process and can help you to write your systematic review. However, we strongly advise against trying to pass AI-generated text off as your own work.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

Student’s t -distribution
Normal distribution
Null and Alternative Hypotheses
Chi square tests
Confidence interval
Quartiles & Quantiles
Cluster sampling
Stratified sampling
Data cleansing
Reproducibility vs Replicability
Peer review
Prospective cohort study

Research bias

Implicit bias
Cognitive bias
Placebo effect
Hawthorne effect
Hindsight bias
Affect heuristic
Social desirability bias

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question .

It is often written as part of a thesis, dissertation , or research paper , in order to situate your work in relation to existing knowledge.

A literature review is a survey of credible sources on a topic, often used in dissertations , theses, and research papers . Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other academic texts , with an introduction , a main body, and a conclusion .

An annotated bibliography is a list of source references that has a short description (called an annotation ) for each of the sources. It is often assigned as part of the research process for a paper .

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Turney, S. (2023, November 20). Systematic Review | Definition, Example & Guide. Scribbr. Retrieved July 5, 2024, from https://www.scribbr.com/methodology/systematic-review/

Is this article helpful?

Shaun Turney

Other students also liked, how to write a literature review | guide, examples, & templates, how to write a research proposal | examples & templates, what is critical thinking | definition & examples, "i thought ai proofreading was useless but..".

I've been using Scribbr for years now and I know it's a service that won't disappoint. It does a good job spotting mistakes”

Covidence website will be inaccessible as we upgrading our platform on Monday 23rd August at 10am AEST, / 2am CEST/1am BST (Sunday, 15th August 8pm EDT/5pm PDT)

How to write the methods section of a systematic review

Home | Blog | How To | How to write the methods section of a systematic review

Covidence breaks down how to write a methods section

The methods section of your systematic review describes what you did, how you did it, and why. Readers need this information to interpret the results and conclusions of the review. Often, a lot of information needs to be distilled into just a few paragraphs. This can be a challenging task, but good preparation and the right tools will help you to set off in the right direction 🗺️🧭.

Systematic reviews are so-called because they are conducted in a way that is rigorous and replicable. So it’s important that these methods are reported in a way that is thorough, clear, and easy to navigate for the reader – whether that’s a patient, a healthcare worker, or a researcher.

Like most things in a systematic review, the methods should be planned upfront and ideally described in detail in a project plan or protocol. Reviews of healthcare interventions follow the PRISMA guidelines for the minimum set of items to report in the methods section. But what else should be included? It’s a good idea to consider what readers will want to know about the review methods and whether the journal you’re planning to submit the work to has expectations on the reporting of methods. Finding out in advance will help you to plan what to include.

Describe what happened

While the research plan sets out what you intend to do, the methods section is a write-up of what actually happened. It’s not a simple case of rewriting the plan in the past tense – you will also need to discuss and justify deviations from the plan and describe the handling of issues that were unforeseen at the time the plan was written. For this reason, it is useful to make detailed notes before, during, and after the review is completed. Relying on memory alone risks losing valuable information and trawling through emails when the deadline is looming can be frustrating and time consuming!

Keep it brief

The methods section should be succinct but include all the noteworthy information. This can be a difficult balance to achieve. A useful strategy is to aim for a brief description that signposts the reader to a separate section or sections of supporting information. This could include datasets, a flowchart to show what happened to the excluded studies, a collection of search strategies, and tables containing detailed information about the studies.This separation keeps the review short and simple while enabling the reader to drill down to the detail as needed. And if the methods follow a well-known or standard process, it might suffice to say so and give a reference, rather than describe the process at length.

Follow a structure

A clear structure provides focus. Use of descriptive headings keeps the writing on track and helps the reader get to key information quickly. What should the structure of the methods section look like? As always, a lot depends on the type of review but it will certainly contain information relating to the following areas:

Selection criteria ⭕
Data collection and analysis 👩‍💻
Study quality and risk of bias ⚖️

Let’s look at each of these in turn.

1. Selection criteria ⭕

The criteria for including and excluding studies are listed here. This includes detail about the types of studies, the types of participants, the types of interventions and the types of outcomes and how they were measured.

2. Search 🕵🏾‍♀️

Comprehensive reporting of the search is important because this means it can be evaluated and replicated. The search strategies are included in the review, along with details of the databases searched. It’s also important to list any restrictions on the search (for example, language), describe how resources other than electronic databases were searched (for example, non-indexed journals), and give the date that the searches were run. The PRISMA-S extension provides guidance on reporting literature searches.

Systematic reviewer pro-tip:

Copy and paste the search strategy to avoid introducing typos

3. Data collection and analysis 👩‍💻

This section describes:

how studies were selected for inclusion in the review
how study data were extracted from the study reports
how study data were combined for analysis and synthesis

To describe how studies were selected for inclusion , review teams outline the screening process. Covidence uses reviewers’ decision data to automatically populate a PRISMA flow diagram for this purpose. Covidence can also calculate Cohen’s kappa to enable review teams to report the level of agreement among individual reviewers during screening.

To describe how study data were extracted from the study reports , reviewers outline the form that was used, any pilot-testing that was done, and the items that were extracted from the included studies. An important piece of information to include here is the process used to resolve conflict among the reviewers. Covidence’s data extraction tool saves reviewers’ comments and notes in the system as they work. This keeps the information in one place for easy retrieval ⚡.

To describe how study data were combined for analysis and synthesis, reviewers outline the type of synthesis (narrative or quantitative, for example), the methods for grouping data, the challenges that came up, and how these were dealt with. If the review includes a meta-analysis, it will detail how this was performed and how the treatment effects were measured.

4. Study quality and risk of bias ⚖️

Because the results of systematic reviews can be affected by many types of bias, reviewers make every effort to minimise it and to show the reader that the methods they used were appropriate. This section describes the methods used to assess study quality and an assessment of the risk of bias across a range of domains.

Steps to assess the risk of bias in studies include looking at how study participants were assigned to treatment groups and whether patients and/or study assessors were blinded to the treatment given. Reviewers also report their assessment of the risk of bias due to missing outcome data, whether that is due to participant drop-out or non-reporting of the outcomes by the study authors.

Covidence’s default template for assessing study quality is Cochrane’s risk of bias tool but it is also possible to start from scratch and build a tool with a set of custom domains if you prefer.

Careful planning, clear writing, and a structured approach are key to a good methods section. A methodologist will be able to refer review teams to examples of good methods reporting in the literature. Covidence helps reviewers to screen references, extract data and complete risk of bias tables quickly and efficiently. Sign up for a free trial today!

Laura Mellor. Portsmouth, UK

Perhaps you'd also like....

Top 5 Tips for High-Quality Systematic Review Data Extraction

Data extraction can be a complex step in the systematic review process. Here are 5 top tips from our experts to help prepare and achieve high quality data extraction.

How to get through study quality assessment Systematic Review

Find out 5 tops tips to conducting quality assessment and why it’s an important step in the systematic review process.

How to extract study data for your systematic review

Learn the basic process and some tips to build data extraction forms for your systematic review with Covidence.

Better systematic review management

Head office, working for an institution or organisation.

Find out why over 350 of the world’s leading institutions are seeing a surge in publications since using Covidence!

Request a consultation with one of our team members and start empowering your researchers:

By using our site you consent to our use of cookies to measure and improve our site’s performance. Please see our Privacy Policy for more information.

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, automatically generate references for free.

Knowledge Base
Methodology
Systematic Review | Definition, Examples & Guide

Systematic Review | Definition, Examples & Guide

Published on 15 June 2022 by Shaun Turney . Revised on 17 October 2022.

A systematic review is a type of review that uses repeatable methods to find, select, and synthesise all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer.

They answered the question ‘What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?’

In this context, a probiotic is a health product that contains live microorganisms and is taken by mouth. Eczema is a common skin condition that causes red, itchy skin.

What is a systematic review, systematic review vs meta-analysis, systematic review vs literature review, systematic review vs scoping review, when to conduct a systematic review, pros and cons of systematic reviews, step-by-step example of a systematic review, frequently asked questions about systematic reviews.

A review is an overview of the research that’s already been completed on a topic.

What makes a systematic review different from other types of reviews is that the research methods are designed to reduce research bias . The methods are repeatable , and the approach is formal and systematic:

Formulate a research question
Develop a protocol
Search for all relevant studies
Apply the selection criteria
Extract the data
Synthesise the data
Write and publish a report

Systematic reviews are most commonly used in medical and public health research, but they can also be found in other disciplines.

Systematic reviews typically answer their research question by synthesising all available evidence and evaluating the quality of the evidence. Synthesising means bringing together different information to tell a single, cohesive story. The synthesis can be narrative ( qualitative ), quantitative , or both.

Prevent plagiarism, run a free check.

Systematic reviews often quantitatively synthesise the evidence using a meta-analysis . A meta-analysis is a statistical analysis, not a type of review.

A meta-analysis is a technique to synthesise results from multiple studies. It’s a statistical analysis that combines the results of two or more studies, usually to estimate an effect size .

A literature review is a type of review that uses a less systematic and formal approach than a systematic review. Typically, an expert in a topic will qualitatively summarise and evaluate previous work, without using a formal, explicit method.

Although literature reviews are often less time-consuming and can be insightful or helpful, they have a higher risk of bias and are less transparent than systematic reviews.

Similar to a systematic review, a scoping review is a type of review that tries to minimise bias by using transparent and repeatable methods.

Sometimes scoping reviews are an exploratory preparation step for a systematic review, and sometimes they are a standalone project.

A systematic review is a good choice of review if you want to answer a question about the effectiveness of an intervention , such as a medical treatment.

To conduct a systematic review, you’ll need the following:

A precise question , usually about the effectiveness of an intervention. The question needs to be about a topic that’s previously been studied by multiple researchers. If there’s no previous research, there’s nothing to review.
If you’re doing a systematic review on your own (e.g., for a research paper or thesis), you should take appropriate measures to ensure the validity and reliability of your research.
Access to databases and journal archives. Often, your educational institution provides you with access.
Time. A professional systematic review is a time-consuming process: it will take the lead author about six months of full-time work. If you’re a student, you should narrow the scope of your systematic review and stick to a tight schedule.
Bibliographic, word-processing, spreadsheet, and statistical software . For example, you could use EndNote, Microsoft Word, Excel, and SPSS.

A systematic review has many pros .

They minimise research b ias by considering all available evidence and evaluating each study for bias.
Their methods are transparent , so they can be scrutinised by others.
They’re thorough : they summarise all available evidence.
They can be replicated and updated by others.

Systematic reviews also have a few cons .

They’re time-consuming .
They’re narrow in scope : they only answer the precise research question.

The 7 steps for conducting a systematic review are explained with an example.

Step 1: Formulate a research question

Formulating the research question is probably the most important step of a systematic review. A clear research question will:

Allow you to more effectively communicate your research to other researchers and practitioners
Guide your decisions as you plan and conduct your systematic review

A good research question for a systematic review has four components, which you can remember with the acronym PICO :

Population(s) or problem(s)
Intervention(s)
Comparison(s)

You can rearrange these four components to write your research question:

What is the effectiveness of I versus C for O in P ?

Sometimes, you may want to include a fourth component, the type of study design . In this case, the acronym is PICOT .

Type of study design(s)
The population of patients with eczema
The intervention of probiotics
In comparison to no treatment, placebo , or non-probiotic treatment
The outcome of changes in participant-, parent-, and doctor-rated symptoms of eczema and quality of life
Randomised control trials, a type of study design

Their research question was:

What is the effectiveness of probiotics versus no treatment, a placebo, or a non-probiotic treatment for reducing eczema symptoms and improving quality of life in patients with eczema?

Step 2: Develop a protocol

A protocol is a document that contains your research plan for the systematic review. This is an important step because having a plan allows you to work more efficiently and reduces bias.

Your protocol should include the following components:

Background information : Provide the context of the research question, including why it’s important.
Research objective(s) : Rephrase your research question as an objective.
Selection criteria: State how you’ll decide which studies to include or exclude from your review.
Search strategy: Discuss your plan for finding studies.
Analysis: Explain what information you’ll collect from the studies and how you’ll synthesise the data.

It’s highly recommended to register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov .

Step 3: Search for all relevant studies

Searching for relevant studies is the most time-consuming step of a systematic review.

Databases: Search multiple databases of peer-reviewed literature, such as PubMed or Scopus . Think carefully about how to phrase your search terms and include multiple synonyms of each word. Use Boolean operators if relevant.
Handsearching: In addition to searching the primary sources using databases, you’ll also need to search manually. One strategy is to scan relevant journals or conference proceedings. Another strategy is to scan the reference lists of relevant studies.
Grey literature: Grey literature includes documents produced by governments, universities, and other institutions that aren’t published by traditional publishers. Graduate student theses are an important type of grey literature, which you can search using the Networked Digital Library of Theses and Dissertations (NDLTD) . In medicine, clinical trial registries are another important type of grey literature.
Experts: Contact experts in the field to ask if they have unpublished studies that should be included in your review.

At this stage of your review, you won’t read the articles yet. Simply save any potentially relevant citations using bibliographic software, such as Scribbr’s APA or MLA Generator .

Databases: EMBASE, PsycINFO, AMED, LILACS, and ISI Web of Science
Handsearch: Conference proceedings and reference lists of articles
Grey literature: The Cochrane Library, the metaRegister of Controlled Trials, and the Ongoing Skin Trials Register
Experts: Authors of unpublished registered trials, pharmaceutical companies, and manufacturers of probiotics

Step 4: Apply the selection criteria

To increase inter-rater reliability , ensure that everyone thoroughly understands the selection criteria before you begin.

You should apply the selection criteria in two phases:

Based on the titles and abstracts : Decide whether each article potentially meets the selection criteria based on the information provided in the abstracts.
Based on the full texts: Download the articles that weren’t excluded during the first phase. If an article isn’t available online or through your library, you may need to contact the authors to ask for a copy. Read the articles and decide which articles meet the selection criteria.

It’s very important to keep a meticulous record of why you included or excluded each article. When the selection process is complete, you can summarise what you did using a PRISMA flow diagram .

When Boyle and Tang disagreed about whether a study should be excluded, they discussed it with Varigos until the three researchers came to an agreement.

Step 5: Extract the data

Extracting the data means collecting information from the selected studies in a systematic way. There are two types of information you need to collect from each study:

Information about the study’s methods and results . The exact information will depend on your research question, but it might include the year, study design , sample size, context, research findings , and conclusions. If any data are missing, you’ll need to contact the study’s authors.
Your judgement of the quality of the evidence, including risk of bias .

Extracting the data is also a three-person job. Two people should do this step independently, and the third person will resolve any disagreements.

They also collected data about possible sources of bias, such as how the study participants were randomised into the control and treatment groups.

Step 6: Synthesise the data

Synthesising the data means bringing together the information you collected into a single, cohesive story. There are two main approaches to synthesising the data:

Narrative ( qualitative ): Summarise the information in words. You’ll need to discuss the studies and assess their overall quality.
Quantitative : Use statistical methods to summarise and compare data from different studies. The most common quantitative approach is a meta-analysis , which allows you to combine results from multiple studies into a summary result.

Boyle and colleagues also divided the studies into subgroups, such as studies about babies, children, and adults, and analysed the effect sizes within each group.

Step 7: Write and publish a report

The purpose of writing a systematic review article is to share the answer to your research question and explain how you arrived at this answer.

Your article should include the following sections:

Abstract : A summary of the review
Introduction : Including the rationale and objectives
Methods : Including the selection criteria, search method, data extraction method, and synthesis method
Results : Including results of the search and selection process, study characteristics, risk of bias in the studies, and synthesis results
Discussion : Including interpretation of the results and limitations of the review
Conclusion : The answer to your research question and implications for practice, policy, or research

To verify that your report includes everything it needs, you can use the PRISMA checklist .

Once your report is written, you can publish it in a systematic review database, such as the Cochrane Database of Systematic Reviews , and/or in a peer-reviewed journal.

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question .

It is often written as part of a dissertation , thesis, research paper , or proposal .

There are several reasons to conduct a literature review at the beginning of a research project:

To familiarise yourself with the current state of knowledge on your topic
To ensure that you’re not just repeating what others have already done
To identify gaps in knowledge and unresolved problems that your research can address
To develop your theoretical framework and methodology
To provide an overview of the key findings and debates on the topic

Writing the literature review shows your reader how your work relates to existing research and what new insights it will contribute.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the ‘Cite this Scribbr article’ button to automatically add the citation to our free Reference Generator.

Turney, S. (2022, October 17). Systematic Review | Definition, Examples & Guide. Scribbr. Retrieved 5 July 2024, from https://www.scribbr.co.uk/research-methods/systematic-reviews/

Is this article helpful?

Shaun Turney

Other students also liked, what is a literature review | guide, template, & examples, exploratory research | definition, guide, & examples, what is peer review | types & examples.

A-Z Publications

Annual Review of Psychology

Volume 70, 2019, review article, how to do a systematic review: a best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses.

Andy P. Siddaway 1 , Alex M. Wood 2 , and Larry V. Hedges 3
View Affiliations Hide Affiliations Affiliations: 1 Behavioural Science Centre, Stirling Management School, University of Stirling, Stirling FK9 4LA, United Kingdom; email: [email protected] 2 Department of Psychological and Behavioural Science, London School of Economics and Political Science, London WC2A 2AE, United Kingdom 3 Department of Statistics, Northwestern University, Evanston, Illinois 60208, USA; email: [email protected]
Vol. 70:747-770 (Volume publication date January 2019) https://doi.org/10.1146/annurev-psych-010418-102803
First published as a Review in Advance on August 08, 2018
Copyright © 2019 by Annual Reviews. All rights reserved

Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question. The best reviews synthesize studies to draw broad theoretical conclusions about what a literature means, linking theory to evidence and evidence to theory. This guide describes how to plan, conduct, organize, and present a systematic review of quantitative (meta-analysis) or qualitative (narrative review, meta-synthesis) information. We outline core standards and principles and describe commonly encountered problems. Although this guide targets psychological scientists, its high level of abstraction makes it potentially relevant to any subject area or discipline. We argue that systematic reviews are a key methodology for clarifying whether and how research findings replicate and for explaining possible inconsistencies, and we call for researchers to conduct systematic reviews to help elucidate whether there is a replication crisis.

Article metrics loading...

Full text loading...

Literature Cited

APA Publ. Commun. Board Work. Group J. Artic. Rep. Stand. 2008 . Reporting standards for research in psychology: Why do we need them? What might they be?. Am. Psychol . 63 : 848– 49 [Google Scholar]
Baumeister RF 2013 . Writing a literature review. The Portable Mentor: Expert Guide to a Successful Career in Psychology MJ Prinstein, MD Patterson 119– 32 New York: Springer, 2nd ed.. [Google Scholar]
Baumeister RF , Leary MR 1995 . The need to belong: desire for interpersonal attachments as a fundamental human motivation. Psychol. Bull. 117 : 497– 529 [Google Scholar]
Baumeister RF , Leary MR 1997 . Writing narrative literature reviews. Rev. Gen. Psychol. 3 : 311– 20 Presents a thorough and thoughtful guide to conducting narrative reviews. [Google Scholar]
Bem DJ 1995 . Writing a review article for Psychological Bulletin. Psychol . Bull 118 : 172– 77 [Google Scholar]
Borenstein M , Hedges LV , Higgins JPT , Rothstein HR 2009 . Introduction to Meta-Analysis New York: Wiley Presents a comprehensive introduction to meta-analysis. [Google Scholar]
Borenstein M , Higgins JPT , Hedges LV , Rothstein HR 2017 . Basics of meta-analysis: I 2 is not an absolute measure of heterogeneity. Res. Synth. Methods 8 : 5– 18 [Google Scholar]
Braver SL , Thoemmes FJ , Rosenthal R 2014 . Continuously cumulating meta-analysis and replicability. Perspect. Psychol. Sci. 9 : 333– 42 [Google Scholar]
Bushman BJ 1994 . Vote-counting procedures. The Handbook of Research Synthesis H Cooper, LV Hedges 193– 214 New York: Russell Sage Found. [Google Scholar]
Cesario J 2014 . Priming, replication, and the hardest science. Perspect. Psychol. Sci. 9 : 40– 48 [Google Scholar]
Chalmers I 2007 . The lethal consequences of failing to make use of all relevant evidence about the effects of medical treatments: the importance of systematic reviews. Treating Individuals: From Randomised Trials to Personalised Medicine PM Rothwell 37– 58 London: Lancet [Google Scholar]
Cochrane Collab. 2003 . Glossary Rep., Cochrane Collab. London: http://community.cochrane.org/glossary Presents a comprehensive glossary of terms relevant to systematic reviews. [Google Scholar]
Cohn LD , Becker BJ 2003 . How meta-analysis increases statistical power. Psychol. Methods 8 : 243– 53 [Google Scholar]
Cooper HM 2003 . Editorial. Psychol. Bull. 129 : 3– 9 [Google Scholar]
Cooper HM 2016 . Research Synthesis and Meta-Analysis: A Step-by-Step Approach Thousand Oaks, CA: Sage, 5th ed.. Presents a comprehensive introduction to research synthesis and meta-analysis. [Google Scholar]
Cooper HM , Hedges LV , Valentine JC 2009 . The Handbook of Research Synthesis and Meta-Analysis New York: Russell Sage Found, 2nd ed.. [Google Scholar]
Cumming G 2014 . The new statistics: why and how. Psychol. Sci. 25 : 7– 29 Discusses the limitations of null hypothesis significance testing and viable alternative approaches. [Google Scholar]
Earp BD , Trafimow D 2015 . Replication, falsification, and the crisis of confidence in social psychology. Front. Psychol. 6 : 621 [Google Scholar]
Etz A , Vandekerckhove J 2016 . A Bayesian perspective on the reproducibility project: psychology. PLOS ONE 11 : e0149794 [Google Scholar]
Ferguson CJ , Brannick MT 2012 . Publication bias in psychological science: prevalence, methods for identifying and controlling, and implications for the use of meta-analyses. Psychol. Methods 17 : 120– 28 [Google Scholar]
Fleiss JL , Berlin JA 2009 . Effect sizes for dichotomous data. The Handbook of Research Synthesis and Meta-Analysis H Cooper, LV Hedges, JC Valentine 237– 53 New York: Russell Sage Found, 2nd ed.. [Google Scholar]
Garside R 2014 . Should we appraise the quality of qualitative research reports for systematic reviews, and if so, how. Innovation 27 : 67– 79 [Google Scholar]
Hedges LV , Olkin I 1980 . Vote count methods in research synthesis. Psychol. Bull. 88 : 359– 69 [Google Scholar]
Hedges LV , Pigott TD 2001 . The power of statistical tests in meta-analysis. Psychol. Methods 6 : 203– 17 [Google Scholar]
Higgins JPT , Green S 2011 . Cochrane Handbook for Systematic Reviews of Interventions, Version 5.1.0 London: Cochrane Collab. Presents comprehensive and regularly updated guidelines on systematic reviews. [Google Scholar]
John LK , Loewenstein G , Prelec D 2012 . Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23 : 524– 32 [Google Scholar]
Juni P , Witschi A , Bloch R , Egger M 1999 . The hazards of scoring the quality of clinical trials for meta-analysis. JAMA 282 : 1054– 60 [Google Scholar]
Klein O , Doyen S , Leys C , Magalhães de Saldanha da Gama PA , Miller S et al. 2012 . Low hopes, high expectations: expectancy effects and the replicability of behavioral experiments. Perspect. Psychol. Sci. 7 : 6 572– 84 [Google Scholar]
Lau J , Antman EM , Jimenez-Silva J , Kupelnick B , Mosteller F , Chalmers TC 1992 . Cumulative meta-analysis of therapeutic trials for myocardial infarction. N. Engl. J. Med. 327 : 248– 54 [Google Scholar]
Light RJ , Smith PV 1971 . Accumulating evidence: procedures for resolving contradictions among different research studies. Harvard Educ. Rev. 41 : 429– 71 [Google Scholar]
Lipsey MW , Wilson D 2001 . Practical Meta-Analysis London: Sage Comprehensive and clear explanation of meta-analysis. [Google Scholar]
Matt GE , Cook TD 1994 . Threats to the validity of research synthesis. The Handbook of Research Synthesis H Cooper, LV Hedges 503– 20 New York: Russell Sage Found. [Google Scholar]
Maxwell SE , Lau MY , Howard GS 2015 . Is psychology suffering from a replication crisis? What does “failure to replicate” really mean?. Am. Psychol. 70 : 487– 98 [Google Scholar]
Moher D , Hopewell S , Schulz KF , Montori V , Gøtzsche PC et al. 2010 . CONSORT explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 340 : c869 [Google Scholar]
Moher D , Liberati A , Tetzlaff J , Altman DG PRISMA Group. 2009 . Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ 339 : 332– 36 Comprehensive reporting guidelines for systematic reviews. [Google Scholar]
Morrison A , Polisena J , Husereau D , Moulton K , Clark M et al. 2012 . The effect of English-language restriction on systematic review-based meta-analyses: a systematic review of empirical studies. Int. J. Technol. Assess. Health Care 28 : 138– 44 [Google Scholar]
Nelson LD , Simmons J , Simonsohn U 2018 . Psychology's renaissance. Annu. Rev. Psychol. 69 : 511– 34 [Google Scholar]
Noblit GW , Hare RD 1988 . Meta-Ethnography: Synthesizing Qualitative Studies Newbury Park, CA: Sage [Google Scholar]
Olivo SA , Macedo LG , Gadotti IC , Fuentes J , Stanton T , Magee DJ 2008 . Scales to assess the quality of randomized controlled trials: a systematic review. Phys. Ther. 88 : 156– 75 [Google Scholar]
Open Sci. Collab. 2015 . Estimating the reproducibility of psychological science. Science 349 : 943 [Google Scholar]
Paterson BL , Thorne SE , Canam C , Jillings C 2001 . Meta-Study of Qualitative Health Research: A Practical Guide to Meta-Analysis and Meta-Synthesis Thousand Oaks, CA: Sage [Google Scholar]
Patil P , Peng RD , Leek JT 2016 . What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspect. Psychol. Sci. 11 : 539– 44 [Google Scholar]
Rosenthal R 1979 . The “file drawer problem” and tolerance for null results. Psychol. Bull. 86 : 638– 41 [Google Scholar]
Rosnow RL , Rosenthal R 1989 . Statistical procedures and the justification of knowledge in psychological science. Am. Psychol. 44 : 1276– 84 [Google Scholar]
Sanderson S , Tatt ID , Higgins JP 2007 . Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Int. J. Epidemiol. 36 : 666– 76 [Google Scholar]
Schreiber R , Crooks D , Stern PN 1997 . Qualitative meta-analysis. Completing a Qualitative Project: Details and Dialogue JM Morse 311– 26 Thousand Oaks, CA: Sage [Google Scholar]
Shrout PE , Rodgers JL 2018 . Psychology, science, and knowledge construction: broadening perspectives from the replication crisis. Annu. Rev. Psychol. 69 : 487– 510 [Google Scholar]
Stroebe W , Strack F 2014 . The alleged crisis and the illusion of exact replication. Perspect. Psychol. Sci. 9 : 59– 71 [Google Scholar]
Stroup DF , Berlin JA , Morton SC , Olkin I , Williamson GD et al. 2000 . Meta-analysis of observational studies in epidemiology (MOOSE): a proposal for reporting. JAMA 283 : 2008– 12 [Google Scholar]
Thorne S , Jensen L , Kearney MH , Noblit G , Sandelowski M 2004 . Qualitative meta-synthesis: reflections on methodological orientation and ideological agenda. Qual. Health Res. 14 : 1342– 65 [Google Scholar]
Tong A , Flemming K , McInnes E , Oliver S , Craig J 2012 . Enhancing transparency in reporting the synthesis of qualitative research: ENTREQ. BMC Med. Res. Methodol. 12 : 181– 88 [Google Scholar]
Trickey D , Siddaway AP , Meiser-Stedman R , Serpell L , Field AP 2012 . A meta-analysis of risk factors for post-traumatic stress disorder in children and adolescents. Clin. Psychol. Rev. 32 : 122– 38 [Google Scholar]
Valentine JC , Biglan A , Boruch RF , Castro FG , Collins LM et al. 2011 . Replication in prevention science. Prev. Sci. 12 : 103– 17 [Google Scholar]
Article Type: Review Article

Most Read This Month

Most cited most cited rss feed, job burnout, executive functions, social cognitive theory: an agentic perspective, on happiness and human potentials: a review of research on hedonic and eudaimonic well-being, sources of method bias in social science research and recommendations on how to control it, mediation analysis, missing data analysis: making it work in the real world, grounded cognition, personality structure: emergence of the five-factor model, motivational beliefs, values, and goals.

Systematic Review

Library Help
What is a Systematic Review (SR)?

Steps of a Systematic Review

Framing a Research Question
Developing a Search Strategy
Searching the Literature
Managing the Process
Meta-analysis
Publishing your Systematic Review

Forms and templates

Image: David Parmenter's Shop

PICO Template
Inclusion/Exclusion Criteria
Database Search Log
Review Matrix
Cochrane Tool for Assessing Risk of Bias in Included Studies

• PRISMA Flow Diagram - Record the numbers of retrieved references and included/excluded studies. You can use the Create Flow Diagram tool to automate the process.

• PRISMA Checklist - Checklist of items to include when reporting a systematic review or meta-analysis

PRISMA 2020 and PRISMA-S: Common Questions on Tracking Records and the Flow Diagram

PROSPERO Template
Manuscript Template
Steps of SR (text)
Steps of SR (visual)
Steps of SR (PIECES)

Image by

from the UMB HSHSL Guide. (26 min) on how to conduct and write a systematic review from RMIT University from the VU Amsterdam . , (1), 6–23. https://doi.org/10.3102/0034654319854352

. (1), 49-60. . (4), 471-475.

(2020) (2020) - Methods guide for effectiveness and comparative effectiveness reviews (2017) - Finding what works in health care: Standards for systematic reviews (2011) - Systematic reviews: CRD’s guidance for undertaking reviews in health care (2008)


	entify your research question. Formulate a clear, well-defined research question of appropriate scope. Define your terminology. Find existing reviews on your topic to inform the development of your research question, identify gaps, and confirm that you are not duplicating the efforts of previous reviews. Consider using a framework like or to define you question scope. Use to record search terms under each concept. It is a good idea to register your protocol in a publicly accessible way. This will help avoid other people completing a review on your topic. Similarly, before you start doing a systematic review, it's worth checking the different registries that nobody else has already registered a protocol on the same topic. - Systematic reviews of health care and clinical interventions - Systematic reviews of the effects of social interventions (Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies) - The protocol is published immediately and subjected to open peer review. When two reviewers approve it, the paper is sent to Medline, Embase and other databases for indexing. - upload a protocol for your scoping review - Systematic reviews of healthcare practices to assist in the improvement of healthcare outcomes globally - Registry of a protocol on OSF creates a frozen, time-stamped record of the protocol, thus ensuring a level of transparency and accountability for the research. There are no limits to the types of protocols that can be hosted on OSF. - International prospective register of systematic reviews. This is the primary database for registering systematic review protocols and searching for published protocols. . PROSPERO accepts protocols from all disciplines (e.g., psychology, nutrition) with the stipulation that they must include health-related outcomes. - Similar to PROSPERO. Based in the UK, fee-based service, quick turnaround time. - Submit a pre-print, or a protocol for a scoping review. - Share your search strategy and research protocol. No limit on the format, size, access restrictions or license. outlining the details and documentation necessary for conducting a systematic review: , (1), 28.
	Clearly state the criteria you will use to determine whether or not a study will be included in your search. Consider study populations, study design, intervention types, comparison groups, measured outcomes. Use some database-supplied limits such as language, dates, humans, female/male, age groups, and publication/study types (randomized controlled trials, etc.).
	Run your searches in the to your topic. Work with to help you design comprehensive search strategies across a variety of databases. Approach the grey literature methodically and purposefully. Collect ALL of the retrieved records from each search into , such as , or , and prior to screening. using the and .
- export your Endnote results in this screening software	Start with a title/abstract screening to remove studies that are clearly not related to your topic. Use your to screen the full-text of studies. It is highly recommended that two independent reviewers screen all studies, resolving areas of disagreement by consensus.
	Use , or systematic review software (e.g. , ), to extract all relevant data from each included study. It is recommended that you pilot your data extraction tool, to determine if other fields should be included or existing fields clarified.
Risk of Bias (Quality) Assessment - (download the Excel spreadsheet to see all data)	Use a Risk of Bias tool (such as the ) to assess the potential biases of studies in regards to study design and other factors. Read the to learn about the topic of assessing risk of bias in included studies. You can adapt ( ) to best meet the needs of your review, depending on the types of studies included.
- - -	Clearly present your findings, including detailed methodology (such as search strategies used, selection criteria, etc.) such that your review can be easily updated in the future with new research findings. Perform a meta-analysis, if the studies allow. Provide recommendations for practice and policy-making if sufficient, high quality evidence exists, or future directions for research to fill existing gaps in knowledge or to strengthen the body of evidence. For more information, see: . (2), 217–226. https://doi.org/10.2450/2012.0247-12 - Get some inspiration and find some terms and phrases for writing your manuscript - Automated high-quality spelling, grammar and rephrasing corrections using artificial intelligence (AI) to improve the flow of your writing. Free and subscription plans available.
- -	8. Find the best journal to publish your work. Identifying the best journal to submit your research to can be a difficult process. To help you make the choice of where to submit, simply insert your title and abstract in any of the listed under the tab.

Adapted from A Guide to Conducting Systematic Reviews: Steps in a Systematic Review by Cornell University Library

This diagram illustrates in a visual way and in plain language what review authors actually do in the process of undertaking a systematic review.

This diagram illustrates what is actually in a published systematic review and gives examples from the relevant parts of a systematic review housed online on The Cochrane Library. It will help you to read or navigate a systematic review.

Source: Cochrane Consumers and Communications (infographics are free to use and licensed under Creative Commons )

Check the following visual resources titled " What Are Systematic Reviews?"

Video with closed captions available
Animated Storyboard

Image:

- the methods of the systematic review are generally decided before conducting it.
- searching for studies which match the preset criteria in a systematic manner
- sort all retrieved articles (included or excluded) and assess the risk of bias for each included study
- each study is coded with preset form, either qualitatively or quantitatively synthesize data.
- place results of synthesis into context, strengths and weaknesses of the studies
- report provides description of methods and results in a clear and transparent manner

Source: Foster, M. (2018). Systematic reviews service: Introduction to systematic reviews. Retrieved September 18, 2018, from

<< Previous: What is a Systematic Review (SR)?
Next: Framing a Research Question >>
Last Updated: May 8, 2024 1:44 PM
URL: https://lib.guides.umd.edu/SR

UNC Libraries
HSL Academic Process
Systematic Reviews

Systematic Reviews: Home

Created by health science librarians.

Systematic review resources

What is a Systematic Review?

A simplified process map, how can the library help, publications by hsl librarians, systematic reviews in non-health disciplines, resources for performing systematic reviews.

Step 1: Complete Pre-Review Tasks
Step 2: Develop a Protocol
Step 3: Conduct Literature Searches
Step 4: Manage Citations
Step 5: Screen Citations
Step 6: Assess Quality of Included Studies
Step 7: Extract Data from Included Studies
Step 8: Write the Review

Check our FAQ's

Email us

Call (919) 962-0800

Make an appointment with a librarian

Request a systematic or scoping review consultation

A systematic review is a literature review that gathers all of the available evidence matching pre-specified eligibility criteria to answer a specific research question. It uses explicit, systematic methods, documented in a protocol, to minimize bias , provide reliable findings , and inform decision-making. ¹

There are many types of literature reviews.

Before beginning a systematic review, consider whether it is the best type of review for your question, goals, and resources. The table below compares a few different types of reviews to help you decide which is best for you.

Comparing Systematic, Scoping, and Systematized Reviews
Systematic Review	Scoping Review	Systematized Review
Conducted for Publication	Conducted for Publication	Conducted for Assignment, Thesis, or (Possibly) Publication
Protocol Required	Protocol Required	No Protocol Required
Focused Research Question	Broad Research Question	Either
Focused Inclusion & Exclusion Criteria	Broad Inclusion & Exclusion Criteria	Either
Requires Large Team	Requires Small Team	Usually 1-2 People

Scoping Review Guide For more information about scoping reviews, refer to the UNC HSL Scoping Review Guide.

Systematic Reviews: A Simplified, Step-by-Step Process Map

UNC HSL's Simplified, Step-by-Step Process Map A PDF file of the HSL's Systematic Review Process Map.
Text-Only: UNC HSL's Systematic Reviews - A Simplified, Step-by-Step Process A text-only PDF file of HSL's Systematic Review Process Map.

Creative commons license applied to systematic reviews image requires that reusers give credit to the creator. It allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, for noncommercial purposes only.

The average systematic review takes 1,168 hours to complete. ¹ A librarian can help you speed up the process.

Systematic reviews follow established guidelines and best practices to produce high-quality research. Librarian involvement in systematic reviews is based on two levels. In Tier 1, your research team can consult with the librarian as needed. The librarian will answer questions and give you recommendations for tools to use. In Tier 2, the librarian will be an active member of your research team and co-author on your review. Roles and expectations of librarians vary based on the level of involvement desired. Examples of these differences are outlined in the table below.

Roles and expectations of librarians based on level of involvement desired.
Tasks	Tier 1: Consultative	Tier 2: Research Partner / Co-author

Guidance on process and steps	Yes	Yes
Background searching for past and upcoming reviews	Yes	Yes

Development and/or refinement of review topic	Yes	Yes
Assistance with refinement of PICO (population, intervention(s), comparator(s), and key questions	Yes	Yes
Guidance on study types to include	Yes	Yes

Guidance on protocol registration	Yes	Yes

Identification of databases for searches	Yes	Yes
Instruction in search techniques and methods	Yes	Yes
Training in citation management software use for managing and sharing results	Yes	Yes
Development and execution of searches	No	Yes
Downloading search results to citation management software and removing duplicates	No	Yes
Documentation of search strategies	No	Yes
Management of search results	No	Yes

Guidance on methods	Yes	Yes
Guidance on data extraction, and management techniques and software	Yes	Yes

Suggestions of journals to target for publication	Yes	Yes
Drafting of literature search description in "Methods" section	No	Yes
Creation of PRISMA diagram	No	Yes
Drafting of literature search appendix	No	Yes
Review other manuscript sections and final draft	No	Yes
Librarian contributions warrant co-authorship	No	Yes

Request a systematic or scoping review consultation

The following are systematic and scoping reviews co-authored by HSL librarians.

Only the most recent 15 results are listed. Click the website link at the bottom of the list to see all reviews co-authored by HSL librarians in PubMed

Researchers conduct systematic reviews in a variety of disciplines. If your focus is on a topic outside of the health sciences, you may want to also consult the resources below to learn how systematic reviews may vary in your field. You can also contact a librarian for your discipline with questions.

EPPI-Centre methods for conducting systematic reviews The EPPI-Centre develops methods and tools for conducting systematic reviews, including reviews for education, public and social policy.

Environmental Topics

Collaboration for Environmental Evidence (CEE) CEE seeks to promote and deliver evidence syntheses on issues of greatest concern to environmental policy and practice as a public service

Social Sciences

Siddaway AP, Wood AM, Hedges LV. How to Do a Systematic Review: A Best Practice Guide for Conducting and Reporting Narrative Reviews, Meta-Analyses, and Meta-Syntheses. Annu Rev Psychol. 2019 Jan 4;70:747-770. doi: 10.1146/annurev-psych-010418-102803. A resource for psychology systematic reviews, which also covers qualitative meta-syntheses or meta-ethnographies
The Campbell Collaboration

Social Work

Software engineering

Guidelines for Performing Systematic Literature Reviews in Software Engineering The objective of this report is to propose comprehensive guidelines for systematic literature reviews appropriate for software engineering researchers, including PhD students.

Sport, Exercise, & Nutrition

Application of systematic review methodology to the field of nutrition by Tufts Evidence-based Practice Center Publication Date: 2009
Systematic Reviews and Meta-Analysis — Open & Free (Open Learning Initiative) The course follows guidelines and standards developed by the Campbell Collaboration, based on empirical evidence about how to produce the most comprehensive and accurate reviews of research

Systematic Reviews by David Gough, Sandy Oliver & James Thomas Publication Date: 2020

Updating reviews

Updating systematic reviews by University of Ottawa Evidence-based Practice Center Publication Date: 2007
Next: Step 1: Complete Pre-Review Tasks >>
Last Updated: May 16, 2024 3:24 PM
URL: https://guides.lib.unc.edu/systematic-reviews

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Publications
Account settings
My Bibliography
Collections
Citation manager

Save citation to file

Email citation, add to collections.

Create a new collection
Add to an existing collection

Add to My Bibliography

Your saved search, create a file for external citation management software, your rss feed.

Search in PubMed
Search in NLM Catalog
Add to Search

Methodology of a systematic review

Affiliations.

1 Hospital Universitario La Paz, Madrid, España. Electronic address: [email protected].
2 Hospital Universitario Fundación Alcorcón, Madrid, España.
3 Instituto Valenciano de Oncología, Valencia, España.
4 Hospital Universitario de Cabueñes, Gijón, Asturias, España.
5 Hospital Universitario Ramón y Cajal, Madrid, España.
6 Hospital Universitario Gregorio Marañón, Madrid, España.
7 Hospital Universitario de Canarias, Tenerife, España.
8 Hospital Clínic, Barcelona, España; EAU Guidelines Office Board Member.
PMID: 29731270
DOI: 10.1016/j.acuro.2018.01.010

Context: The objective of evidence-based medicine is to employ the best scientific information available to apply to clinical practice. Understanding and interpreting the scientific evidence involves understanding the available levels of evidence, where systematic reviews and meta-analyses of clinical trials are at the top of the levels-of-evidence pyramid.

Acquisition of evidence: The review process should be well developed and planned to reduce biases and eliminate irrelevant and low-quality studies. The steps for implementing a systematic review include (i) correctly formulating the clinical question to answer (PICO), (ii) developing a protocol (inclusion and exclusion criteria), (iii) performing a detailed and broad literature search and (iv) screening the abstracts of the studies identified in the search and subsequently of the selected complete texts (PRISMA).

Synthesis of the evidence: Once the studies have been selected, we need to (v) extract the necessary data into a form designed in the protocol to summarise the included studies, (vi) assess the biases of each study, identifying the quality of the available evidence, and (vii) develop tables and text that synthesise the evidence.

Conclusions: A systematic review involves a critical and reproducible summary of the results of the available publications on a particular topic or clinical question. To improve scientific writing, the methodology is shown in a structured manner to implement a systematic review.

Keywords: Meta-analysis; Metaanálisis; Methodology; Metodología; Revisión sistemática; Systematic review.

PubMed Disclaimer

LinkOut - more resources

Full text sources.

Elsevier Science

Other Literature Sources

scite Smart Citations

Research Materials

NCI CPTC Antibody Characterization Program
Citation Manager

NCBI Literature Resources

MeSH PMC Bookshelf Disclaimer

The PubMed wordmark and PubMed logo are registered trademarks of the U.S. Department of Health and Human Services (HHS). Unauthorized use of these marks is strictly prohibited.

Duke NetID Login
919.660.1100
Duke Health Badge: 24-hour access
Accounts & Access
Databases, Journals & Books
Request & Reserve
Training & Consulting
Request Articles & Books
Renew Online
Reserve Spaces
Reserve a Locker
Study & Meeting Rooms
Course Reserves
Pay Fines/Fees
Recommend a Purchase
Access From Off Campus
Building Access
Computers & Equipment
Wifi Access
My Accounts
Mobile Apps
Known Access Issues
Report an Access Issue
All Databases
Article Databases
Basic Sciences
Clinical Sciences
Dissertations & Theses
Drugs, Chemicals & Toxicology
Grants & Funding
Interprofessional Education
Non-Medical Databases
Search for E-Journals
Search for Print & E-Journals
Search for E-Books
Search for Print & E-Books
E-Book Collections
Biostatistics
Global Health
MBS Program
Medical Students
MMCi Program
Occupational Therapy
Path Asst Program
Physical Therapy
Researchers
Community Partners

Conducting Research

Archival & Historical Research
Black History at Duke Health
Data Analytics & Viz Software
Data: Find and Share
Evidence-Based Practice
NIH Public Access Policy Compliance
Publication Metrics
Qualitative Research
Searching Animal Alternatives

Systematic Reviews

Test Instruments

Using Databases

JCR Impact Factors
Web of Science

Finding & Accessing

COVID-19: Core Clinical Resources
Health Literacy
Health Statistics & Data
Library Orientation

Writing & Citing

Creating Links
Getting Published
Reference Mgmt
Scientific Writing

Meet a Librarian

Request a Consultation
Find Your Liaisons
Register for a Class
Request a Class
Self-Paced Learning

Search Services

Literature Search
Systematic Review
Animal Alternatives (IACUC)
Research Impact

Citation Mgmt

Other Software

Scholarly Communications

About Scholarly Communications
Publish Your Work
Measure Your Research Impact
Engage in Open Science
Libraries and Publishers
Directions & Maps
Floor Plans

Library Updates

Annual Snapshot
Conference Presentations
Contact Information
Gifts & Donations
What is a Systematic Review?

Types of Reviews

Manuals and Reporting Guidelines
Our Service
1. Assemble Your Team
2. Develop a Research Question
3. Write and Register a Protocol
4. Search the Evidence
5. Screen Results
6. Assess for Quality and Bias
7. Extract the Data
8. Write the Review
Additional Resources
Finding Full-Text Articles

Review Typologies

There are many types of evidence synthesis projects, including systematic reviews as well as others. The selection of review type is wholly dependent on the research question. Not all research questions are well-suited for systematic reviews.

Review Typologies (from LITR-EX) This site explores different review methodologies such as, systematic, scoping, realist, narrative, state of the art, meta-ethnography, critical, and integrative reviews. The LITR-EX site has a health professions education focus, but the advice and information is widely applicable.

Review the table to peruse review types and associated methodologies. Librarians can also help your team determine which review type might be appropriate for your project.

Reproduced from Grant, M. J. and Booth, A. (2009), A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26: 91-108. doi:10.1111/j.1471-1842.2009.00848.x


Aims to demonstrate writer has extensively researched literature and critically evaluated its quality. Goes beyond mere description to include degree of analysis and conceptual innovation. Typically results in hypothesis or mode	Seeks to identify most significant items in the field	No formal quality assessment. Attempts to evaluate according to contribution	Typically narrative, perhaps conceptual or chronological	Significant component: seeks to identify conceptual contribution to embody existing or derive new theory
Generic term: published materials that provide examination of recent or current literature. Can cover wide range of subjects at various levels of completeness and comprehensiveness. May include research findings	May or may not include comprehensive searching	May or may not include quality assessment	Typically narrative	Analysis may be chronological, conceptual, thematic, etc.
Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature	Completeness of searching determined by time/scope constraints	No formal quality assessment	May be graphical and tabular	Characterizes quantity and quality of literature, perhaps by study design and other key features. May identify need for primary or secondary research
Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results	Aims for exhaustive, comprehensive searching. May use funnel plot to assess completeness	Quality assessment may determine inclusion/ exclusion and/or sensitivity analyses	Graphical and tabular with narrative commentary	Numerical analysis of measures of effect assuming absence of heterogeneity
Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches for example combining quantitative with qualitative research or outcome with process studies	Requires either very sensitive search to retrieve all studies or separately conceived quantitative and qualitative strategies	Requires either a generic appraisal instrument or separate appraisal processes with corresponding checklists	Typically both components will be presented as narrative and in tables. May also employ graphical means of integrating quantitative and qualitative studies	Analysis may characterise both literatures and look for correlations between characteristics or use gap analysis to identify aspects absent in one literature but missing in the other
Generic term: summary of the [medical] literature that attempts to survey the literature and describe its characteristics	May or may not include comprehensive searching (depends whether systematic overview or not)	May or may not include quality assessment (depends whether systematic overview or not)	Synthesis depends on whether systematic or not. Typically narrative but may include tabular features	Analysis may be chronological, conceptual, thematic, etc.
Method for integrating or comparing the findings from qualitative studies. It looks for ‘themes’ or ‘constructs’ that lie in or across individual qualitative studies	May employ selective or purposive sampling	Quality assessment typically used to mediate messages not for inclusion/exclusion	Qualitative, narrative synthesis	Thematic analysis, may include conceptual models
Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research	Completeness of searching determined by time constraints	Time-limited formal quality assessment	Typically narrative and tabular	Quantities of literature and overall quality/direction of effect of literature
Preliminary assessment of potential size and scope of available research literature. Aims to identify nature and extent of research evidence (usually including ongoing research)	Completeness of searching determined by time/scope constraints. May include research in progress	No formal quality assessment	Typically tabular with some narrative commentary	Characterizes quantity and quality of literature, perhaps by study design and other key features. Attempts to specify a viable review
Tend to address more current matters in contrast to other combined retrospective and current approaches. May offer new perspectives	Aims for comprehensive searching of current literature	No formal quality assessment	Typically narrative, may have tabular accompaniment	Current state of knowledge and priorities for future investigation and research
Seeks to systematically search for, appraise and synthesis research evidence, often adhering to guidelines on the conduct of a review	Aims for exhaustive, comprehensive searching	Quality assessment may determine inclusion/exclusion	Typically narrative with tabular accompaniment	What is known; recommendations for practice. What remains unknown; uncertainty around findings, recommendations for future research
Combines strengths of critical review with a comprehensive search process. Typically addresses broad questions to produce ‘best evidence synthesis’	Aims for exhaustive, comprehensive searching	May or may not include quality assessment	Minimal narrative, tabular summary of studies	What is known; recommendations for practice. Limitations
Attempt to include elements of systematic review process while stopping short of systematic review. Typically conducted as postgraduate student assignment	May or may not include comprehensive searching	May or may not include quality assessment	Typically narrative with tabular accompaniment	What is known; uncertainty around findings; limitations of methodology
Specifically refers to review compiling evidence from multiple reviews into one accessible and usable document. Focuses on broad condition or problem for which there are competing interventions and highlights reviews that address these interventions and their results	Identification of component reviews, but no search for primary studies	Quality assessment of studies within component reviews and/or of reviews themselves	Graphical and tabular with narrative commentary	What is known; recommendations for practice. What remains unknown; recommendations for future research

<< Previous: What is a Systematic Review?
Next: Manuals and Reporting Guidelines >>
Last Updated: Jun 18, 2024 9:41 AM
URL: https://guides.mclibrary.duke.edu/sysreview
Duke Health
Duke University
Duke Libraries
Medical Center Archives
Duke Directory
Seeley G. Mudd Building
10 Searle Drive
[email protected]

Open access
Published: 01 August 2019

A step by step guide for conducting a systematic review and meta-analysis with simulation data

Gehad Mohamed Tawfik 1 , 2 ,
Kadek Agus Surya Dila 2 , 3 ,
Muawia Yousif Fadlelmola Mohamed 2 , 4 ,
Dao Ngoc Hien Tam 2 , 5 ,
Nguyen Dang Kien 2 , 6 ,
Ali Mahmoud Ahmed 2 , 7 &
Nguyen Tien Huy 8 , 9 , 10

Tropical Medicine and Health volume 47 , Article number: 46 ( 2019 ) Cite this article

818k Accesses

307 Citations

94 Altmetric

Metrics details

The massive abundance of studies relating to tropical medicine and health has increased strikingly over the last few decades. In the field of tropical medicine and health, a well-conducted systematic review and meta-analysis (SR/MA) is considered a feasible solution for keeping clinicians abreast of current evidence-based medicine. Understanding of SR/MA steps is of paramount importance for its conduction. It is not easy to be done as there are obstacles that could face the researcher. To solve those hindrances, this methodology study aimed to provide a step-by-step approach mainly for beginners and junior researchers, in the field of tropical medicine and other health care fields, on how to properly conduct a SR/MA, in which all the steps here depicts our experience and expertise combined with the already well-known and accepted international guidance.

We suggest that all steps of SR/MA should be done independently by 2–3 reviewers’ discussion, to ensure data quality and accuracy.

SR/MA steps include the development of research question, forming criteria, search strategy, searching databases, protocol registration, title, abstract, full-text screening, manual searching, extracting data, quality assessment, data checking, statistical analysis, double data checking, and manuscript writing.

Introduction

The amount of studies published in the biomedical literature, especially tropical medicine and health, has increased strikingly over the last few decades. This massive abundance of literature makes clinical medicine increasingly complex, and knowledge from various researches is often needed to inform a particular clinical decision. However, available studies are often heterogeneous with regard to their design, operational quality, and subjects under study and may handle the research question in a different way, which adds to the complexity of evidence and conclusion synthesis [ 1 ].

Systematic review and meta-analyses (SR/MAs) have a high level of evidence as represented by the evidence-based pyramid. Therefore, a well-conducted SR/MA is considered a feasible solution in keeping health clinicians ahead regarding contemporary evidence-based medicine.

Differing from a systematic review, unsystematic narrative review tends to be descriptive, in which the authors select frequently articles based on their point of view which leads to its poor quality. A systematic review, on the other hand, is defined as a review using a systematic method to summarize evidence on questions with a detailed and comprehensive plan of study. Furthermore, despite the increasing guidelines for effectively conducting a systematic review, we found that basic steps often start from framing question, then identifying relevant work which consists of criteria development and search for articles, appraise the quality of included studies, summarize the evidence, and interpret the results [ 2 , 3 ]. However, those simple steps are not easy to be reached in reality. There are many troubles that a researcher could be struggled with which has no detailed indication.

Conducting a SR/MA in tropical medicine and health may be difficult especially for young researchers; therefore, understanding of its essential steps is crucial. It is not easy to be done as there are obstacles that could face the researcher. To solve those hindrances, we recommend a flow diagram (Fig. 1 ) which illustrates a detailed and step-by-step the stages for SR/MA studies. This methodology study aimed to provide a step-by-step approach mainly for beginners and junior researchers, in the field of tropical medicine and other health care fields, on how to properly and succinctly conduct a SR/MA; all the steps here depicts our experience and expertise combined with the already well known and accepted international guidance.

Detailed flow diagram guideline for systematic review and meta-analysis steps. Note : Star icon refers to “2–3 reviewers screen independently”

Methods and results

Detailed steps for conducting any systematic review and meta-analysis.

We searched the methods reported in published SR/MA in tropical medicine and other healthcare fields besides the published guidelines like Cochrane guidelines {Higgins, 2011 #7} [ 4 ] to collect the best low-bias method for each step of SR/MA conduction steps. Furthermore, we used guidelines that we apply in studies for all SR/MA steps. We combined these methods in order to conclude and conduct a detailed flow diagram that shows the SR/MA steps how being conducted.

Any SR/MA must follow the widely accepted Preferred Reporting Items for Systematic Review and Meta-analysis statement (PRISMA checklist 2009) (Additional file 5 : Table S1) [ 5 ].

We proposed our methods according to a valid explanatory simulation example choosing the topic of “evaluating safety of Ebola vaccine,” as it is known that Ebola is a very rare tropical disease but fatal. All the explained methods feature the standards followed internationally, with our compiled experience in the conduct of SR beside it, which we think proved some validity. This is a SR under conduct by a couple of researchers teaming in a research group, moreover, as the outbreak of Ebola which took place (2013–2016) in Africa resulted in a significant mortality and morbidity. Furthermore, since there are many published and ongoing trials assessing the safety of Ebola vaccines, we thought this would provide a great opportunity to tackle this hotly debated issue. Moreover, Ebola started to fire again and new fatal outbreak appeared in the Democratic Republic of Congo since August 2018, which caused infection to more than 1000 people according to the World Health Organization, and 629 people have been killed till now. Hence, it is considered the second worst Ebola outbreak, after the first one in West Africa in 2014 , which infected more than 26,000 and killed about 11,300 people along outbreak course.

Research question and objectives

Like other study designs, the research question of SR/MA should be feasible, interesting, novel, ethical, and relevant. Therefore, a clear, logical, and well-defined research question should be formulated. Usually, two common tools are used: PICO or SPIDER. PICO (Population, Intervention, Comparison, Outcome) is used mostly in quantitative evidence synthesis. Authors demonstrated that PICO holds more sensitivity than the more specific SPIDER approach [ 6 ]. SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) was proposed as a method for qualitative and mixed methods search.

We here recommend a combined approach of using either one or both the SPIDER and PICO tools to retrieve a comprehensive search depending on time and resources limitations. When we apply this to our assumed research topic, being of qualitative nature, the use of SPIDER approach is more valid.

PICO is usually used for systematic review and meta-analysis of clinical trial study. For the observational study (without intervention or comparator), in many tropical and epidemiological questions, it is usually enough to use P (Patient) and O (outcome) only to formulate a research question. We must indicate clearly the population (P), then intervention (I) or exposure. Next, it is necessary to compare (C) the indicated intervention with other interventions, i.e., placebo. Finally, we need to clarify which are our relevant outcomes.

To facilitate comprehension, we choose the Ebola virus disease (EVD) as an example. Currently, the vaccine for EVD is being developed and under phase I, II, and III clinical trials; we want to know whether this vaccine is safe and can induce sufficient immunogenicity to the subjects.

An example of a research question for SR/MA based on PICO for this issue is as follows: How is the safety and immunogenicity of Ebola vaccine in human? (P: healthy subjects (human), I: vaccination, C: placebo, O: safety or adverse effects)

Preliminary research and idea validation

We recommend a preliminary search to identify relevant articles, ensure the validity of the proposed idea, avoid duplication of previously addressed questions, and assure that we have enough articles for conducting its analysis. Moreover, themes should focus on relevant and important health-care issues, consider global needs and values, reflect the current science, and be consistent with the adopted review methods. Gaining familiarity with a deep understanding of the study field through relevant videos and discussions is of paramount importance for better retrieval of results. If we ignore this step, our study could be canceled whenever we find out a similar study published before. This means we are wasting our time to deal with a problem that has been tackled for a long time.

To do this, we can start by doing a simple search in PubMed or Google Scholar with search terms Ebola AND vaccine. While doing this step, we identify a systematic review and meta-analysis of determinant factors influencing antibody response from vaccination of Ebola vaccine in non-human primate and human [ 7 ], which is a relevant paper to read to get a deeper insight and identify gaps for better formulation of our research question or purpose. We can still conduct systematic review and meta-analysis of Ebola vaccine because we evaluate safety as a different outcome and different population (only human).

Inclusion and exclusion criteria

Eligibility criteria are based on the PICO approach, study design, and date. Exclusion criteria mostly are unrelated, duplicated, unavailable full texts, or abstract-only papers. These exclusions should be stated in advance to refrain the researcher from bias. The inclusion criteria would be articles with the target patients, investigated interventions, or the comparison between two studied interventions. Briefly, it would be articles which contain information answering our research question. But the most important is that it should be clear and sufficient information, including positive or negative, to answer the question.

For the topic we have chosen, we can make inclusion criteria: (1) any clinical trial evaluating the safety of Ebola vaccine and (2) no restriction regarding country, patient age, race, gender, publication language, and date. Exclusion criteria are as follows: (1) study of Ebola vaccine in non-human subjects or in vitro studies; (2) study with data not reliably extracted, duplicate, or overlapping data; (3) abstract-only papers as preceding papers, conference, editorial, and author response theses and books; (4) articles without available full text available; and (5) case reports, case series, and systematic review studies. The PRISMA flow diagram template that is used in SR/MA studies can be found in Fig. 2 .

PRISMA flow diagram of studies’ screening and selection

Search strategy

A standard search strategy is used in PubMed, then later it is modified according to each specific database to get the best relevant results. The basic search strategy is built based on the research question formulation (i.e., PICO or PICOS). Search strategies are constructed to include free-text terms (e.g., in the title and abstract) and any appropriate subject indexing (e.g., MeSH) expected to retrieve eligible studies, with the help of an expert in the review topic field or an information specialist. Additionally, we advise not to use terms for the Outcomes as their inclusion might hinder the database being searched to retrieve eligible studies because the used outcome is not mentioned obviously in the articles.

The improvement of the search term is made while doing a trial search and looking for another relevant term within each concept from retrieved papers. To search for a clinical trial, we can use these descriptors in PubMed: “clinical trial”[Publication Type] OR “clinical trials as topic”[MeSH terms] OR “clinical trial”[All Fields]. After some rounds of trial and refinement of search term, we formulate the final search term for PubMed as follows: (ebola OR ebola virus OR ebola virus disease OR EVD) AND (vaccine OR vaccination OR vaccinated OR immunization) AND (“clinical trial”[Publication Type] OR “clinical trials as topic”[MeSH Terms] OR “clinical trial”[All Fields]). Because the study for this topic is limited, we do not include outcome term (safety and immunogenicity) in the search term to capture more studies.

Search databases, import all results to a library, and exporting to an excel sheet

According to the AMSTAR guidelines, at least two databases have to be searched in the SR/MA [ 8 ], but as you increase the number of searched databases, you get much yield and more accurate and comprehensive results. The ordering of the databases depends mostly on the review questions; being in a study of clinical trials, you will rely mostly on Cochrane, mRCTs, or International Clinical Trials Registry Platform (ICTRP). Here, we propose 12 databases (PubMed, Scopus, Web of Science, EMBASE, GHL, VHL, Cochrane, Google Scholar, Clinical trials.gov , mRCTs, POPLINE, and SIGLE), which help to cover almost all published articles in tropical medicine and other health-related fields. Among those databases, POPLINE focuses on reproductive health. Researchers should consider to choose relevant database according to the research topic. Some databases do not support the use of Boolean or quotation; otherwise, there are some databases that have special searching way. Therefore, we need to modify the initial search terms for each database to get appreciated results; therefore, manipulation guides for each online database searches are presented in Additional file 5 : Table S2. The detailed search strategy for each database is found in Additional file 5 : Table S3. The search term that we created in PubMed needs customization based on a specific characteristic of the database. An example for Google Scholar advanced search for our topic is as follows:

With all of the words: ebola virus

With at least one of the words: vaccine vaccination vaccinated immunization

Where my words occur: in the title of the article

With all of the words: EVD

Finally, all records are collected into one Endnote library in order to delete duplicates and then to it export into an excel sheet. Using remove duplicating function with two options is mandatory. All references which have (1) the same title and author, and published in the same year, and (2) the same title and author, and published in the same journal, would be deleted. References remaining after this step should be exported to an excel file with essential information for screening. These could be the authors’ names, publication year, journal, DOI, URL link, and abstract.

Protocol writing and registration

Protocol registration at an early stage guarantees transparency in the research process and protects from duplication problems. Besides, it is considered a documented proof of team plan of action, research question, eligibility criteria, intervention/exposure, quality assessment, and pre-analysis plan. It is recommended that researchers send it to the principal investigator (PI) to revise it, then upload it to registry sites. There are many registry sites available for SR/MA like those proposed by Cochrane and Campbell collaborations; however, we recommend registering the protocol into PROSPERO as it is easier. The layout of a protocol template, according to PROSPERO, can be found in Additional file 5 : File S1.

Title and abstract screening

Decisions to select retrieved articles for further assessment are based on eligibility criteria, to minimize the chance of including non-relevant articles. According to the Cochrane guidance, two reviewers are a must to do this step, but as for beginners and junior researchers, this might be tiresome; thus, we propose based on our experience that at least three reviewers should work independently to reduce the chance of error, particularly in teams with a large number of authors to add more scrutiny and ensure proper conduct. Mostly, the quality with three reviewers would be better than two, as two only would have different opinions from each other, so they cannot decide, while the third opinion is crucial. And here are some examples of systematic reviews which we conducted following the same strategy (by a different group of researchers in our research group) and published successfully, and they feature relevant ideas to tropical medicine and disease [ 9 , 10 , 11 ].

In this step, duplications will be removed manually whenever the reviewers find them out. When there is a doubt about an article decision, the team should be inclusive rather than exclusive, until the main leader or PI makes a decision after discussion and consensus. All excluded records should be given exclusion reasons.

Full text downloading and screening

Many search engines provide links for free to access full-text articles. In case not found, we can search in some research websites as ResearchGate, which offer an option of direct full-text request from authors. Additionally, exploring archives of wanted journals, or contacting PI to purchase it if available. Similarly, 2–3 reviewers work independently to decide about included full texts according to eligibility criteria, with reporting exclusion reasons of articles. In case any disagreement has occurred, the final decision has to be made by discussion.

Manual search

One has to exhaust all possibilities to reduce bias by performing an explicit hand-searching for retrieval of reports that may have been dropped from first search [ 12 ]. We apply five methods to make manual searching: searching references from included studies/reviews, contacting authors and experts, and looking at related articles/cited articles in PubMed and Google Scholar.

We describe here three consecutive methods to increase and refine the yield of manual searching: firstly, searching reference lists of included articles; secondly, performing what is known as citation tracking in which the reviewers track all the articles that cite each one of the included articles, and this might involve electronic searching of databases; and thirdly, similar to the citation tracking, we follow all “related to” or “similar” articles. Each of the abovementioned methods can be performed by 2–3 independent reviewers, and all the possible relevant article must undergo further scrutiny against the inclusion criteria, after following the same records yielded from electronic databases, i.e., title/abstract and full-text screening.

We propose an independent reviewing by assigning each member of the teams a “tag” and a distinct method, to compile all the results at the end for comparison of differences and discussion and to maximize the retrieval and minimize the bias. Similarly, the number of included articles has to be stated before addition to the overall included records.

Data extraction and quality assessment

This step entitles data collection from included full-texts in a structured extraction excel sheet, which is previously pilot-tested for extraction using some random studies. We recommend extracting both adjusted and non-adjusted data because it gives the most allowed confounding factor to be used in the analysis by pooling them later [ 13 ]. The process of extraction should be executed by 2–3 independent reviewers. Mostly, the sheet is classified into the study and patient characteristics, outcomes, and quality assessment (QA) tool.

Data presented in graphs should be extracted by software tools such as Web plot digitizer [ 14 ]. Most of the equations that can be used in extraction prior to analysis and estimation of standard deviation (SD) from other variables is found inside Additional file 5 : File S2 with their references as Hozo et al. [ 15 ], Xiang et al. [ 16 ], and Rijkom et al. [ 17 ]. A variety of tools are available for the QA, depending on the design: ROB-2 Cochrane tool for randomized controlled trials [ 18 ] which is presented as Additional file 1 : Figure S1 and Additional file 2 : Figure S2—from a previous published article data—[ 19 ], NIH tool for observational and cross-sectional studies [ 20 ], ROBINS-I tool for non-randomize trials [ 21 ], QUADAS-2 tool for diagnostic studies, QUIPS tool for prognostic studies, CARE tool for case reports, and ToxRtool for in vivo and in vitro studies. We recommend that 2–3 reviewers independently assess the quality of the studies and add to the data extraction form before the inclusion into the analysis to reduce the risk of bias. In the NIH tool for observational studies—cohort and cross-sectional—as in this EBOLA case, to evaluate the risk of bias, reviewers should rate each of the 14 items into dichotomous variables: yes, no, or not applicable. An overall score is calculated by adding all the items scores as yes equals one, while no and NA equals zero. A score will be given for every paper to classify them as poor, fair, or good conducted studies, where a score from 0–5 was considered poor, 6–9 as fair, and 10–14 as good.

In the EBOLA case example above, authors can extract the following information: name of authors, country of patients, year of publication, study design (case report, cohort study, or clinical trial or RCT), sample size, the infected point of time after EBOLA infection, follow-up interval after vaccination time, efficacy, safety, adverse effects after vaccinations, and QA sheet (Additional file 6 : Data S1).

Data checking

Due to the expected human error and bias, we recommend a data checking step, in which every included article is compared with its counterpart in an extraction sheet by evidence photos, to detect mistakes in data. We advise assigning articles to 2–3 independent reviewers, ideally not the ones who performed the extraction of those articles. When resources are limited, each reviewer is assigned a different article than the one he extracted in the previous stage.

Statistical analysis

Investigators use different methods for combining and summarizing findings of included studies. Before analysis, there is an important step called cleaning of data in the extraction sheet, where the analyst organizes extraction sheet data in a form that can be read by analytical software. The analysis consists of 2 types namely qualitative and quantitative analysis. Qualitative analysis mostly describes data in SR studies, while quantitative analysis consists of two main types: MA and network meta-analysis (NMA). Subgroup, sensitivity, cumulative analyses, and meta-regression are appropriate for testing whether the results are consistent or not and investigating the effect of certain confounders on the outcome and finding the best predictors. Publication bias should be assessed to investigate the presence of missing studies which can affect the summary.

To illustrate basic meta-analysis, we provide an imaginary data for the research question about Ebola vaccine safety (in terms of adverse events, 14 days after injection) and immunogenicity (Ebola virus antibodies rise in geometric mean titer, 6 months after injection). Assuming that from searching and data extraction, we decided to do an analysis to evaluate Ebola vaccine “A” safety and immunogenicity. Other Ebola vaccines were not meta-analyzed because of the limited number of studies (instead, it will be included for narrative review). The imaginary data for vaccine safety meta-analysis can be accessed in Additional file 7 : Data S2. To do the meta-analysis, we can use free software, such as RevMan [ 22 ] or R package meta [ 23 ]. In this example, we will use the R package meta. The tutorial of meta package can be accessed through “General Package for Meta-Analysis” tutorial pdf [ 23 ]. The R codes and its guidance for meta-analysis done can be found in Additional file 5 : File S3.

For the analysis, we assume that the study is heterogenous in nature; therefore, we choose a random effect model. We did an analysis on the safety of Ebola vaccine A. From the data table, we can see some adverse events occurring after intramuscular injection of vaccine A to the subject of the study. Suppose that we include six studies that fulfill our inclusion criteria. We can do a meta-analysis for each of the adverse events extracted from the studies, for example, arthralgia, from the results of random effect meta-analysis using the R meta package.

From the results shown in Additional file 3 : Figure S3, we can see that the odds ratio (OR) of arthralgia is 1.06 (0.79; 1.42), p value = 0.71, which means that there is no association between the intramuscular injection of Ebola vaccine A and arthralgia, as the OR is almost one, and besides, the P value is insignificant as it is > 0.05.

In the meta-analysis, we can also visualize the results in a forest plot. It is shown in Fig. 3 an example of a forest plot from the simulated analysis.

Random effect model forest plot for comparison of vaccine A versus placebo

From the forest plot, we can see six studies (A to F) and their respective OR (95% CI). The green box represents the effect size (in this case, OR) of each study. The bigger the box means the study weighted more (i.e., bigger sample size). The blue diamond shape represents the pooled OR of the six studies. We can see the blue diamond cross the vertical line OR = 1, which indicates no significance for the association as the diamond almost equalized in both sides. We can confirm this also from the 95% confidence interval that includes one and the p value > 0.05.

For heterogeneity, we see that I 2 = 0%, which means no heterogeneity is detected; the study is relatively homogenous (it is rare in the real study). To evaluate publication bias related to the meta-analysis of adverse events of arthralgia, we can use the metabias function from the R meta package (Additional file 4 : Figure S4) and visualization using a funnel plot. The results of publication bias are demonstrated in Fig. 4 . We see that the p value associated with this test is 0.74, indicating symmetry of the funnel plot. We can confirm it by looking at the funnel plot.

Publication bias funnel plot for comparison of vaccine A versus placebo

Looking at the funnel plot, the number of studies at the left and right side of the funnel plot is the same; therefore, the plot is symmetry, indicating no publication bias detected.

Sensitivity analysis is a procedure used to discover how different values of an independent variable will influence the significance of a particular dependent variable by removing one study from MA. If all included study p values are < 0.05, hence, removing any study will not change the significant association. It is only performed when there is a significant association, so if the p value of MA done is 0.7—more than one—the sensitivity analysis is not needed for this case study example. If there are 2 studies with p value > 0.05, removing any of the two studies will result in a loss of the significance.

Double data checking

For more assurance on the quality of results, the analyzed data should be rechecked from full-text data by evidence photos, to allow an obvious check for the PI of the study.

Manuscript writing, revision, and submission to a journal

Writing based on four scientific sections: introduction, methods, results, and discussion, mostly with a conclusion. Performing a characteristic table for study and patient characteristics is a mandatory step which can be found as a template in Additional file 5 : Table S3.

After finishing the manuscript writing, characteristics table, and PRISMA flow diagram, the team should send it to the PI to revise it well and reply to his comments and, finally, choose a suitable journal for the manuscript which fits with considerable impact factor and fitting field. We need to pay attention by reading the author guidelines of journals before submitting the manuscript.

The role of evidence-based medicine in biomedical research is rapidly growing. SR/MAs are also increasing in the medical literature. This paper has sought to provide a comprehensive approach to enable reviewers to produce high-quality SR/MAs. We hope that readers could gain general knowledge about how to conduct a SR/MA and have the confidence to perform one, although this kind of study requires complex steps compared to narrative reviews.

Having the basic steps for conduction of MA, there are many advanced steps that are applied for certain specific purposes. One of these steps is meta-regression which is performed to investigate the association of any confounder and the results of the MA. Furthermore, there are other types rather than the standard MA like NMA and MA. In NMA, we investigate the difference between several comparisons when there were not enough data to enable standard meta-analysis. It uses both direct and indirect comparisons to conclude what is the best between the competitors. On the other hand, mega MA or MA of patients tend to summarize the results of independent studies by using its individual subject data. As a more detailed analysis can be done, it is useful in conducting repeated measure analysis and time-to-event analysis. Moreover, it can perform analysis of variance and multiple regression analysis; however, it requires homogenous dataset and it is time-consuming in conduct [ 24 ].

Conclusions

Systematic review/meta-analysis steps include development of research question and its validation, forming criteria, search strategy, searching databases, importing all results to a library and exporting to an excel sheet, protocol writing and registration, title and abstract screening, full-text screening, manual searching, extracting data and assessing its quality, data checking, conducting statistical analysis, double data checking, manuscript writing, revising, and submitting to a journal.

Availability of data and materials

Not applicable.

Abbreviations

Network meta-analysis

Principal investigator

Population, Intervention, Comparison, Outcome

Preferred Reporting Items for Systematic Review and Meta-analysis statement

Quality assessment

Sample, Phenomenon of Interest, Design, Evaluation, Research type

Systematic review and meta-analyses

Bello A, Wiebe N, Garg A, Tonelli M. Evidence-based decision-making 2: systematic reviews and meta-analysis. Methods Mol Biol (Clifton, NJ). 2015;1281:397–416.

Article Google Scholar

Khan KS, Kunz R, Kleijnen J, Antes G. Five steps to conducting a systematic review. J R Soc Med. 2003;96(3):118–21.

Rys P, Wladysiuk M, Skrzekowska-Baran I, Malecki MT. Review articles, systematic reviews and meta-analyses: which can be trusted? Polskie Archiwum Medycyny Wewnetrznej. 2009;119(3):148–56.

PubMed Google Scholar

Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. 2011.

Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.

Methley AM, Campbell S, Chew-Graham C, McNally R, Cheraghi-Sohi S. PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv Res. 2014;14:579.

Gross L, Lhomme E, Pasin C, Richert L, Thiebaut R. Ebola vaccine development: systematic review of pre-clinical and clinical studies, and meta-analysis of determinants of antibody response variability after vaccination. Int J Infect Dis. 2018;74:83–96.

Article CAS Google Scholar

Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, ... Henry DA. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.

Giang HTN, Banno K, Minh LHN, Trinh LT, Loc LT, Eltobgy A, et al. Dengue hemophagocytic syndrome: a systematic review and meta-analysis on epidemiology, clinical signs, outcomes, and risk factors. Rev Med Virol. 2018;28(6):e2005.

Morra ME, Altibi AMA, Iqtadar S, Minh LHN, Elawady SS, Hallab A, et al. Definitions for warning signs and signs of severe dengue according to the WHO 2009 classification: systematic review of literature. Rev Med Virol. 2018;28(4):e1979.

Morra ME, Van Thanh L, Kamel MG, Ghazy AA, Altibi AMA, Dat LM, et al. Clinical outcomes of current medical approaches for Middle East respiratory syndrome: a systematic review and meta-analysis. Rev Med Virol. 2018;28(3):e1977.

Vassar M, Atakpo P, Kash MJ. Manual search approaches used by systematic reviewers in dermatology. Journal of the Medical Library Association: JMLA. 2016;104(4):302.

Naunheim MR, Remenschneider AK, Scangas GA, Bunting GW, Deschler DG. The effect of initial tracheoesophageal voice prosthesis size on postoperative complications and voice outcomes. Ann Otol Rhinol Laryngol. 2016;125(6):478–84.

Rohatgi AJaiWa. Web Plot Digitizer. ht tp. 2014;2.

Hozo SP, Djulbegovic B, Hozo I. Estimating the mean and variance from the median, range, and the size of a sample. BMC Med Res Methodol. 2005;5(1):13.

Wan X, Wang W, Liu J, Tong T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med Res Methodol. 2014;14(1):135.

Van Rijkom HM, Truin GJ, Van’t Hof MA. A meta-analysis of clinical studies on the caries-inhibiting effect of fluoride gel treatment. Carries Res. 1998;32(2):83–92.

Higgins JP, Altman DG, Gotzsche PC, Juni P, Moher D, Oxman AD, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.

Tawfik GM, Tieu TM, Ghozy S, Makram OM, Samuel P, Abdelaal A, et al. Speech efficacy, safety and factors affecting lifetime of voice prostheses in patients with laryngeal cancer: a systematic review and network meta-analysis of randomized controlled trials. J Clin Oncol. 2018;36(15_suppl):e18031-e.

Wannemuehler TJ, Lobo BC, Johnson JD, Deig CR, Ting JY, Gregory RL. Vibratory stimulus reduces in vitro biofilm formation on tracheoesophageal voice prostheses. Laryngoscope. 2016;126(12):2752–7.

Sterne JAC, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355.

RevMan The Cochrane Collaboration %J Copenhagen TNCCTCC. Review Manager (RevMan). 5.0. 2008.

Schwarzer GJRn. meta: An R package for meta-analysis. 2007;7(3):40-45.

Google Scholar

Simms LLH. Meta-analysis versus mega-analysis: is there a difference? Oral budesonide for the maintenance of remission in Crohn’s disease: Faculty of Graduate Studies, University of Western Ontario; 1998.

Download references

Acknowledgements

This study was conducted (in part) at the Joint Usage/Research Center on Tropical Disease, Institute of Tropical Medicine, Nagasaki University, Japan.

Author information

Authors and affiliations.

Faculty of Medicine, Ain Shams University, Cairo, Egypt

Gehad Mohamed Tawfik

Online research Club http://www.onlineresearchclub.org/

Gehad Mohamed Tawfik, Kadek Agus Surya Dila, Muawia Yousif Fadlelmola Mohamed, Dao Ngoc Hien Tam, Nguyen Dang Kien & Ali Mahmoud Ahmed

Pratama Giri Emas Hospital, Singaraja-Amlapura street, Giri Emas village, Sawan subdistrict, Singaraja City, Buleleng, Bali, 81171, Indonesia

Kadek Agus Surya Dila

Faculty of Medicine, University of Khartoum, Khartoum, Sudan

Muawia Yousif Fadlelmola Mohamed

Nanogen Pharmaceutical Biotechnology Joint Stock Company, Ho Chi Minh City, Vietnam

Dao Ngoc Hien Tam

Department of Obstetrics and Gynecology, Thai Binh University of Medicine and Pharmacy, Thai Binh, Vietnam

Nguyen Dang Kien

Faculty of Medicine, Al-Azhar University, Cairo, Egypt

Ali Mahmoud Ahmed

Evidence Based Medicine Research Group & Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, 70000, Vietnam

Nguyen Tien Huy

Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, 70000, Vietnam

Department of Clinical Product Development, Institute of Tropical Medicine (NEKKEN), Leading Graduate School Program, and Graduate School of Biomedical Sciences, Nagasaki University, 1-12-4 Sakamoto, Nagasaki, 852-8523, Japan

You can also search for this author in PubMed Google Scholar

Contributions

NTH and GMT were responsible for the idea and its design. The figure was done by GMT. All authors contributed to the manuscript writing and approval of the final version.

Corresponding author

Correspondence to Nguyen Tien Huy .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:.

Figure S1. Risk of bias assessment graph of included randomized controlled trials. (TIF 20 kb)

Additional file 2:

Figure S2. Risk of bias assessment summary. (TIF 69 kb)

Additional file 3:

Figure S3. Arthralgia results of random effect meta-analysis using R meta package. (TIF 20 kb)

Additional file 4:

Figure S4. Arthralgia linear regression test of funnel plot asymmetry using R meta package. (TIF 13 kb)

Additional file 5:

Table S1. PRISMA 2009 Checklist. Table S2. Manipulation guides for online database searches. Table S3. Detailed search strategy for twelve database searches. Table S4. Baseline characteristics of the patients in the included studies. File S1. PROSPERO protocol template file. File S2. Extraction equations that can be used prior to analysis to get missed variables. File S3. R codes and its guidance for meta-analysis done for comparison between EBOLA vaccine A and placebo. (DOCX 49 kb)

Additional file 6:

Data S1. Extraction and quality assessment data sheets for EBOLA case example. (XLSX 1368 kb)

Additional file 7:

Data S2. Imaginary data for EBOLA case example. (XLSX 10 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article.

Tawfik, G.M., Dila, K.A.S., Mohamed, M.Y.F. et al. A step by step guide for conducting a systematic review and meta-analysis with simulation data. Trop Med Health 47 , 46 (2019). https://doi.org/10.1186/s41182-019-0165-6

Download citation

Received : 30 January 2019

Accepted : 24 May 2019

Published : 01 August 2019

DOI : https://doi.org/10.1186/s41182-019-0165-6

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Tropical Medicine and Health

ISSN: 1349-4147

Submission enquiries: Access here and click Contact Us
General enquiries: [email protected]

Systematic Reviews

Introduction
Guidelines and procedures
Management tools
Define the question
Check the topic
Determine inclusion/exclusion criteria
Develop a protocol
Identify keywords
Databases and search strategies
Grey literature
Manage and organise
Screen & Select
Locate full text
Extract data

Example reviews

Examples of systematic reviews
Accessing help This link opens in a new window
Systematic Style Reviews Guide This link opens in a new window

Please choose the tab below for your discipline to see relevant examples.

For more information about how to conduct and write reviews, please see the Guidelines section of this guide.

Health & Medicine
Social sciences
Vibration and bubbles: a systematic review of the effects of helicopter retrieval on injured divers. (2018).
Nicotine effects on exercise performance and physiological responses in nicotine‐naïve individuals: a systematic review. (2018).
Association of total white cell count with mortality and major adverse events in patients with peripheral arterial disease: A systematic review. (2014).
Do MOOCs contribute to student equity and social inclusion? A systematic review 2014–18. (2020).
Interventions in Foster Family Care: A Systematic Review. (2020).
Determinants of happiness among healthcare professionals between 2009 and 2019: a systematic review. (2020).
Systematic review of the outcomes and trade-offs of ten types of decarbonization policy instruments. (2021).
A systematic review on Asian's farmers' adaptation practices towards climate change. (2018).
Are concentrations of pollutants in sharks, rays and skates (Elasmobranchii) a cause for concern? A systematic review. (2020).
<< Previous: Write
Next: Publish >>
Last Updated: Jun 24, 2024 1:42 PM
URL: https://libguides.jcu.edu.au/systematic-review

How to Write a Review Article

Types of Review Articles
Before Writing a Review Article
Determining Where to Publish
Searching the Literature
Citation Management
Reading a Review Article

Descriptions of Types of Reviews

Reproduced from: Grant MJ, Booth A. A typology of reviews: an analysis of 14 review types and associated methodologies . Health Info Libr J . 2009 Jun;26(2):91-108. doi: 10.1111/j.1471-1842.2009.00848.x. Review. PubMed PMID: 19490148.

Review Type	Description
	Aims to demonstrate writer has extensively researched literature and critically evaluated its quality. Goes beyond mere description to include degree of analysis and conceptual innovation. Typically results in hypothesis or model.
	Generic term: published materials that provide examination of recent or current literature. Can cover wide range of subjects at various levels of completeness and comprehensiveness. May include research findings.
	Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature.
	Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results.
	Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches for example combining quantitative with qualitative research or outcome with process studies.
	Generic term: summary of the [medical] literature that attempts to survey the literature and describe its characteristics.
	Method for integrating or comparing the findings from qualitative studies. It looks for 'themes' or 'constructs' that lie in or across individual qualitative studies.
	Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research.
	Preliminary assessment of potential size and scope of available research literature. Aims to identify nature and extent of research evidence (usually including ongoing research).
	Tend to address more current matters in contrast to other combined retrospective and current approaches. May offer new perspectives on issue or point out area for further research.
	Seeks to systematically search for, appraise and synthesis research evidence, often adhering to guidelines on the conduct of a review.
	Combines strengths of critical review with a comprehensive search process. Typically addresses broad questions to produce 'best evidence synthesis'.
	Attempt to include elements of systematic review process while stopping short of systematic review. Typically conducted as postgraduate student assignment.
	Specifically refers to review compiling evidence from multiple reviews into one accessible and usable document. Focuses on broad condition or problem for which there are competing interventions and highlights reviews that address these interventions and their results.

A systematic review of experimentally tested implementation strategies across health and human service settings: evidence from 2010-2022

Laura Ellen Ashcraft ORCID: orcid.org/0000-0001-9957-0617 1 , 2 ,
David E. Goodrich 3 , 4 , 5 ,
Joachim Hero 6 ,
Angela Phares 3 ,
Rachel L. Bachrach 7 , 8 ,
Deirdre A. Quinn 3 , 4 ,
Nabeel Qureshi 6 ,
Natalie C. Ernecoff 6 ,
Lisa G. Lederer 5 ,
Leslie Page Scheunemann 9 , 10 ,
Shari S. Rogal 3 , 11 na1 &
Matthew J. Chinman 3 , 4 , 6 na1

Implementation Science volume 19 , Article number: 43 ( 2024 ) Cite this article

1958 Accesses

18 Altmetric

Metrics details

Studies of implementation strategies range in rigor, design, and evaluated outcomes, presenting interpretation challenges for practitioners and researchers. This systematic review aimed to describe the body of research evidence testing implementation strategies across diverse settings and domains, using the Expert Recommendations for Implementing Change (ERIC) taxonomy to classify strategies and the Reach Effectiveness Adoption Implementation and Maintenance (RE-AIM) framework to classify outcomes.

We conducted a systematic review of studies examining implementation strategies from 2010-2022 and registered with PROSPERO (CRD42021235592). We searched databases using terms “implementation strategy”, “intervention”, “bundle”, “support”, and their variants. We also solicited study recommendations from implementation science experts and mined existing systematic reviews. We included studies that quantitatively assessed the impact of at least one implementation strategy to improve health or health care using an outcome that could be mapped to the five evaluation dimensions of RE-AIM. Only studies meeting prespecified methodologic standards were included. We described the characteristics of studies and frequency of implementation strategy use across study arms. We also examined common strategy pairings and cooccurrence with significant outcomes.

Our search resulted in 16,605 studies; 129 met inclusion criteria. Studies tested an average of 6.73 strategies (0-20 range). The most assessed outcomes were Effectiveness ( n =82; 64%) and Implementation ( n =73; 56%). The implementation strategies most frequently occurring in the experimental arm were Distribute Educational Materials ( n =99), Conduct Educational Meetings ( n =96), Audit and Provide Feedback ( n =76), and External Facilitation ( n =59). These strategies were often used in combination. Nineteen implementation strategies were frequently tested and associated with significantly improved outcomes. However, many strategies were not tested sufficiently to draw conclusions.

This review of 129 methodologically rigorous studies built upon prior implementation science data syntheses to identify implementation strategies that had been experimentally tested and summarized their impact on outcomes across diverse outcomes and clinical settings. We present recommendations for improving future similar efforts.

Peer Review reports

Contributions to the literature

While many implementation strategies exist, it has been challenging to compare their effectiveness across a wide range of trial designs and practice settings

This systematic review provides a transdisciplinary evaluation of implementation strategies across population, practice setting, and evidence-based interventions using a standardized taxonomy of strategies and outcomes.

Educational strategies were employed ubiquitously; nineteen other commonly used implementation strategies, including External Facilitation and Audit and Provide Feedback, were associated with positive outcomes in these experimental trials.

This review offers guidance for scholars and practitioners alike in selecting implementation strategies and suggests a roadmap for future evidence generation.

Implementation strategies are “methods or techniques used to enhance the adoption, implementation, and sustainment of evidence-based practices or programs” (EBPs) [ 1 ]. In 2015, the Expert Recommendations for Implementing Change (ERIC) study organized a panel of implementation scientists to compile a standardized set of implementation strategy terms and definitions [ 2 , 3 , 4 ]. These 73 strategies were then organized into nine “clusters” [ 5 ]. The ERIC taxonomy has been widely adopted and further refined [ 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 ]. However, much of the evidence for individual or groups of ERIC strategies remains narrowly focused. Prior systematic reviews and meta-analyses have assessed strategy effectiveness, but have generally focused on a specific strategy, (e.g., Audit and Provide Feedback) [ 14 , 15 , 16 ], subpopulation, disease (e.g., individuals living with dementia) [ 16 ], outcome [ 15 ], service setting (e.g., primary care clinics) [ 17 , 18 , 19 ] or geography [ 20 ]. Given that these strategies are intended to have broad applicability, there remains a need to understand how well implementation strategies work across EBPs and settings and the extent to which implementation knowledge is generalizable.

There are challenges in assessing the evidence of implementation strategies across many EBPs, populations, and settings. Heterogeneity in population characteristics, study designs, methods, and outcomes have made it difficult to quantitatively compare which strategies work and under which conditions [ 21 ]. Moreover, there remains significant variability in how researchers operationalize, apply, and report strategies (individually or in combination) and outcomes [ 21 , 22 ]. Still, synthesizing data related to using individual strategies would help researchers replicate findings and better understand possible mediating factors including the cost, timing, and delivery by specific types of health providers or key partners [ 23 , 24 , 25 ]. Such an evidence base would also aid practitioners with implementation planning such as when and how to deploy a strategy for optimal impact.

Building upon previous efforts, we therefore conducted a systematic review to evaluate the level of evidence supporting the ERIC implementation strategies across a broad array of health and human service settings and outcomes, as organized by the evaluation framework, RE-AIM (Reach, Effectiveness, Adoption, Implementation, Maintenance) [ 26 , 27 , 28 ]. A secondary aim of this work was to identify patterns in scientific reporting of strategy use that could not only inform reporting standards for strategies but also the methods employed in future. The current study was guided by the following research questions Footnote 1 :

What implementation strategies have been most commonly and rigorously tested in health and human service settings?

Which implementation strategies were commonly paired?

What is the evidence supporting commonly tested implementation strategies?

We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA-P) model [ 29 , 30 , 31 ] to develop and report on the methods for this systematic review (Additional File 1). This study was considered to be non-human subjects research by the RAND institutional review board.

Registration

The protocol was registered with PROSPERO (PROSPERO 2021 CRD42021235592).

Eligibility criteria

This review sought to synthesize evidence for implementation strategies from research studies conducted across a wide range of health-related settings and populations. Inclusion criteria required studies to: 1) available in English; 2) published between January 1, 2010 and September 20, 2022; 3) based on experimental research (excluded protocols, commentaries, conference abstracts, or proposed frameworks); 4) set in a health or human service context (described below); 5) tested at least one quantitative outcome that could be mapped to the RE-AIM evaluation framework [ 26 , 27 , 28 ]; and 6) evaluated the impact of an implementation strategy that could be classified using the ERIC taxonomy [ 2 , 32 ]. We defined health and human service setting broadly, including inpatient and outpatient healthcare settings, specialty clinics, mental health treatment centers, long-term care facilities, group homes, correctional facilities, child welfare or youth services, aging services, and schools, and required that the focus be on a health outcome. We excluded hybrid type I trials that primarily focused on establishing EBP effectiveness, qualitative studies, studies that described implementation barriers and facilitators without assessing implementation strategy impact on an outcome, and studies not meeting standardized rigor criteria defined below.

Information sources

Our three-pronged search strategy included searching academic databases (i.e., CINAHL, PubMed, and Web of Science for replicability and transparency), seeking recommendations from expert implementation scientists, and assessing existing, relevant systematic reviews and meta-analyses.

Search strategy

Search terms included “implementation strateg*” OR “implementation intervention*” OR “implementation bundl*” OR “implementation support*.” The search, conducted on September 20, 2022, was limited to English language and publication between 2010 and 2022, similar to other recent implementation science reviews [ 22 ]. This timeframe was selected to coincide with the advent of Implementation Science and when the term “implementation strategy” became conventionally used [ 2 , 4 , 33 ]. A full search strategy can be found in Additional File 2.

Title and abstract screening process

Each study’s title and abstract were read by two reviewers, who dichotomously scored studies on each of the six eligibility criteria described above as yes=1 or no=0, resulting in a score ranging from 1 to 6. Abstracts receiving a six from both reviewers were included in the full text review. Those with only one score of six were adjudicated by a senior member of the team (MJC, SSR, DEG). The study team held weekly meetings to troubleshoot and resolve any ongoing issues noted through the abstract screening process.

Full text screening

During the full text screening process, we reviewed, in pairs, each article that had progressed through abstract screening. Conflicts between reviewers were adjudicated by a senior member of the team for a final inclusion decision (MJC, SSR, DEG).

Review of study rigor

After reviewing published rigor screening tools [ 34 , 35 , 36 ], we developed an assessment of study rigor that was appropriate for the broad range of reviewed implementation studies. Reviewers evaluated studies on the following: 1) presence of a concurrent comparison or control group (=2 for traditional randomized controlled trial or stepped wedge cluster randomized trial and =1 for pseudo-randomized and other studies with concurrent control); 2) EBP standardization by protocol or manual (=1 if present); 3) EBP fidelity tracking (=1 if present); 4) implementation strategy standardization by operational description, standard training, or manual (=1 if present); 5) length of follow-up from full implementation of intervention (=2 for twelve months or longer, =1 for six to eleven months, or =0 for less than six months); and 6) number of sites (=1 for more than one site). Rigor scores ranged from 0 to 8, with 8 indicating the most rigorous. Articles were included if they 1) included a concurrent control group, 2) had an experimental design, and 3) received a score of 7 or 8 from two independent reviewers.

Outside expert consultation

We contacted 37 global implementation science experts who were recognized by our study team as leaders in the field or who were commonly represented among first or senior authors in the included abstracts. We asked each expert for recommendations of publications meeting study inclusion criteria (i.e., quantitatively evaluating the effectiveness of an implementation strategy). Recommendations were recorded and compared to the full abstract list.

Systematic reviews

Eighty-four systematic reviews were identified through the initial search strategy (See Additional File 3). Systematic reviews that examined the effectiveness of implementation strategies were reviewed in pairs for studies that were not found through our initial literature search.

Data abstraction and coding

Data from the full text review were abstracted in pairs, with conflicts resolved by senior team members (DEG, MJC) using a standard Qualtrics abstraction form. The form captured the setting, number of sites and participants studied, evidence-based practice/program of focus, outcomes assessed (based on RE-AIM), strategies used in each study arm, whether the study took place in the U.S. or outside of the U.S., and the findings (i.e., was there significant improvement in the outcome(s)?). We coded implementation strategies used in the Control and Experimental Arms. We defined the Control Arm as receiving the lowest number of strategies (which could mean zero strategies or care as usual) and the Experimental Arm as the most intensive arm (i.e., receiving the highest number of strategies). When studies included multiple Experimental Arms, the Experimental Arm with the least intensive implementation strategy(ies) was classified as “Control” and the Experimental Arm with the most intensive implementation strategy(ies) was classified as the “Experimental” Arm.

Implementation strategies were classified using standard definitions (MJC, SSR, DEG), based on minor modifications to the ERIC taxonomy [ 2 , 3 , 4 ]. Modifications resulted in 70 named strategies and were made to decrease redundancy and improve clarity. These modifications were based on input from experts, cognitive interview data, and team consensus [ 37 ] (See Additional File 4). Outcomes were then coded into RE-AIM outcome domains following best practices as recommended by framework experts [ 26 , 27 , 28 ]. We coded the RE-AIM domain of Effectiveness as either an assessment of the effectiveness of the EBP or the implementation strategy. We did not assess implementation strategy fidelity or effects on health disparities as these are recently adopted reporting standards [ 27 , 28 ] and not yet widely implemented in current publications. Further, we did not include implementation costs as an outcome because reporting guidelines have not been standardized [ 38 , 39 ].

Assessment and minimization of bias

Assessment and minimization of bias is an important component of high-quality systematic reviews. The Cochrane Collaboration guidance for conducting high-quality systematic reviews recommends including a specific assessment of bias for individual studies by assessing the domains of randomization, deviations of intended intervention, missing data, measurement of the outcome, and selection of the reported results (e.g., following a pre-specified analysis plan) [ 40 , 41 ]. One way we addressed bias was by consolidating multiple publications from the same study into a single finding (i.e., N =1), so-as to avoid inflating estimates due to multiple publications on different aspects of a single trial. We also included high-quality studies only, as described above. However, it was not feasible to consistently apply an assessment of bias tool due to implementation science’s broad scope and the heterogeneity of study design, context, outcomes, and variable measurement, etc. For example, most implementation studies reviewed had many outcomes across the RE-AIM framework, with no one outcome designated as primary, precluding assignment of a single score across studies.

We used descriptive statistics to present the distribution of health or healthcare area, settings, outcomes, and the median number of included patients and sites per study, overall and by country (classified as U.S. vs. non-U.S.). Implementation strategies were described individually, using descriptive statistics to summarize the frequency of strategy use “overall” (in any study arm), and the mean number of strategies reported in the Control and Experimental Arms. We additionally described the strategies that were only in the experimental (and not control) arm, defining these as strategies that were “tested” and may have accounted for differences in outcomes between arms.

We described frequencies of pair-wise combinations of implementation strategies in the Experimental Arm. To assess the strength of the evidence supporting implementation strategies that were used in the Experimental Arm, study outcomes were categorized by RE-AIM and coded based on whether the association between use of the strategies resulted in a significantly positive effect (yes=1; no=0). We then created an indicator variable if at least one RE-AIM outcome in the study was significantly positive (yes=1; no=0). We plotted strategies on a graph with quadrants based on the combination of median number of studies in which a strategy appears and the median percent of studies in which a strategy was associated with at least one positive RE-AIM outcome. The upper right quadrant—higher number of studies overall and higher percent of studies with a significant RE-AIM outcome—represents a superior level of evidence. For implementation strategies in the upper right quadrant, we describe each RE-AIM outcome and the proportion of studies which have a significant outcome.

Search results

We identified 14,646 articles through the initial literature search, 17 articles through expert recommendation (three of which were not included in the initial search), and 1,942 articles through reviewing prior systematic reviews (Fig. 1 ). After removing duplicates, 9,399 articles were included in the initial abstract screening. Of those, 48% ( n =4,075) abstracts were reviewed in pairs for inclusion. Articles with a score of five or six were reviewed a second time ( n =2,859). One quarter of abstracts that scored lower than five were reviewed for a second time at random. We screened the full text of 1,426 articles in pairs. Common reasons for exclusion were 1) study rigor, including no clear delineation between the EBP and implementation strategy, 2) not testing an implementation strategy, and 3) article type that did not meet inclusion criteria (e.g., commentary, protocol, etc.). Six hundred seventeen articles were reviewed for study rigor with 385 excluded for reasons related to study design and rigor, and 86 removed for other reasons (e.g., not a research article). Among the three additional expert-recommended articles, one met inclusion criteria and was added to the analysis. The final number of studies abstracted was 129 representing 143 publications.

Expanded PRISMA Flow Diagram

The expanded PRISMA flow diagram provides a description of each step in the review and abstraction process for the systematic review

Descriptive results

Of 129 included studies (Table 1 ; see also Additional File 5 for Summary of Included Studies), 103 (79%) were conducted in a healthcare setting. EBP health care setting varied and included primary care ( n =46; 36%), specialty care ( n =27; 21%), mental health ( n =11; 9%), and public health ( n =30; 23%), with 64 studies (50%) occurring in an outpatient health care setting. Studies included a median of 29 sites and 1,419 target population (e.g., patients or students). The number of strategies varied widely across studies, with Control Arms averaging approximately two strategies (Range = 0-20, including studies with no strategy in the comparison group) and Experimental Arms averaging eight strategies (Range = 1-21). Non-US studies ( n =73) included more sites and target population on average, with an overall median of 32 sites and 1,531 patients assessed in each study.

Organized by RE-AIM, the most evaluated outcomes were Effectiveness ( n = 82, 64%) and Implementation ( n = 73, 56%); followed by Maintenance ( n =40; 31%), Adoption ( n =33; 26%), and Reach ( n =31; 24%). Most studies ( n = 98, 76%) reported at least one significantly positive outcome. Adoption and Implementation outcomes showed positive change in three-quarters of studies ( n =78), while Reach ( n =18; 58%), Effectiveness ( n =44; 54%), and Maintenance ( n =23; 58%) outcomes evidenced positive change in approximately half of studies.

The following describes the results for each research question.

Table 2 shows the frequency of studies within which an implementation strategy was used in the Control Arm, Experimental Arm(s), and tested strategies (those used exclusively in the Experimental Arm) grouped by strategy type, as specified by previous ERIC reports [ 2 , 6 ].

Control arm

In about half the studies (53%; n =69), the Control Arms were “active controls” that included at least one strategy, with an average of 1.64 (and up to 20) strategies reported in control arms. The two most common strategies used in Control Arms were: Distribute Educational Materials ( n =52) and Conduct Educational Meetings ( n =30).

Experimental arm

Experimental conditions included an average of 8.33 implementation strategies per study (Range = 1-21). Figure 2 shows a heat map of the strategies that were used in the Experimental Arms in each study. The most common strategies in the Experimental Arm were Distribute Educational Materials ( n =99), Conduct Educational Meetings ( n =96), Audit and Provide Feedback ( n =76), and External Facilitation ( n =59).

Implementation strategies used in the Experimental Arm of included studies. Explore more here: https://public.tableau.com/views/Figure2_16947070561090/Figure2?:language=en-US&:display_count=n&:origin=viz_share_link

Tested strategies

The average number of implementation strategies that were included in the Experimental Arm only (and not in the Control Arm) was 6.73 (Range = 0-20). Footnote 2 Overall, the top 10% of tested strategies included Conduct Educational Meetings ( n =68), Audit and Provide Feedback ( n =63), External Facilitation ( n =54), Distribute Educational Materials ( n =49), Tailor Strategies ( n =41), Assess for Readiness and Identify Barriers and Facilitators ( n =38) and Organize Clinician Implementation Team Meetings ( n =37). Few studies tested a single strategy ( n =9). These strategies included, Audit and Provide Feedback, Conduct Educational Meetings, Conduct Ongoing Training, Create a Learning Collaborative, External Facilitation ( n =2), Facilitate Relay of Clinical Data To Providers, Prepare Patients/Consumers to be Active Participants, and Use Other Payment Schemes. Three implementation strategies were included in the Control or Experimental Arms but were not Tested including, Use Mass Media, Stage Implementation Scale Up, and Fund and Contract for the Clinical Innovation.

Table 3 shows the five most used strategies in Experimental Arms with their top ten most frequent pairings, excluding Distribute Educational Materials and Conduct Educational Meetings, as these strategies were included in almost all Experimental and half of Control Arms. The five most used strategies in the Experimental Arm included Audit and Provide Feedback ( n =76), External Facilitation ( n =59), Tailor Strategies ( n =43), Assess for Readiness and Identify Barriers and Facilitators ( n =43), and Organize Implementation Teams ( n =42).

Strategies frequently paired with these five strategies included two educational strategies: Distribute Educational Materials and Conduct Educational Meetings. Other commonly paired strategies included Develop a Formal Implementation Blueprint, Promote Adaptability, Conduct Ongoing Training, Purposefully Reexamine the Implementation, and Develop and Implement Tools for Quality Monitoring.

We classified the strength of evidence for each strategy by evaluating both the number of studies in which each strategy appeared in the Experimental Arm and the percentage of times there was at least one significantly positive RE-AIM outcome. Using these factors, Fig. 3 shows the number of studies in which individual strategies were evaluated (on the y axis) compared to the percentage of times that studies including those strategies had at least one positive outcome (on the x axis). Due to the non-normal distribution of both factors, we used the median (rather than the mean) to create four quadrants. Strategies in the lower left quadrant were tested in fewer than the median number of studies (8.5) and were less frequently associated with a significant RE-AIM outcome (75%). The upper right quadrant included strategies that occurred in more than the median number of studies (8.5) and had more than the median percent of studies with a significant RE-AIM outcome (75%); thus those 19 strategies were viewed as having stronger evidence. Of those 19 implementation strategies, Conduct Educational Meetings, Distribute Educational Materials, External Facilitation, and Audit and Provide Feedback continued to occur frequently, appearing in 59-99 studies.

Experimental Arm Implementation Strategies with significant RE-AIM outcome. Explore more here: https://public.tableau.com/views/Figure3_16947017936500/Figure3?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link

Figure 4 graphically illustrates the proportion of significant outcomes for each RE-AIM outcome for the 19 commonly used and evidence-based implementation strategies in the upper right quadrant. These findings again show the widespread use of Conduct Educational Meetings and Distribute Educational Materials. Implementation and Effectiveness outcomes were assessed most frequently, with Implementation being the mostly commonly reported significantly positive outcome.

RE-AIM outcomes for the 19 Top-Right Quadrant Implementation Strategies . The y-axis is the number of studies and the x-axis is a stacked bar chart for each RE-AIM outcome with R=Reach, E=Effectiveness, A=Adoption, I=Implementation, M=Maintenance. Blue denotes at least one significant RE-AIM outcome; Light blue denotes studies which used the given implementation strategy and did not have a significant RE-AIM . Explore more here: https://public.tableau.com/views/Figure4_16947017112150/Figure4?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link

This systematic review identified 129 experimental studies examining the effectiveness of implementation strategies across a broad range of health and human service studies. Overall, we found that evidence is lacking for most ERIC implementation strategies, that most studies employed combinations of strategies, and that implementation outcomes, categorized by RE-AIM dimensions, have not been universally defined or applied. Accordingly, other researchers have described the need for universal outcomes definitions and descriptions across implementation research studies [ 28 , 42 ]. Our findings have important implications not only for the current state of the field but also for creating guidance to help investigators determine which strategies and in what context to examine.

The four most evaluated strategies were Distribute Educational Materials, Conduct Educational Meetings, External Facilitation, and Audit and Provide Feedback. Conducting Educational Meetings and Distributing Educational Materials were surprisingly the most common. This may reflect the fact that education strategies are generally considered to be “necessary but not sufficient” for successful implementation [ 43 , 44 ]. Because education is often embedded in interventions, it is critical to define the boundary between the innovation and the implementation strategies used to support the innovation. Further specification as to when these strategies are EBP core components or implementation strategies (e.g., booster trainings or remediation) is needed [ 45 , 46 ].

We identified 19 implementation strategies that were tested in at least 8 studies (more than the median) and were associated with positive results at least 75% of the time. These strategies can be further categorized as being used in early or pre-implementation versus later in implementation. Preparatory activities or pre-implementation, strategies that had strong evidence included educational activities (Meetings, Materials, Outreach visits, Train for Leadership, Use Train the Trainer Strategies) and site diagnostic activities (Assess for Readiness, Identify Barriers and Facilitators, Conduct Local Needs Assessment, Identify and Prepare Champions, and Assess and Redesign Workflows). Strategies that target the implementation phase include those that provide coaching and support (External and Internal Facilitation), involve additional key partners (Intervene with Patients to Enhance Uptake and Adherence), and engage in quality improvement activities (Audit and Provide Feedback, Facilitate the Relay of Clinical Data to Providers, Purposefully Reexamine the Implementation, Conduct Cyclical Small Tests of Change, Develop and Implement Tools for Quality Monitoring).

There were many ERIC strategies that were not represented in the reviewed studies, specifically the financial and policy strategies. Ten strategies were not used in any studies, including: Alter Patient/Consumer Fees, Change Liability Laws, Change Service Sites, Develop Disincentives, Develop Resource Sharing Agreements, Identify Early Adopters, Make Billing Easier, Start a Dissemination Organization, Use Capitated Payments, and Use Data Experts. One of the limitations of this investigation was that not all individual strategies or combinations were investigated. Reasons for the absence of these strategies in our review may include challenges with testing certain strategies experimentally (e.g., changing liability laws), limitations in our search terms, and the relative paucity of implementation strategy trials compared to clinical trials. Many “untested” strategies require large-scale structural changes with leadership support (see [ 47 ] for policy experiment example). Recent preliminary work has assessed the feasibility of applying policy strategies and described the challenges with doing so [ 48 , 49 , 50 ]. While not impossible in large systems like VA (for example: the randomized evaluation of the VA Stratification Tool for Opioid Risk Management) the large size, structure, and organizational imperative makes these initiatives challenging to experimentally evaluate. Likewise, the absence of these ten strategies may have been the result of our inclusion criteria, which required an experimental design. Thus, creative study designs may be needed to test high-level policy or financial strategies experimentally.

Some strategies that were likely under-represented in our search strategy included electronic medical record reminders and clinical decision support tools and systems. These are often considered “interventions” when used by clinical trialists and may not be indexed as studies involving ‘implementation strategies’ (these tools have been reviewed elsewhere [ 51 , 52 , 53 ]). Thus, strategies that are also considered interventions in the literature (e.g., education interventions) were not sought or captured. Our findings do not imply that these strategies are ineffective, rather that more study is needed. Consistent with prior investigations [ 54 ], few studies meeting inclusion criteria tested financial strategies. Accordingly, there are increasing calls to track and monitor the effects of financial strategies within implementation science to understand their effectiveness in practice [ 55 , 56 ]. However, experts have noted that the study of financial strategies can be a challenge given that they are typically implemented at the system-level and necessitate research designs for studying policy-effects (e.g., quasi-experimental methods, systems-science modeling methods) [ 57 ]. Yet, there have been some recent efforts to use financial strategies to support EBPs that appear promising [ 58 ] and could be a model for the field moving forward.

The relationship between the number of strategies used and improved outcomes has been described inconsistently in the literature. While some studies have found improved outcomes with a bundle of strategies that were uniquely combined or a standardized package of strategies (e.g., Replicating Effective Programs [ 59 , 60 ] and Getting To Outcomes [ 61 , 62 ]), others have found that “more is not always better” [ 63 , 64 , 65 ]. For example, Rogal and colleagues documented that VA hospitals implementing a new evidence-based hepatitis C treatment chose >20 strategies, when multiple years of data linking strategies to outcomes showed that 1-3 specific strategies would have yielded the same outcome [ 39 ]. Considering that most studies employed multiple or multifaceted strategies, it seems that there is a benefit of using a targeted bundle of strategies that are purposefully aligns with site/clinic/population norms, rather than simply adding more strategies [ 66 ].

It is difficult to assess the effectiveness of any one implementation strategy in bundles where multiple strategies are used simultaneously. Even a ‘single’ strategy like External Facilitation is, in actuality, a bundle of narrowly constructed strategies (e.g., Conduct Educational Meetings, Identify and Prepare Champions, and Develop a Formal Implementation Blueprint). Thus, studying External Facilitation does not allow for a test of the individual strategies that comprise it, potentially masking the effectiveness of any individual strategy. While we cannot easily disaggregate the effects of multifaceted strategies, doing so may not yield meaningful results. Because strategies often synergize, disaggregated results could either underestimate the true impact of individual strategies or conversely, actually undermine their effectiveness (i.e., when their effectiveness comes from their combination with other strategies). The complexity of health and human service settings, imperative to improve public health outcomes, and engagement with community partners often requires the use of multiple strategies simultaneously. Therefore, the need to improve real-world implementation may outweigh the theoretical need to identify individual strategy effectiveness. In situations where it would be useful to isolate the impact of single strategies, we suggest that the same methods for documenting and analyzing the critical components (or core functions) of complex interventions [ 67 , 68 , 69 , 70 ] may help to identify core components of multifaceted implementation strategies [ 71 , 72 , 73 , 74 ].

In addition, to truly assess the impacts of strategies on outcomes, it may be necessary to track fidelity to implementation strategies (not just the EBPs they support). While this can be challenging, without some degree of tracking and fidelity checks, one cannot determine whether a strategy’s apparent failure to work was because it 1) was ineffective or 2) was not applied well. To facilitate this tracking there are pragmatic tools to support researchers. For example, the Longitudinal Implementation Strategy Tracking System (LISTS) offers a pragmatic and feasible means to assess fidelity to and adaptations of strategies [ 75 ].

Implications for implementation science: four recommendations

Based on our findings, we offer four recommended “best practices” for implementation studies.

Prespecify strategies using standard nomenclature. This study reaffirmed the need to apply not only a standard naming convention (e.g., ERIC) but also a standard reporting of for implementation strategies. While reporting systems like those by Proctor [ 1 ] or Pinnock [ 75 ] would optimize learning across studies, few manuscripts specify strategies as recommended [ 76 , 77 ]. Pre-specification allows planners and evaluators to assess the feasibility and acceptability of strategies with partners and community members [ 24 , 78 , 79 ] and allows evaluators and implementers to monitor and measure the fidelity, dose, and adaptations to strategies delivered over the course of implementation [ 27 ]. In turn, these data can be used to assess the costs, analyze their effectiveness [ 38 , 80 , 81 ], and ensure more accurate reporting [ 82 , 83 , 84 , 85 ]. This specification should include, among other data, the intensity, stage of implementation, and justification for the selection. Information regarding why strategies were selected for specific settings would further the field and be of great use to practitioners. [ 63 , 65 , 69 , 79 , 86 ].

Ensure that standards for measuring and reporting implementation outcomes are consistently applied and account for the complexity of implementation studies. Part of improving standardized reporting must include clearly defining outcomes and linking each outcome to particular implementation strategies. It was challenging in the present review to disentangle the impact of the intervention(s) (i.e., the EBP) versus the impact of the implementation strategy(ies) for each RE-AIM dimension. For example, often fidelity to the EBP was reported but not for the implementation strategies. Similarly, Reach and Adoption of the intervention would be reported for the Experimental Arm but not for the Control Arm, prohibiting statistical comparisons of strategies on the relative impact of the EBP between study arms. Moreover, there were many studies evaluating numerous outcomes, risking data dredging. Further, the significant heterogeneity in the ways in which implementation outcomes are operationalized and reported is a substantial barrier to conducting large-scale meta-analytic approaches to synthesizing evidence for implementation strategies [ 67 ]. The field could look to others in the social and health sciences for examples in how to test, validate, and promote a common set of outcome measures to aid in bringing consistency across studies and real-world practice (e.g., the NIH-funded Patient-Reported Outcomes Measurement Information System [PROMIS], https://www.healthmeasures.net/explore-measurement-systems/promis ).

Develop infrastructure to learn cross-study lessons in implementation science. Data repositories, like those developed by NCI for rare diseases, U.S. HIV Implementation Science Coordination Initiative [ 87 ], and the Behavior Change Technique Ontology [ 88 ], could allow implementation scientists to report their findings in a more standardized manner, which would promote ease of communication and contextualization of findings across studies. For example, the HIV Implementation Science Coordination Initiative requested all implementation projects use common frameworks, developed user friendly databases to enable practitioners to match strategies to determinants, and developed a dashboard of studies that assessed implementation determinants [ 89 , 90 , 91 , 92 , 93 , 94 ].

Develop and apply methods to rigorously study common strategies and bundles. These findings support prior recommendations for improved empirical rigor in implementation studies [ 46 , 95 ]. Many studies were excluded from our review based on not meeting methodological rigor standards. Understanding the effectiveness of discrete strategies deployed alone or in combination requires reliable and low burden tracking methods to collect information about strategy use and outcomes. For example, frameworks like the Implementation Replication Framework [ 96 ] could help interpret findings across studies using the same strategy bundle. Other tracking approaches may leverage technology (e.g., cell phones, tablets, EMR templates) [ 78 , 97 ] or find novel, pragmatic approaches to collect recommended strategy specifications over time (e.g.., dose, deliverer, and mechanism) [ 1 , 9 , 27 , 98 , 99 ]. Rigorous reporting standards could inform more robust analyses and conclusions (e.g., moving toward the goal of understanding causality, microcosting efforts) [ 24 , 38 , 100 , 101 ]. Such detailed tracking is also required to understand how site-level factors moderate implementation strategy effects [ 102 ]. In some cases, adaptive trial designs like sequential multiple assignment randomized trials (SMARTs) and just-in-time adaptive interventions (JITAIs) can be helpful for planning strategy escalation.

Limitations

Despite the strengths of this review, there were certain notable limitations. For one, we only included experimental studies, omitting many informative observational investigations that cover the range of implementation strategies. Second, our study period was centered on the creation of the journal Implementation Science and not on the standardization and operationalization of implementation strategies in the publication of the ERIC taxonomy (which came later). This, in conjunction with latency in reporting study results and funding cycles, means that the employed taxonomy was not applied in earlier studies. To address this limitation, we retroactively mapped strategies to ERIC, but it is possible that some studies were missed. Additionally, indexing approaches used by academic databases may have missed relevant studies. We addressed this particular concern by reviewing other systematic reviews of implementation strategies and soliciting recommendations from global implementation science experts.

Another potential limitation comes from the ERIC taxonomy itself—i.e., strategy listings like ERIC are only useful when they are widely adopted and used in conjunction with guidelines for specifying and reporting strategies [ 1 ] in protocol and outcome papers. Although the ERIC paper has been widely cited (over three thousand times, accessed about 186 thousand times), it is still not universally applied, making tracking the impact of specific strategies more difficult. However, our experience with this review seemed to suggest that ERIC’s use was increasing over time. Also, some have commented that ERIC strategies can be unclear and are missing key domains. Thus, researchers are making definitions clearer for lay users [ 37 , 103 ], increasing the number of discrete strategies for specific domains like HIV treatment, acknowledging strategies for new functions (e.g., de-implementation [ 104 ], local capacity building), accounting for phases of implementation (dissemination, sustainment [ 13 ], scale-up), addressing settings [ 12 , 20 ], actors roles in the process, and making mechanisms of change to select strategies more user-friendly through searchable databases [ 9 , 10 , 54 , 73 , 104 , 105 , 106 ]. In sum, we found the utility of the ERIC taxonomy to outweigh any of the taxonomy’s current limitations.

As with all reviews, the search terms influenced our findings. As such, the broad terms for implementation strategies (e.g., “evidence-based interventions”[ 7 ] or “behavior change techniques” [ 107 ]) may have led to inadvertent omissions of studies of specific strategies. For example, the search terms may not have captured tests of policies, financial strategies, community health promotion initiatives, or electronic medical record reminders, due to differences in terminology used in corresponding subfields of research (e.g., health economics, business, health information technology, and health policy). To manage this, we asked experts to inform us about any studies that they would include and cross-checked their lists with what was identified through our search terms, which yielded very few additional studies. We included standard coding using the ERIC taxonomy, which was a strength, but future work should consider including the additional strategies that have been recommended to augment ERIC, around sustainment [ 13 , 79 , 106 , 108 ], community and public health research [ 12 , 109 , 110 , 111 ], consumer or service user engagement [ 112 ], de-implementation [ 104 , 113 , 114 , 115 , 116 , 117 ] and related terms [ 118 ].

We were unable to assess the bias of studies due to non-standard reporting across the papers and the heterogeneity of study designs, measurement of implementation strategies and outcomes, and analytic approaches. This could have resulted in over- or underestimating the results of our synthesis. We addressed this limitation by being cautious in our reporting of findings, specifically in identifying “effective” implementation strategies. Further, we were not able to gather primary data to evaluate effect sizes across studies in order to systematically evaluate bias, which would be fruitful for future study.

Conclusions

This novel review of 129 studies summarized the body of evidence supporting the use of ERIC-defined implementation strategies to improve health or healthcare. We identified commonly occurring implementation strategies, frequently used bundles, and the strategies with the highest degree of supportive evidence, while simultaneously identifying gaps in the literature. Additionally, we identified several key areas for future growth and operationalization across the field of implementation science with the goal of improved reporting and assessment of implementation strategies and related outcomes.

Availability and materials

All data for this study are included in this published article and its supplementary information files.

We modestly revised the following research questions from our PROSPERO registration after reading the articles and better understanding the nature of the literature: 1) What is the available evidence regarding the effectiveness of implementation strategies in supporting the uptake and sustainment of evidence intended to improve health and healthcare outcomes? 2) What are the current gaps in the literature (i.e., implementation strategies that do not have sufficient evidence of effectiveness) that require further exploration?

Tested strategies are those which exist in the Experimental Arm but not in the Control Arm. Comparative effectiveness or time staggered trials may not have any unique strategies in the Experimental Arm and therefore in our analysis would have no Tested Strategies.

Abbreviations

Centers for Disease Control

Cumulated Index to Nursing and Allied Health Literature

Dissemination and Implementation

Evidence-based practices or programs

Expert Recommendations for Implementing Change

Multiphase Optimization Strategy

National Cancer Institute

National Institutes of Health

The Pittsburgh Dissemination and Implementation Science Collaborative

Sequential Multiple Assignment Randomized Trial

United States

Department of Veterans Affairs

Proctor EK, Powell BJ, McMillen JC. Implementation strategies: recommendations for specifying and reporting. Implement Sci. 2013;8:139.

Article PubMed PubMed Central Google Scholar

Powell BJ, Waltz TJ, Chinman MJ, Damschroder LJ, Smith JL, Matthieu MM, et al. A refined compilation of implementation strategies: results from the Expert Recommendations for Implementing Change (ERIC) project. Implement Sci. 2015;10:21.

Waltz TJ, Powell BJ, Chinman MJ, Smith JL, Matthieu MM, Proctor EK, et al. Expert recommendations for implementing change (ERIC): protocol for a mixed methods study. Implement Sci IS. 2014;9:39.

Article PubMed Google Scholar

Powell BJ, McMillen JC, Proctor EK, Carpenter CR, Griffey RT, Bunger AC, et al. A Compilation of Strategies for Implementing Clinical Innovations in Health and Mental Health. Med Care Res Rev. 2012;69:123–57.

Waltz TJ, Powell BJ, Matthieu MM, Damschroder LJ, Chinman MJ, Smith JL, et al. Use of concept mapping to characterize relationships among implementation strategies and assess their feasibility and importance: results from the Expert Recommendations for Implementing Change (ERIC) study. Implement Sci. 2015;10:109.

Perry CK, Damschroder LJ, Hemler JR, Woodson TT, Ono SS, Cohen DJ. Specifying and comparing implementation strategies across seven large implementation interventions: a practical application of theory. Implement Sci. 2019;14(1):32.

Community Preventive Services Task Force. Community Preventive Services Task Force: All Active Findings June 2023 [Internet]. 2023 [cited 2023 Aug 7]. Available from: https://www.thecommunityguide.org/media/pdf/CPSTF-All-Findings-508.pdf

Solberg LI, Kuzel A, Parchman ML, Shelley DR, Dickinson WP, Walunas TL, et al. A Taxonomy for External Support for Practice Transformation. J Am Board Fam Med JABFM. 2021;34:32–9.

Leeman J, Birken SA, Powell BJ, Rohweder C, Shea CM. Beyond “implementation strategies”: classifying the full range of strategies used in implementation science and practice. Implement Sci. 2017;12:1–9.

Article Google Scholar

Leeman J, Calancie L, Hartman MA, Escoffery CT, Herrmann AK, Tague LE, et al. What strategies are used to build practitioners’ capacity to implement community-based interventions and are they effective?: a systematic review. Implement Sci. 2015;10:1–15.

Nathan N, Shelton RC, Laur CV, Hailemariam M, Hall A. Editorial: Sustaining the implementation of evidence-based interventions in clinical and community settings. Front Health Serv. 2023;3:1176023.

Balis LE, Houghtaling B, Harden SM. Using implementation strategies in community settings: an introduction to the Expert Recommendations for Implementing Change (ERIC) compilation and future directions. Transl Behav Med. 2022;12:965–78.

Nathan N, Powell BJ, Shelton RC, Laur CV, Wolfenden L, Hailemariam M, et al. Do the Expert Recommendations for Implementing Change (ERIC) strategies adequately address sustainment? Front Health Serv. 2022;2:905909.

Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard-Jensen J, French SD, et al. Audit and feedback effects on professional practice and healthcare outcomes. Cochrane Database Syst Rev. 2012;6:CD000259.

Google Scholar

Moore L, Guertin JR, Tardif P-A, Ivers NM, Hoch J, Conombo B, et al. Economic evaluations of audit and feedback interventions: a systematic review. BMJ Qual Saf. 2022;31:754–67.

Sykes MJ, McAnuff J, Kolehmainen N. When is audit and feedback effective in dementia care? A systematic review. Int J Nurs Stud. 2018;79:27–35.

Barnes C, McCrabb S, Stacey F, Nathan N, Yoong SL, Grady A, et al. Improving implementation of school-based healthy eating and physical activity policies, practices, and programs: a systematic review. Transl Behav Med. 2021;11:1365–410.

Tomasone JR, Kauffeldt KD, Chaudhary R, Brouwers MC. Effectiveness of guideline dissemination and implementation strategies on health care professionals’ behaviour and patient outcomes in the cancer care context: a systematic review. Implement Sci. 2020;15:1–18.

Seda V, Moles RJ, Carter SR, Schneider CR. Assessing the comparative effectiveness of implementation strategies for professional services to community pharmacy: A systematic review. Res Soc Adm Pharm. 2022;18:3469–83.

Lovero KL, Kemp CG, Wagenaar BH, Giusto A, Greene MC, Powell BJ, et al. Application of the Expert Recommendations for Implementing Change (ERIC) compilation of strategies to health intervention implementation in low- and middle-income countries: a systematic review. Implement Sci. 2023;18:56.

Chapman A, Rankin NM, Jongebloed H, Yoong SL, White V, Livingston PM, et al. Overcoming challenges in conducting systematic reviews in implementation science: a methods commentary. Syst Rev. 2023;12:1–6.

Article CAS Google Scholar

Proctor EK, Bunger AC, Lengnick-Hall R, Gerke DR, Martin JK, Phillips RJ, et al. Ten years of implementation outcomes research: a scoping review. Implement Sci. 2023;18:1–19.

Michaud TL, Pereira E, Porter G, Golden C, Hill J, Kim J, et al. Scoping review of costs of implementation strategies in community, public health and healthcare settings. BMJ Open. 2022;12:e060785.

Sohn H, Tucker A, Ferguson O, Gomes I, Dowdy D. Costing the implementation of public health interventions in resource-limited settings: a conceptual framework. Implement Sci. 2020;15:1–8.

Peek C, Glasgow RE, Stange KC, Klesges LM, Purcell EP, Kessler RS. The 5 R’s: an emerging bold standard for conducting relevant research in a changing world. Ann Fam Med. 2014;12:447–55.

Article CAS PubMed PubMed Central Google Scholar

Glasgow RE, Vogt TM, Boles SM. Evaluating the public health impact of health promotion interventions: the RE-AIM framework. Am J Public Health. 1999;89:1322–7.

Shelton RC, Chambers DA, Glasgow RE. An Extension of RE-AIM to Enhance Sustainability: Addressing Dynamic Context and Promoting Health Equity Over Time. Front Public Health. 2020;8:134.

Holtrop JS, Estabrooks PA, Gaglio B, Harden SM, Kessler RS, King DK, et al. Understanding and applying the RE-AIM framework: Clarifications and resources. J Clin Transl Sci. 2021;5:e126.

Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4:1.

Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;349:g7647.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ [Internet]. 2021;372. Available from: https://www.bmj.com/content/372/bmj.n71

Rabin BA, Brownson RC, Haire-Joshu D, Kreuter MW, Weaver NL. A Glossary for Dissemination and Implementation Research in Health. J Public Health Manag Pract. 2008;14:117–23.

Eccles MP, Mittman BS. Welcome to Implementation Science. Implement Sci. 2006;1:1.

Article PubMed Central Google Scholar

Miller WR, Wilbourne PL. Mesa Grande: a methodological analysis of clinical trials of treatments for alcohol use disorders. Addict Abingdon Engl. 2002;97:265–77.

Miller WR, Brown JM, Simpson TL, Handmaker NS, Bien TH, Luckie LF, et al. What works? A methodological analysis of the alcohol treatment outcome literature. Handb Alcohol Treat Approaches Eff Altern 2nd Ed. Needham Heights, MA, US: Allyn & Bacon; 1995:12–44.

Wells S, Tamir O, Gray J, Naidoo D, Bekhit M, Goldmann D. Are quality improvement collaboratives effective? A systematic review BMJ Qual Saf. 2018;27:226–40.

Yakovchenko V, Chinman MJ, Lamorte C, Powell BJ, Waltz TJ, Merante M, et al. Refining Expert Recommendations for Implementing Change (ERIC) strategy surveys using cognitive interviews with frontline providers. Implement Sci Commun. 2023;4:1–14.

Wagner TH, Yoon J, Jacobs JC, So A, Kilbourne AM, Yu W, et al. Estimating costs of an implementation intervention. Med Decis Making. 2020;40:959–67.

Gold HT, McDermott C, Hoomans T, Wagner TH. Cost data in implementation science: categories and approaches to costing. Implement Sci. 2022;17:11.

Boutron I, Page MJ, Higgins JP, Altman DG, Lundh A, Hróbjartsson A. Considering bias and conflicts of interest among the included studies. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA, editors. Cochrane Handbook for Systematic Reviews of Interventions. 2019. https://doi.org/10.1002/9781119536604.ch7 .

Higgins JP, Savović J, Page MJ, Elbers RG, Sterne J. Assessing risk of bias in a randomized trial. Cochrane Handb Syst Rev Interv. 2019;6:205–28.

Reilly KL, Kennedy S, Porter G, Estabrooks P. Comparing, Contrasting, and Integrating Dissemination and Implementation Outcomes Included in the RE-AIM and Implementation Outcomes Frameworks. Front Public Health [Internet]. 2020 [cited 2024 Apr 24];8. Available from: https://www.frontiersin.org/journals/public-health/articles/ https://doi.org/10.3389/fpubh.2020.00430/full

Grimshaw JM, Thomas RE, MacLennan G, Fraser C, Ramsay CR, Vale L, et al. Effectiveness and efficiency of guideline dissemination and implementation strategies. Health Technol Assess Winch Engl. 2004;8:iii–iv 1-72.

CAS Google Scholar

Beidas RS, Kendall PC. Training Therapists in Evidence-Based Practice: A Critical Review of Studies From a Systems-Contextual Perspective. Clin Psychol Publ Div Clin Psychol Am Psychol Assoc. 2010;17:1–30.

Powell BJ, Beidas RS, Lewis CC, Aarons GA, McMillen JC, Proctor EK, et al. Methods to Improve the Selection and Tailoring of Implementation Strategies. J Behav Health Serv Res. 2017;44:177–94.

Powell BJ, Fernandez ME, Williams NJ, Aarons GA, Beidas RS, Lewis CC, et al. Enhancing the Impact of Implementation Strategies in Healthcare: A Research Agenda. Front Public Health [Internet]. 2019 [cited 2021 Mar 31];7. Available from: https://www.frontiersin.org/articles/ https://doi.org/10.3389/fpubh.2019.00003/full

Frakt AB, Prentice JC, Pizer SD, Elwy AR, Garrido MM, Kilbourne AM, et al. Overcoming Challenges to Evidence-Based Policy Development in a Large. Integrated Delivery System Health Serv Res. 2018;53:4789–807.

PubMed Google Scholar

Crable EL, Lengnick-Hall R, Stadnick NA, Moullin JC, Aarons GA. Where is “policy” in dissemination and implementation science? Recommendations to advance theories, models, and frameworks: EPIS as a case example. Implement Sci. 2022;17:80.

Crable EL, Grogan CM, Purtle J, Roesch SC, Aarons GA. Tailoring dissemination strategies to increase evidence-informed policymaking for opioid use disorder treatment: study protocol. Implement Sci Commun. 2023;4:16.

Bond GR. Evidence-based policy strategies: A typology. Clin Psychol Sci Pract. 2018;25:e12267.

Loo TS, Davis RB, Lipsitz LA, Irish J, Bates CK, Agarwal K, et al. Electronic Medical Record Reminders and Panel Management to Improve Primary Care of Elderly Patients. Arch Intern Med. 2011;171:1552–8.

Shojania KG, Jennings A, Mayhew A, Ramsay C, Eccles M, Grimshaw J. Effect of point-of-care computer reminders on physician behaviour: a systematic review. CMAJ Can Med Assoc J. 2010;182:E216-25.

Sequist TD, Gandhi TK, Karson AS, Fiskio JM, Bugbee D, Sperling M, et al. A Randomized Trial of Electronic Clinical Reminders to Improve Quality of Care for Diabetes and Coronary Artery Disease. J Am Med Inform Assoc JAMIA. 2005;12:431–7.

Dopp AR, Kerns SEU, Panattoni L, Ringel JS, Eisenberg D, Powell BJ, et al. Translating economic evaluations into financing strategies for implementing evidence-based practices. Implement Sci. 2021;16:1–12.

Dopp AR, Hunter SB, Godley MD, Pham C, Han B, Smart R, et al. Comparing two federal financing strategies on penetration and sustainment of the adolescent community reinforcement approach for substance use disorders: protocol for a mixed-method study. Implement Sci Commun. 2022;3:51.

Proctor EK, Toker E, Tabak R, McKay VR, Hooley C, Evanoff B. Market viability: a neglected concept in implementation science. Implement Sci. 2021;16:98.

Dopp AR, Narcisse M-R, Mundey P, Silovsky JF, Smith AB, Mandell D, et al. A scoping review of strategies for financing the implementation of evidence-based practices in behavioral health systems: State of the literature and future directions. Implement Res Pract. 2020;1:2633489520939980.

PubMed PubMed Central Google Scholar

Kilbourne AM, Neumann MS, Pincus HA, Bauer MS, Stall R. Implementing evidence-based interventions in health care:application of the replicating effective programs framework. Implement Sci. 2007;2:42–51.

Kegeles SM, Rebchook GM, Hays RB, Terry MA, O’Donnell L, Leonard NR, et al. From science to application: the development of an intervention package. AIDS Educ Prev Off Publ Int Soc AIDS Educ. 2000;12:62–74.

Wandersman A, Imm P, Chinman M, Kaftarian S. Getting to outcomes: a results-based approach to accountability. Eval Program Plann. 2000;23:389–95.

Wandersman A, Chien VH, Katz J. Toward an evidence-based system for innovation support for implementing innovations with quality: Tools, training, technical assistance, and quality assurance/quality improvement. Am J Community Psychol. 2012;50:445–59.

Rogal SS, Yakovchenko V, Waltz TJ, Powell BJ, Kirchner JE, Proctor EK, et al. The association between implementation strategy use and the uptake of hepatitis C treatment in a national sample. Implement Sci. 2017;12:1–13.

Smith SN, Almirall D, Prenovost K, Liebrecht C, Kyle J, Eisenberg D, et al. Change in patient outcomes after augmenting a low-level implementation strategy in community practices that are slow to adopt a collaborative chronic care model: a cluster randomized implementation trial. Med Care. 2019;57:503.

Rogal SS, Yakovchenko V, Waltz TJ, Powell BJ, Gonzalez R, Park A, et al. Longitudinal assessment of the association between implementation strategy use and the uptake of hepatitis C treatment: Year 2. Implement Sci. 2019;14:1–12.

Harvey G, Kitson A. Translating evidence into healthcare policy and practice: Single versus multi-faceted implementation strategies – is there a simple answer to a complex question? Int J Health Policy Manag. 2015;4:123–6.

Engell T, Stadnick NA, Aarons GA, Barnett ML. Common Elements Approaches to Implementation Research and Practice: Methods and Integration with Intervention Science. Glob Implement Res Appl. 2023;3:1–15.

Michie S, Fixsen D, Grimshaw JM, Eccles MP. Specifying and reporting complex behaviour change interventions: the need for a scientific method. Implement Sci IS. 2009;4:40.

Smith JD, Li DH, Rafferty MR. The Implementation Research Logic Model: a method for planning, executing, reporting, and synthesizing implementation projects. Implement Sci IS. 2020;15:84.

Perez Jolles M, Lengnick-Hall R, Mittman BS. Core Functions and Forms of Complex Health Interventions: a Patient-Centered Medical Home Illustration. JGIM J Gen Intern Med. 2019;34:1032–8.

Schroeck FR, Ould Ismail AA, Haggstrom DA, Sanchez SL, Walker DR, Zubkoff L. Data-driven approach to implementation mapping for the selection of implementation strategies: a case example for risk-aligned bladder cancer surveillance. Implement Sci IS. 2022;17:58.

Frank HE, Kemp J, Benito KG, Freeman JB. Precision Implementation: An Approach to Mechanism Testing in Implementation Research. Adm Policy Ment Health. 2022;49:1084–94.

Lewis CC, Klasnja P, Lyon AR, Powell BJ, Lengnick-Hall R, Buchanan G, et al. The mechanics of implementation strategies and measures: advancing the study of implementation mechanisms. Implement Sci Commun. 2022;3:114.

Geng EH, Baumann AA, Powell BJ. Mechanism mapping to advance research on implementation strategies. PLoS Med. 2022;19:e1003918.

Pinnock H, Barwick M, Carpenter CR, Eldridge S, Grandes G, Griffiths CJ, et al. Standards for Reporting Implementation Studies (StaRI) Statement. BMJ. 2017;356:i6795.

Proctor E, Silmere H, Raghavan R, Hovmand P, Aarons G, Bunger A, et al. Outcomes for Implementation Research: Conceptual Distinctions, Measurement Challenges, and Research Agenda. Adm Policy Ment Health Ment Health Serv Res. 2011;38:65–76.

Hooley C, Amano T, Markovitz L, Yaeger L, Proctor E. Assessing implementation strategy reporting in the mental health literature: a narrative review. Adm Policy Ment Health Ment Health Serv Res. 2020;47:19–35.

Proctor E, Ramsey AT, Saldana L, Maddox TM, Chambers DA, Brownson RC. FAST: a framework to assess speed of translation of health innovations to practice and policy. Glob Implement Res Appl. 2022;2:107–19.

Cullen L, Hanrahan K, Edmonds SW, Reisinger HS, Wagner M. Iowa Implementation for Sustainability Framework. Implement Sci IS. 2022;17:1.

Saldana L, Ritzwoller DP, Campbell M, Block EP. Using economic evaluations in implementation science to increase transparency in costs and outcomes for organizational decision-makers. Implement Sci Commun. 2022;3:40.

Eisman AB, Kilbourne AM, Dopp AR, Saldana L, Eisenberg D. Economic evaluation in implementation science: making the business case for implementation strategies. Psychiatry Res. 2020;283:112433.

Akiba CF, Powell BJ, Pence BW, Nguyen MX, Golin C, Go V. The case for prioritizing implementation strategy fidelity measurement: benefits and challenges. Transl Behav Med. 2022;12:335–42.

Akiba CF, Powell BJ, Pence BW, Muessig K, Golin CE, Go V. “We start where we are”: a qualitative study of barriers and pragmatic solutions to the assessment and reporting of implementation strategy fidelity. Implement Sci Commun. 2022;3:117.

Rudd BN, Davis M, Doupnik S, Ordorica C, Marcus SC, Beidas RS. Implementation strategies used and reported in brief suicide prevention intervention studies. JAMA Psychiatry. 2022;79:829–31.

Painter JT, Raciborski RA, Matthieu MM, Oliver CM, Adkins DA, Garner KK. Engaging stakeholders to retrospectively discern implementation strategies to support program evaluation: Proposed method and case study. Eval Program Plann. 2024;103:102398.

Bunger AC, Powell BJ, Robertson HA, MacDowell H, Birken SA, Shea C. Tracking implementation strategies: a description of a practical approach and early findings. Health Res Policy Syst. 2017;15:1–12.

Mustanski B, Smith JD, Keiser B, Li DH, Benbow N. Supporting the growth of domestic HIV implementation research in the united states through coordination, consultation, and collaboration: how we got here and where we are headed. JAIDS J Acquir Immune Defic Syndr. 2022;90:S1-8.

Marques MM, Wright AJ, Corker E, Johnston M, West R, Hastings J, et al. The Behaviour Change Technique Ontology: Transforming the Behaviour Change Technique Taxonomy v1. Wellcome Open Res. 2023;8:308.

Merle JL, Li D, Keiser B, Zamantakis A, Queiroz A, Gallo CG, et al. Categorising implementation determinants and strategies within the US HIV implementation literature: a systematic review protocol. BMJ Open. 2023;13:e070216.

Glenshaw MT, Gaist P, Wilson A, Cregg RC, Holtz TH, Goodenow MM. Role of NIH in the Ending the HIV Epidemic in the US Initiative: Research Improving Practice. J Acquir Immune Defic Syndr. 1999;2022(90):S9-16.

Purcell DW, Namkung Lee A, Dempsey A, Gordon C. Enhanced Federal Collaborations in Implementation Science and Research of HIV Prevention and Treatment. J Acquir Immune Defic Syndr. 1999;2022(90):S17-22.

Queiroz A, Mongrella M, Keiser B, Li DH, Benbow N, Mustanski B. Profile of the Portfolio of NIH-Funded HIV Implementation Research Projects to Inform Ending the HIV Epidemic Strategies. J Acquir Immune Defic Syndr. 1999;2022(90):S23-31.

Zamantakis A, Li DH, Benbow N, Smith JD, Mustanski B. Determinants of Pre-exposure Prophylaxis (PrEP) Implementation in Transgender Populations: A Qualitative Scoping Review. AIDS Behav. 2023;27:1600–18.

Li DH, Benbow N, Keiser B, Mongrella M, Ortiz K, Villamar J, et al. Determinants of Implementation for HIV Pre-exposure Prophylaxis Based on an Updated Consolidated Framework for Implementation Research: A Systematic Review. J Acquir Immune Defic Syndr. 1999;2022(90):S235-46.

Chambers DA, Emmons KM. Navigating the field of implementation science towards maturity: challenges and opportunities. Implement Sci. 2024;19:26, s13012-024-01352–0.

Chinman M, Acosta J, Ebener P, Shearer A. “What we have here, is a failure to [replicate]”: Ways to solve a replication crisis in implementation science. Prev Sci. 2022;23:739–50.

Chambers DA, Glasgow RE, Stange KC. The dynamic sustainability framework: addressing the paradox of sustainment amid ongoing change. Implement Sci. 2013;8:117.

Lengnick-Hall R, Gerke DR, Proctor EK, Bunger AC, Phillips RJ, Martin JK, et al. Six practical recommendations for improved implementation outcomes reporting. Implement Sci. 2022;17:16.

Miller CJ, Barnett ML, Baumann AA, Gutner CA, Wiltsey-Stirman S. The FRAME-IS: a framework for documenting modifications to implementation strategies in healthcare. Implement Sci IS. 2021;16:36.

Xu X, Lazar CM, Ruger JP. Micro-costing in health and medicine: a critical appraisal. Health Econ Rev. 2021;11:1.

Barnett ML, Dopp AR, Klein C, Ettner SL, Powell BJ, Saldana L. Collaborating with health economists to advance implementation science: a qualitative study. Implement Sci Commun. 2020;1:82.

Lengnick-Hall R, Williams NJ, Ehrhart MG, Willging CE, Bunger AC, Beidas RS, et al. Eight characteristics of rigorous multilevel implementation research: a step-by-step guide. Implement Sci. 2023;18:52.

Riley-Gibson E, Hall A, Shoesmith A, Wolfenden L, Shelton RC, Doherty E, et al. A systematic review to determine the effect of strategies to sustain chronic disease prevention interventions in clinical and community settings: study protocol. Res Sq [Internet]. 2023 [cited 2024 Apr 19]; Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312971/

Ingvarsson S, Hasson H, von Thiele Schwarz U, Nilsen P, Powell BJ, Lindberg C, et al. Strategies for de-implementation of low-value care—a scoping review. Implement Sci IS. 2022;17:73.

Lewis CC, Powell BJ, Brewer SK, Nguyen AM, Schriger SH, Vejnoska SF, et al. Advancing mechanisms of implementation to accelerate sustainable evidence-based practice integration: protocol for generating a research agenda. BMJ Open. 2021;11:e053474.

Hailemariam M, Bustos T, Montgomery B, Barajas R, Evans LB, Drahota A. Evidence-based intervention sustainability strategies: a systematic review. Implement Sci. 2019;14:N.PAG-N.PAG.

Michie S, Atkins L, West R. The behaviour change wheel. Guide Des Interv 1st Ed G B Silverback Publ. 2014;1003:1010.

Birken SA, Haines ER, Hwang S, Chambers DA, Bunger AC, Nilsen P. Advancing understanding and identifying strategies for sustaining evidence-based practices: a review of reviews. Implement Sci IS. 2020;15:88.

Metz A, Jensen T, Farley A, Boaz A, Bartley L, Villodas M. Building trusting relationships to support implementation: A proposed theoretical model. Front Health Serv. 2022;2:894599.

Rabin BA, Cain KL, Watson P, Oswald W, Laurent LC, Meadows AR, et al. Scaling and sustaining COVID-19 vaccination through meaningful community engagement and care coordination for underserved communities: hybrid type 3 effectiveness-implementation sequential multiple assignment randomized trial. Implement Sci IS. 2023;18:28.

Gyamfi J, Iwelunmor J, Patel S, Irazola V, Aifah A, Rakhra A, et al. Implementation outcomes and strategies for delivering evidence-based hypertension interventions in lower-middle-income countries: Evidence from a multi-country consortium for hypertension control. PLOS ONE. 2023;18:e0286204.

Woodward EN, Ball IA, Willging C, Singh RS, Scanlon C, Cluck D, et al. Increasing consumer engagement: tools to engage service users in quality improvement or implementation efforts. Front Health Serv. 2023;3:1124290.

Norton WE, Chambers DA. Unpacking the complexities of de-implementing inappropriate health interventions. Implement Sci IS. 2020;15:2.

Norton WE, McCaskill-Stevens W, Chambers DA, Stella PJ, Brawley OW, Kramer BS. DeImplementing Ineffective and Low-Value Clinical Practices: Research and Practice Opportunities in Community Oncology Settings. JNCI Cancer Spectr. 2021;5:pkab020.

McKay VR, Proctor EK, Morshed AB, Brownson RC, Prusaczyk B. Letting Go: Conceptualizing Intervention De-implementation in Public Health and Social Service Settings. Am J Community Psychol. 2018;62:189–202.

Patey AM, Grimshaw JM, Francis JJ. Changing behaviour, ‘more or less’: do implementation and de-implementation interventions include different behaviour change techniques? Implement Sci IS. 2021;16:20.

Rodriguez Weno E, Allen P, Mazzucca S, Farah Saliba L, Padek M, Moreland-Russell S, et al. Approaches for Ending Ineffective Programs: Strategies From State Public Health Practitioners. Front Public Health. 2021;9:727005.

Gnjidic D, Elshaug AG. De-adoption and its 43 related terms: harmonizing low-value care terminology. BMC Med. 2015;13:273.

Download references

Acknowledgements

The authors would like to acknowledge the early contributions of the Pittsburgh Dissemination and Implementation Science Collaborative (Pitt DISC). LEA would like to thank Dr. Billie Davis for analytical support. The authors would like to acknowledge the implementation science experts who recommended articles for our review, including Greg Aarons, Mark Bauer, Rinad Beidas, Geoffrey Curran, Laura Damschroder, Rani Elwy, Amy Kilbourne, JoAnn Kirchner, Jennifer Leeman, Cara Lewis, Dennis Li, Aaron Lyon, Gila Neta, and Borsika Rabin.

Dr. Rogal’s time was funded in part by a University of Pittsburgh K award (K23-DA048182) and by a VA Health Services Research and Development grant (PEC 19-207). Drs. Bachrach and Quinn were supported by VA HSR Career Development Awards (CDA 20-057, PI: Bachrach; CDA 20-224, PI: Quinn). Dr. Scheunemann’s time was funded by the US Agency for Healthcare Research and Quality (K08HS027210). Drs. Hero, Chinman, Goodrich, Ernecoff, and Mr. Qureshi were funded by the Patient-Centered Outcomes Research Institute (PCORI) AOSEPP2 Task Order 12 to conduct a landscape review of US studies on the effectiveness of implementation strategies with results reported here ( https://www.pcori.org/sites/default/files/PCORI-Implementation-Strategies-for-Evidence-Based-Practice-in-Health-and-Health-Care-A-Review-of-the-Evidence-Full-Report.pdf and https://www.pcori.org/sites/default/files/PCORI-Implementation-Strategies-for-Evidence-Based-Practice-in-Health-and-Health-Care-Brief-Report-Summary.pdf ). Dr. Ashcraft and Ms. Phares were funded by the Center for Health Equity Research and Promotion, (CIN 13-405). The funders had no involvement in this study.

Author information

Shari S. Rogal and Matthew J. Chinman are co-senior authors.

Authors and Affiliations

Center for Health Equity Research and Promotion, Corporal Michael Crescenz VA Medical Center, Philadelphia, PA, USA

Laura Ellen Ashcraft

Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA

Center for Health Equity Research and Promotion, VA Pittsburgh Healthcare System, Pittsburgh, PA, USA

David E. Goodrich, Angela Phares, Deirdre A. Quinn, Shari S. Rogal & Matthew J. Chinman

Division of General Internal Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

David E. Goodrich, Deirdre A. Quinn & Matthew J. Chinman

Clinical & Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA

David E. Goodrich & Lisa G. Lederer

RAND Corporation, Pittsburgh, PA, USA

Joachim Hero, Nabeel Qureshi, Natalie C. Ernecoff & Matthew J. Chinman

Center for Clinical Management Research, VA Ann Arbor Healthcare System, Ann Arbor, Michigan, USA

Rachel L. Bachrach

Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI, USA

Division of Geriatric Medicine, University of Pittsburgh, Department of Medicine, Pittsburgh, PA, USA

Leslie Page Scheunemann

Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, University of Pittsburgh, Department of Medicine, Pittsburgh, PA, USA

Departments of Medicine and Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

Shari S. Rogal

You can also search for this author in PubMed Google Scholar

Contributions

LEA, SSR, and MJC conceptualized the study. LEA, SSR, MJC, and JOH developed the study design. LEA and JOH acquired the data. LEA, DEG, AP, RLB, DAQ, LGL, LPS, SSR, NQ, and MJC conducted the abstract, full text review, and rigor assessment. LEA, DEG, JOH, AP, RLB, DAQ, NQ, NCE, SSR, and MJC conducted the data abstraction. DEG, SSR, and MJC adjudicated conflicts. LEA and SSR analyzed the data. LEA, SSR, JOH, and MJC interpreted the data. LEA, SSR, and MJC drafted the work. All authors substantially revised the work. All authors approved the submitted version and agreed to be personally accountable for their contributions and the integrity of the work.

Corresponding author

Correspondence to Laura Ellen Ashcraft .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

The manuscript does not contain any individual person’s data.

Competing interests

Additional information, publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1., supplementary material 2., supplementary material 3., supplementary material 4., supplementary material 5., supplementary material 6., supplementary material 7., supplementary material 8., rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Ashcraft, L.E., Goodrich, D.E., Hero, J. et al. A systematic review of experimentally tested implementation strategies across health and human service settings: evidence from 2010-2022. Implementation Sci 19 , 43 (2024). https://doi.org/10.1186/s13012-024-01369-5

Download citation

Received : 09 November 2023

Accepted : 27 May 2024

Published : 24 June 2024

DOI : https://doi.org/10.1186/s13012-024-01369-5

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Implementation strategy
Health-related outcomes

Implementation Science

ISSN: 1748-5908

Submission enquiries: Access here and click Contact Us
General enquiries: [email protected]

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Publications
Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

Advanced Search
Journal List
PMC10248995

Guidance to best tools and practices for systematic reviews

Kat kolaski.

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA

Associated Data

Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times when thousands of systematic reviews are published monthly [ 3 ] the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometrical increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table (Table1). 1 ). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists are required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Guidance for development of evidence syntheses


Cochrane (formerly Cochrane Collaboration)
JBI (formerly Joanna Briggs Institute)

National Institute for Health and Care Excellence (NICE)—United Kingdom
Scottish Intercollegiate Guidelines Network (SIGN) —Scotland
Agency for Healthcare Research and Quality (AHRQ)—United States

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated 28]. Suboptimal methods described in in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19, [ 2 , 42 ] but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 – 52 ] News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ] Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Types of traditional systematic reviews

Review type	Topic assessed	Elements of research question (mnemonic)
Intervention [ , ]	Benefits and harms of interventions used in healthcare.	opulation, ntervention, omparator, utcome ( )
Diagnostic test accuracy [ ]	How well a diagnostic test performs in diagnosing and detecting a particular disease.	opulation, ndex test(s), and arget condition ( )
Qualitative
Cochrane [ ]	Questions are designed to improve understanding of intervention complexity, contextual variations, implementation, and stakeholder preferences and experiences.	etting, erspective, ntervention or Phenomenon of nterest, omparison, valuation ( ) ample, henomenon of nterest, esign, valuation, esearch type ( ) spective, etting, henomena of interest/Problem, nvironment, omparison (optional), me/timing, indings ( )
JBI [ ]	Questions inform meaningfulness and appropriateness of care and the impact of illness through documentation of stakeholder experiences, preferences, and priorities.	opulation, the Phenomena of nterest, and the ntext
Prognostic [ ]	Probable course or future outcome(s) of people with a health problem.	opulation, ntervention (model), omparator, utcomes, iming, etting ( )
Etiology and risk [ ]	The relationship (association) between certain factors (e.g., genetic, environmental) and the development of a disease or condition or other health outcome.	opulation or groups at risk, xposure(s), associated utcome(s) (disease, symptom, or health condition of interest), the context/location or the time period and the length of time when relevant ( )
Measurement properties [ , ]	What is the most suitable instrument to measure a construct of interest in a specific study population?	opulation, nstrument, onstruct, utcomes ( )
Prevalence and incidence [ ]	The frequency, distribution and determinants of specific factors, health states or conditions in a defined population: eg, how common is a particular disease or condition in a specific group of individuals?	Factor, disease, symptom or health ndition of interest, the epidemiological indicator used to measure its frequency (prevalence, incidence), the ulation or groups at risk as well as the ntext/location and time period where relevant ( )

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Evidence syntheses published by Cochrane and JBI



Intervention	8572	96.3	Effectiveness	435	61.5
Diagnostic	176	1.9	Diagnostic Test Accuracy	9	1.3
Overview	64	0.7	Umbrella	4	0.6
Methodology	41	0.45	Mixed Methods	2	0.3
Qualitative	17	0.19	Qualitative	159	22.5
Prognostic	11	0.12	Prevalence and Incidence	6	0.8
Rapid	11	0.12	Etiology and Risk	7	1.0
Prototype	8	0.08	Measurement Properties	3	0.4
			Economic	6	0.6
			Text and Opinion	1	0.14
			Scoping	43	6.0
			Comprehensive	32	4.5
	Total = 8900			Total = 707

a Data from https://www.cochranelibrary.com/cdsr/reviews . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. Fig.1) 1 ) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

An external file that holds a picture, illustration, etc.
Object name is 13643_2023_2255_Fig1_HTML.jpg

Distinguishing types of research evidence

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of intervention (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ] This process is facilitated by a study design feature checklist; guidance on using the checklist is included with developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1 provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Tools specifying standards for systematic reviews with and without meta-analysis


Quality of Reporting of Meta-analyses (QUOROM) Statement	Moher 1999 [ ]
Meta-analyses Of Observational Studies in Epidemiology (MOOSE)	Stroup 2000 [ ]
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)	Moher 2009 [ ]
PRISMA 2020	Page 2021 [ ]

Overview Quality Assessment Questionnaire (OQAQ)	Oxman and Guyatt 1991 [ ]
Systematic Review Critical Appraisal Sheet	Centre for Evidence-based Medicine 2005 [ ]
A Measurement Tool to Assess Systematic Reviews (AMSTAR)	Shea 2007 [ ]
AMSTAR-2	Shea 2017 [ ]

Risk of Bias in Systematic Reviews (ROBIS)	Whiting 2016 [ ]

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported systematic evidence synthesis review may still be biased and flawed while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1 but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews of Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake and impact and limitations are also discussed.

Evaluation of conduct

Development.

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section— this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Comparison of AMSTAR-2 and ROBIS

Characteristic

	Extensive	Extensive
	Intervention	Intervention, diagnostic, etiology, prognostic
	7 critical, 9 non-critical	4

Total number	16	29
Response options	Items # 1, 3, 5, 6, 10, 13, 14, 16: rated or Items # 2, 4, 7, 8, 9 : rated or Items # 11 , 12, 15: rated or	24 assessment items: rated 5 items regarding level of concern: rated

Construct	Confidence based on weaknesses in critical domains	Level of concern for risk of bias
Categories	High, moderate, low, critically low	Low, high, unclear

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 item #9 and #11 require separate responses for RCTs and NRSI

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines,” “ROBIS AND clinical practice guidelines” 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or to modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, like reports about the actual uptake of these tools, time will tell. Additional data on user experience is also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

PRISMA extensions


PRISMA for systematic reviews with a focus on health equity [ ]	PRISMA-E	2012
Reporting systematic reviews in journal and conference abstracts [ ]	PRISMA for Abstracts	2015; 2020
PRISMA for systematic review protocols [ ]	PRISMA-P	2015
PRISMA for Network Meta-Analyses [ ]	PRISMA-NMA	2015
PRISMA for Individual Participant Data [ ]	PRISMA-IPD	2015
PRISMA for reviews including harms outcomes [ ]	PRISMA-Harms	2016
PRISMA for diagnostic test accuracy [ ]	PRISMA-DTA	2018
PRISMA for scoping reviews [ ]	PRISMA-ScR	2018
PRISMA for acupuncture [ ]	PRISMA-A	2019
PRISMA for reporting literature searches [ ]	PRISMA-S	2021

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. However, PRISMA checklists evaluate how completely an element of review conduct was reported, but do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1 links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a



			Table	Table



Methods for study selection	#5	#2.5	All three components must be done in duplicate, and methods fully described.	Helps to mitigate CoI and bias; also may improve accuracy.
Methods for data extraction	#6	#3.1
Methods for RoB assessment	NA	#3.5

Study description	#8	#3.2	Research design features, components of research question (eg, PICO), setting, funding sources.	Allows readers to understand the individual studies in detail.


Sources of funding	#10	NA	Identified for all included studies.	Can reveal CoI or bias.

Publication bias	#15*	#4.5	Explored, diagrammed, and discussed.	Publication and other selective reporting biases are major threats to the validity of systematic reviews.
Author CoI	#16	NA	Disclosed, with management strategies described.	If CoI is identified, management strategies must be described to ensure confidence in the review.

CoI conflict of interest, MA meta-analysis, NA not addressed, PICO participant, intervention, comparison, outcome, PRISMA-P Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols, RoB risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved for developing different types of systematic reviews; however, while inclusion of the suggested elements makes a review compliant with a particular review’s methods, it does not necessarily make a research question appropriate. Table 4.2 lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop runaway scope allowing them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

Acronym	Meaning
	feasible, interesting, novel, ethical, and relevant
	specific, measurable, attainable, relevant, timely
	time, outcomes, population, intervention, context, study design, plus (effect) moderators

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran, GT. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manage Rev. 1981;70:35-6.

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

An association with attainment of AMSTAR standards in systematic reviews with published prospective protocols has been reported [ 134 ]. However, completeness of reporting does not seem to be different in reviews with a protocol compared to those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses


BMJ Open
BioMed Central
JMIR Research Protocols
World Journal of Meta-analysis

Cochrane
JBI
PROSPERO
Research Registry- Registry of Systematic Reviews/Meta-Analyses
International Platform of Registered Systematic Review and Meta-analysis Protocols (INPLASY)

Center for Open Science
Protocols.io

Figshare
Open Science Framework
Zenodo

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs often are expected to reveal greater information about harms or real-word evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and more unbiased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses nor provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. The gray literature and a search of trials may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in gray literature and trials [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year as that is the likely viable life of a search. (1) If that is not possible, the search should be updated close to the time of completion [ 154 ]. Different research topics may warrant less of a delay, for example, in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of its RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB-2 [ 157 ] tool for RCTs and ROBINS-I [ 158 ] for NRSI for RoB assessment meet the AMSTAR-2 standards. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis, and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Common methods for quantitative synthesis



Aggregate data Individual participant data	Weighted average of effect estimates	Pairwise comparisons of effect estimates, CI Overall effect estimate, CI, value Evaluation of heterogeneity	Forest plot with summary statistic for average effect estimate
Network	Variable	The interventions, which are compared directly indirectly	Network diagram or graph, tabular presentations
		Comparisons of relative effects between any pair of interventions	Effect estimates for intervention pairings
		Summary relative effects for pair-wise comparisons with evaluations of inconsistency and heterogeneity	Forest plot, other methods
		Treatment rankings (ie, probability that an intervention is among the best options)	Rankogram plot
	Summarizing effect estimates from separate studies (without combination that would provide an average effect estimate)	Range and distribution of observed effects such as median, interquartile range, range	Box-and-whisker plot, bubble plot Forest plot (without summary effect estimate)
	Combining values	Combined value, number of studies	Albatross plot (study sample size against values per outcome)
	Vote counting by direction of effect (eg, favors intervention over the comparator)	Proportion of studies with an effect in the direction of interest, CI, value	Harvest plot, effect direction plot

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4 for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. If considered carefully, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity and estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate meta-analyses is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4 for details).

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in any tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage, These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods, [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, that continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality.” [ 191 ] Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

GRADE criteria for rating certainty of evidence

	[ ]
Risk of bias [ ]	Large magnitude of effect
Imprecision [ ]	Dose–response gradient
Inconsistency [ ]	All residual confounding would decrease magnitude of effect (in situations with an effect)
Indirectness [ ]
Publication bias [ ]

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

⊕ ⊕ ⊕ ⊕ High: We are very confident that the true effect lies close to that of the estimate of the effect

⊕ ⊕ ⊕ Moderate: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different

⊕ ⊕ Low: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect

⊕ Very low: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect

a From the GRADE Handbook [ 192 ]

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

1. The certainty in the evidence (also known as quality of evidence or confidence in the estimates) should be defined consistently with the definitions used by the GRADE Working Group.

2. Explicit consideration should be given to each of the GRADE domains for assessing the certainty in the evidence (although different terminology may be used).

3. The overall certainty in the evidence should be assessed for each important outcome using four or three categories (such as high, moderate, low and/or very low) and definitions for each category that are consistent with the definitions used by the GRADE Working Group.

4. Evidence summaries … should be used as the basis for judgments about the certainty in the evidence.

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included on Table 6.3 , which is introduced in the next section.

Concise Guide to best practices for evidence syntheses, version 1.0 a


	Cochrane , JBI	Cochrane, JBI	Cochrane	Cochrane, JBI	JBI	JBI	JBI	Cochrane, JBI	JBI

Protocol	PRISMA-P [ ]	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P	PRISMA-P
Systematic review	PRISMA 2020 [ ]	PRISMA-DTA [ ]	PRISMA 2020	eMERGe [ ] ENTREQ [ ]	PRISMA 2020	PRISMA 2020	PRISMA 2020	PRIOR [ ]	PRISMA-ScR [ ]
Synthesis without MA	SWiM [ ]	PRISMA-DTA [ ]	SWiM	eMERGe [ ] ENTREQ [ ]	SWiM	SWiM	SWiM	PRIOR [ ]	PRISMA-ScR [ ]
	For RCTs: Cochrane RoB2 [ ] For NRSI: ROBINS-I [ ] Other primary research	QUADAS-2[ ]	Factor review QUIPS [ ] Model review PROBAST [ ]	CASP qualitative checklist [ ] JBI Critical Appraisal Checklist [ ]	JBI checklist for studies reporting prevalence data [ ]	For NRSI: ROBINS-I [ ] Other primary research	COSMIN RoB Checklist [ ]	AMSTAR-2 [ ] or ROBIS [ ]	Not required
	GRADE [ ]	GRADE adaptation	GRADE adaptation	CERQual [ ] ConQual [ ]	GRADE adaptation	Risk factors	GRADE adaptation	GRADE (for intervention reviews) Risk factors	Not applicable

AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM systematic review without meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus supported. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

Terms relevant to the reporting of health care–related evidence syntheses a

A review that uses explicit, systematic methods to collate and synthesize findings of studies that address a clearly formulated question.

The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates and other methods, such as combining values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect.

A statistical technique used to synthesize results when study effect estimates and their variances are available, yielding a quantitative summary of results.

An event or measurement collected for participants in a study (such as quality of life, mortality).

The combination of a point estimate (such as a mean difference, risk ratio or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome.

A document (paper or electronic) supplying information about a particular study. It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information.

The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.

An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses.

a Reproduced from Page and colleagues [ 93 ]

Terminology suggestions for health care–related evidence syntheses

Preferred	Potentially problematic
Evidence synthesis with meta-analysis Systematic review with meta-analysis	Meta-analysis
Overview or umbrella review	Systematic review of systematic reviews Review of reviews Meta-review
Randomized	Experimental
Non-randomized	Observational
Single case experimental design	Single-subject research N-of-1 design
Case report or case series	Descriptive study
Methodological quality	Quality
Certainty of evidence	Quality of evidence Grade of evidence Level of evidence Strength of evidence
Qualitative systematic review	Qualitative synthesis
Synthesis of qualitative data	Qualitative synthesis
Synthesis without meta-analysis	Narrative synthesis , narrative summary Qualitative synthesis Descriptive synthesis, descriptive summary

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence itself that is summarized is sparse, weak and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg the Cochrane Evidence Pipeline) [ 2 ], the shift in paradigm to living systemic review and NMA platforms [ 232 , 233 ] and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined, less duplicative, and more importantly, they may be more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously reported and conducted.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Declarations

The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews , and the Journal of Pediatric Rehabilitation Medicine .

Publisher’ s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Information

Author Services

Initiatives

You are accessing a machine-readable page. In order to be human-readable, please install an RSS reader.

All articles published by MDPI are made immediately available worldwide under an open access license. No special permission is required to reuse all or part of the article published by MDPI, including figures and tables. For articles published under an open access Creative Common CC BY license, any part of the article may be reused without permission provided that the original article is clearly cited. For more information, please refer to https://www.mdpi.com/openaccess .

Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications.

Feature papers are submitted upon individual invitation or recommendation by the scientific editors and must receive positive feedback from the reviewers.

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

Original Submission Date Received: .

Active Journals
Find a Journal
Proceedings Series
For Authors
For Reviewers
For Editors
For Librarians
For Publishers
For Societies
For Conference Organizers
Open Access Policy
Institutional Open Access Program
Special Issues Guidelines
Editorial Process
Research and Publication Ethics
Article Processing Charges
Testimonials
Preprints.org
SciProfiles
Encyclopedia

Article Menu

Subscribe SciFeed
Google Scholar
on Google Scholar
Table of Contents

Find support for a specific problem in the support section of our website.

Please let us know what you think of our products and services.

Visit our dedicated information section to learn more about MDPI.

JSmol Viewer

Is there a relationship between salivary cortisol and temporomandibular disorder: a systematic review, 1. introduction, 2. materials and methods, 2.1. protocol and registration, 2.2. eligibility criteria, 2.3. information sources and search strategies, 2.4. study selection, 2.5. data extraction and data items, 2.6. data synthesis (meta-analysis), 2.7. risk of bias in individual studies, 3.1. characteristics of included studies ( table 2 ), 3.2. salivary parameters of the participants of the included studies ( table 3 ), 3.3. risk of bias assessment ( table 4 , table 5 and table 6 ), 3.4. certainty of evidence (grade analysis) ( table 7 ).

Author, Year, Region	Study Design	Age Range/ Average (yrs)	Sample Size (Test/Control)	Study Population	Key Findings	Conclusion
Rosar et al., 2021, Brazil [ ]	Cross-sectional	19–30	43 (28/15)	TMD group Healthy group	Similar salivary cortisol levels found between groups on awakening and after 30 min	Cortisol levels were not associated with the number or duration of bruxism (TMD) episodes
Venkatesh et al., 2021, India [ ]	Cross-sectional	18–23	44 (22/22)	Test with TMD Controls without TMD	Salivary cortisol levels showed statistically significant difference between the TMD and control groups	Salivary cortisol can be used as a biological marker of stress in TMD
Goyal et al., 2020, India [ ]	RCT	24.05 ± 2.3	60 (20/20/20)	TMDs and positive depression levels TMDs and no depression Healthy control	Statistically significant higher value of salivary cortisol in TMD with depression, as compared to TMD without depression and control	Salivary cortisol could be a promising tool in identifying underlying psychological factors associated with TMDs
D’Avilla, 2019, Brazil [ ]	Cross sectional	25.3 ± 5.1	60 (45/15)	Group I: No TMD and clinically normal occlusion Group II: With TMD and malocclusion Group III: TMD and clinically normal occlusion Group IV: No TMD and with malocclusion	Salivary cortisol level was significantly higher in individuals with TMD (G2 and G3), independent of the presence/absence of malocclusion	Quality of life, pain, and emotional stress are associated with and impaired by the TMD condition, regardless of malocclusion presence
Bozovic et al., 2018, Bosnia and Herzegovina [ ]	Case–control	19.35	60 (30/30)	TMD group Healthy controls	Levels of salivary cortisol were found to be significantly higher in the study group compared to the control group	Salivary cortisol plays a vital role in TMD development
Chinthakanan et al., 2018, Thailand [ ]	Case–control	24	44 (21/23)	TMD group Control group	The salivary cortisol level of the TMD group was significantly greater than that of the control group	Patients with TMD demonstrated autonomic nervous system (ANS) imbalance and increased stress levels
Magri et al., 2018, Brazil [ ]	RCT	18–40	64 (41/23)	Laser group (TMD) Placebo group Without treatment group	Women with lower cortisol levels (below 10 ng/mL) were more responsive to active and placebo laser treatment than women with higher cortisol levels (above 10 ng/mL)	Most responsive cluster to active and placebo LLLT was women with low levels of anxiety, salivary levels below 10 ng/mL
Rosar et al., 2017, Brazil [ ]	RCT	19–30	43 (28/15)	Sleep bruxism group (Gsb) Control group (Gc)	Salivary cortisol showed a significant decrease between baseline and T1 in test, which was not observed in control	Short-term treatment with interocclusal splints had positive affect on salivary cortisol levels in subjects with sleep bruxism
Poorian et al., 2016, Iran [ ]	Case–control	19–40	41 (15/26)	TMD patients Healthy people	Salivary cortisol levels in TMD patients are significantly higher than in healthy people	Increase in salivary cortisol levels increases the probability of suffering from TMD
Tosato et al., 2015, Brazil [ ]	Cross-sectional	18–40	49 (26/25)	Women with TMD Healthy women	Moderate to strong correlations were found between salivary cortisol and EMG activities of the women with severe TMD	Increase in cortisol levels corresponded with greater muscle activity and TMD severity
Almeida et al., 2014, Brazil [ ]	Case–control	19–32	48 (25/23)	With TMD Without TMD	Results show no difference between groups	No relationship between saliva cortisol, TMD, and depression
Nilsson and Dahlstrom, 2010, Sweden [ ]	Case–control	18–24	60 (30/30)	RDC/TMD criteria I RDC/TMD criteria II Control group with no TMD	No statistically significant differences were found between any of the groups	Waking cortisol levels were not associated with symptoms of TMD and were not differentiated between the groups
Quartana et al., 2010, USA [ ]	Case–control	29.85	61 (39/22)	TMD patients Healthy controls	Pain index was not associated with cortisol levels	There was no association between markers of pain sensitivity and adrenocortical responses
Jones et al., 1997, Canada [ ]	Case–control	27.07	75 (36/39)	TMD group Control group	No significant differences found between TMD and control cortisol levels at baseline, but values were significantly higher in the TMD group at both 30 and 50 min	No relationship was found between psychological factors and hypersecretion of cortisol in TMD group

Study (Author, Year)	Saliva Collection	Salivary Cortisol Levels in Tests/Morning/Night	Salivary Cortisol Levels in Controls	Statistical Significance
Rosar et al., 2021 [ ]	Stimulated saliva Collection time: immediately after waking up and 30 min after waking up	Upon waking: 0.19 ± 0.21, After 30 min: 0.24 ± 0.28 μg/dL	Upon waking: 0.16 ± 0.13, After 30 min: 0.16 ± 0.09 μg/dL	No p > 0.05
Venkatesh et al., 2021 [ ]	Stimulated saliva Collection time: 9:30 a.m. to 10:00 a.m.	1.107 ± 0.17	0.696 ± 0.16	Yes p < 0.001
Goyal et al., 2020 [ ]	Unstimulated saliva Collection time: twice between 7.00 and 8.00 h, and again between 20.00 and 22.00 h	Morning: TMD with depression: 52.45 ± 18.62 TMD without depression: 20.35 ± 10.59 Evening: TMD with depression: 28.13 ± 10.88 TMD without depression: 12.33 ± 6.15	Morning: 12.85 ± 4.28 Evening: 8.51 ± 4.32	Yes p = 0.0001
D’Avilla, 2019 [ ]	Stimulated whole saliva was collected	G2: 7.45 ± 4.93, G3: 7.87 ± 3.52, G4: 4.35 ± 2.59 μg/dL	3.83 ± 2.72 μg/dL	Yes p < 0.05
Bozovic et al., 2018 [ ]	Stimulated saliva	2.8 µg/dL	0.6 µg/dL	Yes p < 0.001
Chinthakanan et al., 2018 [ ]	Unstimulated saliva Collection time: morning, over five minutes	29.78 ± 2.67 ng/ml	22.88 ± 1.38 ng/mL	Yes p < 0.05
Magri et al., 2018 [ ]	Unstimulated saliva Collection time: between 7 and 10 a.m.	Under 10 ng/mL: 5/7 Above 10 ng/mL: 15/14	Under 10 ng/mL: 6 Above 10 ng/mL: 17	Yes p < 0.05
Rosar et al., 2017 [ ]	Stimulated saliva Collection time: morning	Baseline: 5.9, T1: 2.6, T2: 2.5	Baseline: 4.9, T1: 4.4, T2: 4.3	Yes p < 0.05
Poorian et al., 2016 [ ]	Unstimulated saliva Collection time: between 9–11 a.m.	29.0240 ± 5.27835 ng/ml	8.8950 ± 9.58974 ng/mL	Yes p = 0.000
Tosato et al., 2015 [ ]	Unstimulated saliva Collection time: between 8 and 9 a.m.	Mild: 25.39, moderate: 116.7, severe: 250.1 µg/dL		Yes p < 0.05 for moderate and severe
Almeida et al., 2014 [ ]	Unstimulated saliva Collection time: between 9:00 and 9:25 a.m.	0.272 µg/dL	0.395 µg/dL	No p = 0.121
Nilsson and Dahlstrom, 2010 [ ]	Stimulated saliva	10.53 ± 5.05/12.61 ± 8.17 nmol/L	13.68 ± 9.96 nmol/L	No p > 0.05
Quartana et al., 2010 [ ]	Stimulated saliva Collection time: immediately prior to the start of pain testing, immediately following the pain testing procedures, and 20 min after the pain testing procedures	High PCS: BL: 0.8 Post-pain: 0.85 20 min after pain: 0.9 µg/dL	Low PCS: BL: 0.92 Post-pain: 0.75 20 min after pain: 0.7 µg/ml	No p > 0.05
Jones et al., 1997 [ ]	Unstimulated saliva Collection time: baseline (time, 0 min), peak secretion (time, 30 min), and after 20 min of rest (time, 50 min)	0 min: 6.41, 30 min: 11.96, 50 min: 10.28	0 min: 5.89, 30 min: 7.63, 50 min: 6.39	Yes p ˂ 0.01

Authors/Year	Randomization Process	Deviation from Intended Intervention	Missing Outcome Data	Measurement of the Outcome	Selection of the Reported Results	Overall Bias
Goyal, 2020 [ ]	Low	Low	Low	Some concern	Low	Low
Magri, 2017 [ ]	Low	Low	Low	Low	Low	Low
Rosar, 2017 [ ]	High	High	Low	High	High	High

Author, Year	Selection				Comparability	Exposure
	Is the Case Definition Adequate?	Representativeness of the Cases	Selection of Controls	Definition of Controls	Comparability of Cases and Controls Based on the Design or Analysis	Ascertainment of Exposure	Same Method of Ascertainment for Cases and Controls	Non-Response Rate	Risk of Bias
Almeida et al., 2014 [ ]	1	0	1	1	1	0	1	1	Medium (6)
Bozovic et al., 2018	1	1	1	1	1	1	1	1	Low (8)
Chinthakanan et al., 2018 [ ]	1	1	1	1	0	0	1	1	Medium (6)
Jones et al., 1997 [ ]	1	1	0	1	1	0	1	1	Medium (6)
Nilsson and Dahlstrom, 2010 [ ]	1	1	0	1	1	0	1	0	Medium (5)
Poorian et al., 2016 [ ]	1	0	0	0	0	0	1	1	High (3)
Quartana et al., 2010 [ ]	1	1	1	1	1	1	1	1	Low (8)

	Representativeness of the Sample	Sample Size	Non-Respondents	Ascertainment of the Exposure (Risk Factor)	The Subjects in Different Outcome Groups are Comparable, Based on the Study Design or Analysis; Confounding Factors are Controlled	Assessment of the Outcome	Statistical Test	Risk of Bias
D’Avilla, 2019 [ ]	1	1	1	2	1	1	1	Low (8)
Rosar et al., 2021 [ ]	1	1	1	2	1	1	1	Low (8)
Tosato et al., 2015 [ ]	1	1	1	1	1	2	1	Low (8)
Venkatesh et al., 2021 [ ]	1	1	0	1	0	1	1	Medium (5)

No. of Studies	Certainty assessment						Effect			Certainty	Importance
No. of Studies	Study Design	Risk of Bias	Inconsistency	Indirectness	Imprecision	Other Considerations	No. of Events	No. of Individuals	Rate (95% CI)	Certainty	Importance
3	Randomized trials	not serious	serious	serious	very serious	Strong association; all plausible residual confounding would reduce the demonstrated effect	We cannot provide examples extracted from our review since our review was not intentionally limited to a specific prognostic factor. Instead, our goal has been to explore salivary cortisol levels at different times of day that have been investigated to date as potential risks for the persistence of a variety of chronic pain conditions and their associated TMDs. However, this poor representation would happen, for instance, if we were interested in exploring the effects of various levels of salivary cortisol on types of TMD. The studies included were only investigating the prognostic effect of salivary cortisol on TMD at a specific age.			⨁⨁◯◯ Low	IMPORTANT
4	Observational studies (cross-sectional)	serious	serious	not serious	serious	All plausible residual confounding would suggest spurious effect, while no effect was observed	151	196		⨁⨁◯◯ Low	IMPORTANT
7	Observational studies (case–control)	serious	not serious	serious	serious	Publication bias strongly suspected; strong association; all plausible residual confounding would suggest spurious effect, while no effect was observed	When conducting comprehensive systematic reviews of the effects of cortisol levels on TMD incidence among young adults, authors reported that the evidence of increasing salivary cortisol as a prognostic factor for chronic TMD pain has serious limitations. This evidence comes from four studies, and all of them have a moderate risk of bias.			⨁⨁◯◯ Low	IMPORTANT

4. Discussion

4.1. association of cortisol and tmd, 4.2. evidence from randomized controlled trials, 4.3. evidence from case–control studies, 4.4. evidence from cross-sectional studies, 4.5. evidence from systematic reviews, 5. conclusions, author contributions, institutional review board statement, informed consent statement, data availability statement, conflicts of interest.

Murphy, M.K.; MacBarb, R.F.; Wong, M.E. Temporomandibular Joint Disorders: A Review of Etiology, Clinical Management, and Tissue Engineering Strategies. Int. J. Oral Maxillofac. Implants 2013 , 28 , e393. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Giannakopoulos, N.N.; Keller, L.; Rammelsberg, P.; Kronmüller, K.-T.; Schmitter, M. Anxiety and depression in patients with chronic temporomandibular pain and in controls. J. Dent. 2010 , 38 , 369–376. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Tanaka, E.; Detamore, M.S.; Mercuri, L.G. Degenerative disorders of the temporomandibular joint: Etiology, diagnosis, and treatment. J. Dent. Res. 2008 , 87 , 296–307. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Liu, F.; Steinkeler, A. Epidemiology, diagnosis, and treatment of temporomandibular disorders. Dent. Clin. N. Am. 2013 , 57 , 465–479. [ Google Scholar ] [ CrossRef ] [ PubMed ]
McNeill, C. Management of temporomandibular disorders: Concepts and controversies. J. Prosthet. Dent. 1997 , 77 , 510–522. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Valesan, L.F.; Da-Cas, C.D.; Reus, J.C.; Denardin, A.C.S.; Garanhani, R.R.; Bonotto, D.; Januzzi, E.; de Souza, B.D.M. Prevalence of temporomandibular joint disorders: A systematic review and meta-analysis. Clin. Oral Investig. 2021 , 25 , 441–453. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Chisnoiu, A.M.; Picos, A.M.; Popa, S.; Chisnoiu, P.D.; Lascu, L.; Picos, A.; Chisnoiu, R. Factors involved in the etiology of temporomandibular disorders—A literature review. Med. Pharm. Rep. 2015 , 88 , 473–478. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Atsu, S.S.; Guner, S.; Palulu, N.; Bulut, A.C.; Kurkcuoglu, I. Oral parafunctions, personality traits, anxiety and their association with signs and symptoms of temporomandibular disorders in the adolescents. Afr. Health Sci. 2019 , 19 , 1801–1810. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Ohrbach, R.; Michelotti, A. The Role of Stress in the Etiology of Oral Parafunction and Myofascial Pain. Oral Maxillofac. Surg. Clin. 2018 , 30 , 369–379. [ Google Scholar ] [ CrossRef ]
Smith, S.M.; Vale, W.W. The role of the hypothalamic-pituitary-adrenal axis in neuroendocrine responses to stress. Dialogues Clin. Neurosci. 2006 , 8 , 383–395. [ Google Scholar ] [ CrossRef ]
De Leeuw, R.; Bertoli, E.; Schmidt, J.E.; Carlson, C.R. Prevalence of traumatic stressors in patients with temporomandibular disorders. J. Oral Maxillofac. Surg. 2005 , 63 , 42–50. [ Google Scholar ] [ CrossRef ]
Gameiro, G.H.; da Silva Andrade, A.; Nouer, D.F.; Ferraz de Arruda Veiga, M.C. How may stressful experiences contribute to the development of temporomandibular disorders? Clin. Oral Investig. 2006 , 10 , 261–268. [ Google Scholar ] [ CrossRef ]
Cui, Q.; Liu, D.; Xiang, B.; Sun, Q.; Fan, L.; He, M.; Wang, Y.; Zhu, X.; Ye, H. Morning Serum Cortisol as a Predictor for the HPA Axis Recovery in Cushing’s Disease. Int. J. Endocrinol. 2021 , 2021 , 4586229. [ Google Scholar ] [ CrossRef ]
El-Farhan, N.; Rees, D.A.; Evans, C. Measuring cortisol in serum, urine and saliva—Are our assays good enough? Ann. Clin. Biochem. 2017 , 54 , 308–322. [ Google Scholar ] [ CrossRef ]
Kirschbaum, C.; Hellhammer, D.H. Salivary cortisol in psychoneuroendocrine research: Recent developments and applications. Psychoneuroendocrinology 1994 , 19 , 313–333. [ Google Scholar ] [ CrossRef ]
Almeida, C.D.; Paludo, A.; Stechman-Eto, J.; Amenábar, J.M. Saliva cortisol levels and depression in individuals with temporomandibular disorder: Preliminary study. Rev. Dor 2014 , 15 , 169–172. [ Google Scholar ] [ CrossRef ]
D’Avilla, B.M.; Pimenta, M.C.; Furletti, V.F.; Vedovello Filho, M.; Venezian, G.C.; Custodio, W. Comorbidity of TMD and malocclusion: Impacts on quality of life, masticatory capacity and emotional features. Braz. J. Oral Sci. 2019 , 18 , e191679. [ Google Scholar ] [ CrossRef ]
Kobayashi, F.Y.; Gavião, M.B.D.; Marquezin, M.C.S.; Fonseca, F.L.A.; Montes, A.B.M.; Barbosa, T.S.; Castelo, P.M. Salivary stress biomarkers and anxiety symptoms in children with and without temporomandibular disorders. Braz. Oral Res. 2017 , 31 , e78. [ Google Scholar ] [ CrossRef ]
Suprajith, T.; Wali, A.; Jain, A.; Patil, K.; Mahale, P.; Niranjan, V. Effect of Temporomandibular Disorders on Cortisol Concentration in the Body and Treatment with Occlusal Equilibrium. J. Pharm. Bioallied Sci. 2022 , 14 , S483–S485. [ Google Scholar ] [ CrossRef ]
Fritzen, V.M.; Colonetti, T.; Cruz, M.V.; Ferraz, S.D.; Ceretta, L.; Tuon, L.; Da Rosa, M.I.; Ceretta, R.A. Levels of Salivary Cortisol in Adults and Children with Bruxism Diagnosis: A Systematic Review and Meta-Analysis. J. Evid.-Based Dent. Pract. 2022 , 22 , 101634. [ Google Scholar ] [ CrossRef ]
Lu, L.; Yang, B.; Li, M.; Bao, B. Salivary cortisol levels and temporomandibular disorders—A systematic review and meta-analysis of 13 case-control studies. Trop. J. Pharm. Res. 2022 , 21 , 1341–1349. [ Google Scholar ] [ CrossRef ]
Liberati, A.; Altman, D.G.; Tetzlaff, J.; Mulrow, C.; Gotzsche, P.C.; Ioannidis, J.P.; Clarke, M.; Devereaux, P.J.; Kleijnen, J.; Moher, D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. PLoS Med. 2009 , 6 , e1000100. [ Google Scholar ] [ CrossRef ]
Huguet, A.; Hayden, J.A.; Stinson, J.; McGrath, P.J.; Chambers, C.T.; Tougas, M.E.; Wozney, L. Judging the quality of evidence in reviews of prognostic factor research: Adapting the GRADE framework. Syst. Rev. 2013 , 2 , 71. [ Google Scholar ] [ CrossRef ]
Sterne, J.A.C.; Savovic, J.; Page, M.J.; Elbers, R.G.; Blencowe, N.S.; Boutron, I.; Cates, C.J.; Cheng, H.Y.; Corbett, M.S.; Eldridge, S.M.; et al. RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ 2019 , 366 , l4898. [ Google Scholar ] [ CrossRef ]
Stang, A. Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur. J. Epidemiol. 2010 , 25 , 603–605. [ Google Scholar ] [ CrossRef ]
Dubey, V.P.; Kievisiene, J.; Rauckiene-Michealsson, A.; Norkiene, S.; Razbadauskas, A.; Agostinis-Sobrinho, C. Bullying and Health Related Quality of Life among Adolescents-A Systematic Review. Children 2022 , 9 , 766. [ Google Scholar ] [ CrossRef ]
Rosar, J.V.; Barbosa, T.S.; Dias, I.O.V.; Kobayashi, F.Y.; Costa, Y.M.; Gaviao, M.B.D.; Bonjardim, L.R.; Castelo, P.M. Effect of interocclusal appliance on bite force, sleep quality, salivary cortisol levels and signs and symptoms of temporomandibular dysfunction in adults with sleep bruxism. Arch. Oral Biol. 2017 , 82 , 62–70. [ Google Scholar ] [ CrossRef ]
Goyal, G.; Gupta, D.; Pallagatti, S. Salivary Cortisol Could Be a Promising Tool in the Diagnosis of Temporomandibular Disorders Associated with Psychological Factors. J. Indian Acad. Oral Med. Radiol. 2021 , 32 , 354–359. [ Google Scholar ] [ CrossRef ]
Magri, L.V.; Carvalho, V.A.; Rodrigues, F.C.C.; Bataglion, C.; Leite-Panissi, C.R.A. Non-specific effects and clusters of women with painful TMD responders and non-responders to LLLT: Double-blind randomized clinical trial. Lasers Med. Sci. 2018 , 33 , 385–392. [ Google Scholar ] [ CrossRef ]
Rosar, J.V.; Marquezin, M.C.S.; Pizzolato, A.S.; Kobayashi, F.Y.; Bussadori, S.K.; Pereira, L.J.; Castelo, P.M. Identifying predictive factors for sleep bruxism severity using clinical and polysomnographic parameters: A principal component analysis. J. Clin. Sleep Med. 2021 , 17 , 949–956. [ Google Scholar ] [ CrossRef ]
de Paiva Tosato, J.; Caria, P.H.; de Paula Gomes, C.A.; Berzin, F.; Politti, F.; de Oliveira Gonzalez, T.; Biasotto-Gonzalez, D.A. Correlation of stress and muscle activity of patients with different degrees of temporomandibular disorder. J. Phys. Ther. Sci. 2015 , 27 , 1227–1231. [ Google Scholar ] [ CrossRef ]
Venkatesh, S.B.; Shetty, S.S.; Kamath, V. Prevalence of temporomandibular disorders and its correlation with stress and salivary cortisol levels among students. Pesqui. Bras. Odontopediatria Clín. Integr. 2021 , 21 , e0120. [ Google Scholar ] [ CrossRef ]
Božović, Đ.; Ivković, N.; Račić, M.; Ristić, S. Salivary cortisol responses to acute stress in students with myofascial pain. Srpski Arhiv za Celokupno Lekarstvo 2018 , 146 , 20–25. [ Google Scholar ] [ CrossRef ]
Chinthakanan, S.; Laosuwan, K.; Boonyawong, P.; Kumfu, S.; Chattipakorn, N.; Chattipakorn, S.C. Reduced heart rate variability and increased saliva cortisol in patients with TMD. Arch. Oral Biol. 2018 , 90 , 125–129. [ Google Scholar ] [ CrossRef ]
Jones, D.A.; Rollman, G.B.; Brooke, R.I. The cortisol response to psychological stress in temporomandibular dysfunction. Pain 1997 , 72 , 171–182. [ Google Scholar ] [ CrossRef ]
Nilsson, A.M.; Dahlstrom, L. Perceived symptoms of psychological distress and salivary cortisol levels in young women with muscular or disk-related temporomandibular disorders. Acta Odontol. Scand. 2010 , 68 , 284–288. [ Google Scholar ] [ CrossRef ]
Poorian, B.; Dehghani, N.; Bemanali, M. Comparison of Salivary Cortisol Level in Temporomandibular Disorders and Healthy People. Int. J. Rev. Life Sci. 2015 , 5 , 1105–1113. [ Google Scholar ]
Quartana, P.J.; Buenaver, L.F.; Edwards, R.R.; Klick, B.; Haythornthwaite, J.A.; Smith, M.T. Pain catastrophizing and salivary cortisol responses to laboratory pain testing in temporomandibular disorder and healthy participants. J. Pain 2010 , 11 , 186–194. [ Google Scholar ] [ CrossRef ]
Anna, S.; Joanna, K.; Teresa, S.; Maria, G.; Aneta, W. The influence of emotional state on the masticatory muscles function in the group of young healthy adults. Biomed. Res. Int. 2015 , 2015 , 174013. [ Google Scholar ] [ CrossRef ]
Apkarian, A.V.; Baliki, M.N.; Geha, P.Y. Towards a theory of chronic pain. Prog. Neurobiol. 2009 , 87 , 81–97. [ Google Scholar ] [ CrossRef ]
Hannibal, K.E.; Bishop, M.D. Chronic stress, cortisol dysfunction, and pain: A psychoneuroendocrine rationale for stress management in pain rehabilitation. Phys. Ther. 2014 , 94 , 1816–1825. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Scrivani, S.J.; Keith, D.A.; Kaban, L.B. Temporomandibular disorders. N. Engl. J. Med. 2008 , 359 , 2693–2705. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Jasim, H.; Louca, S.; Christidis, N.; Ernberg, M. Salivary cortisol and psychological factors in women with chronic and acute oro-facial pain. J. Oral Rehabil. 2014 , 41 , 122–132. [ Google Scholar ] [ CrossRef ]
Nadendla, L.K.; Meduri, V.; Paramkusam, G.; Pachava, K.R. Evaluation of salivary cortisol and anxiety levels in myofascial pain dysfunction syndrome. Korean J. Pain 2014 , 27 , 30–34. [ Google Scholar ] [ CrossRef ]

Click here to enlarge figure

PubMed	(hydrocortisone) “[MeSH Terms] OR “hydrocortisone”[All Fields] OR “cortisol”[All Fields]) AND (“Temporomandibular disorder”[MeSH Terms] OR “TMD”[All Fields]) OR (“temporomandibular disfunction”[MeSH Terms] OR (“Facial muscle pain”[All Fields] AND young Adults [All Fields].
Scopus	(TITLE-ABS-KEY (“craniomandibular disorder” OR “temporomandibular joint disorder” OR “temporomandibular disorder” OR tmjd OR tmd OR “tmj disorder” OR ((facial OR jaw OR orofacial OR craniofacial OR trigem) AND pain))) AND (TITLE-ABS-KEY (pcs OR “Salivary cortisol” OR Hydrocortisone OR cortisol AND (Young adults))))
Web of science	cortisol* OR hydrocortisone* AND Temporomandibular disorder* OR TMD* AND Young adults.
Google scholar	(cortisol OR Salivary cortisol AND Temporomandibular disorder OR TMD AND Young Adults).

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

AlSahman, L.; AlBagieh, H.; AlSahman, R. Is There a Relationship between Salivary Cortisol and Temporomandibular Disorder: A Systematic Review. Diagnostics 2024 , 14 , 1435. https://doi.org/10.3390/diagnostics14131435

AlSahman L, AlBagieh H, AlSahman R. Is There a Relationship between Salivary Cortisol and Temporomandibular Disorder: A Systematic Review. Diagnostics . 2024; 14(13):1435. https://doi.org/10.3390/diagnostics14131435

AlSahman, Lujain, Hamad AlBagieh, and Roba AlSahman. 2024. "Is There a Relationship between Salivary Cortisol and Temporomandibular Disorder: A Systematic Review" Diagnostics 14, no. 13: 1435. https://doi.org/10.3390/diagnostics14131435

Article Metrics

Further information, mdpi initiatives, follow mdpi.

Subscribe to receive issue release notifications and newsletters from MDPI journals

Research article
Open access
Published: 24 June 2024

Heterologous versus homologous COVID-19 booster vaccinations for adults: systematic review with meta-analysis and trial sequential analysis of randomised clinical trials

Mark Aninakwah Asante 1 na1 ,
Martin Ekholm Michelsen 1 na1 ,
Mithuna Mille Balakumar 1 ,
Buddheera Kumburegama 1 ,
Amin Sharifan 2 ,
Allan Randrup Thomsen 3 ,
Steven Kwasi Korang 1 , 4 ,
Christian Gluud 1 , 5 &
Sonia Menon 1 , 6

BMC Medicine volume 22 , Article number: 263 ( 2024 ) Cite this article

540 Accesses

15 Altmetric

Metrics details

To combat coronavirus disease 2019 (COVID-19), booster vaccination strategies are important. However, the optimal administration of booster vaccine platforms remains unclear. Herein, we aimed to assess the benefits and harms of three or four heterologous versus homologous booster regimens.

From November 3 2022 to December 21, 2023, we searched five databases for randomised clinical trials (RCT). Reviewers screened, extracted data, and assessed bias risks independently with the Cochrane risk-of-bias 2 tool. We conducted meta-analyses and trial sequential analyses (TSA) on our primary (all-cause mortality; laboratory confirmed symptomatic and severe COVID-19; serious adverse events [SAE]) and secondary outcomes (quality of life [QoL]; adverse events [AE] considered non-serious). We assessed the evidence with the GRADE approach. Subgroup analyses were stratified for trials before and after 2023, three or four boosters, immunocompromised status, follow-up, risk of bias, heterologous booster vaccine platforms, and valency of booster.

We included 29 RCTs with 43 comparisons (12,538 participants). Heterologous booster regimens may not reduce the relative risk (RR) of all-cause mortality (11 trials; RR 0.86; 95% CI 0.33 to 2.26; I 2 0%; very low certainty evidence); laboratory-confirmed symptomatic COVID-19 (14 trials; RR 0.95; 95% CI 0.72 to 1.25; I 2 0%; very low certainty); or severe COVID-19 (10 trials; RR 0.51; 95% CI 0.20 to 1.33; I 2 0%; very low certainty). For safety outcomes, heterologous booster regimens may have no effect on SAE (27 trials; RR 1.15; 95% CI 0.68 to 1.95; I 2 0%; very low certainty) but may raise AE considered non-serious (20 trials; RR 1.19; 95% CI 1.08 to 1.32; I 2 64.4%; very low certainty). No data on QoL was available. Our TSAs showed that the cumulative Z curves did not reach futility for any outcome.

Conclusions

With our current sample sizes, we were not able to infer differences of effects for any outcomes, but heterologous booster regimens seem to cause more non-serious AE. Furthermore, more robust data are instrumental to update this review.

Peer Review reports

Severe respiratory syndrome coronavirus 2 (SARS-CoV-2) is the pathogen that causes coronavirus disease (COVID-19). Despite the official end of the public health emergency declaration on 5 May 2023, SARS-CoV-2 continues to infect people across the world, with vaccination remaining one of the most important protective measures against COVID-19 [ 1 , 2 ].

Between 31 July and 27 August 2023, more than 1.4 million new COVID-19 patients and over 1800 deaths were reported globally underscoring the need for ongoing close monitoring of circulating SARS-CoV-2 variants closely [ 1 ]. Presently, a number of variants are tracked by WHO, including two variants of interest (VOIs) (XBB.1.5 and XBB.1.16) and a number of variants under monitoring (VUMs) [ 1 ]. Significant progress in the handling of the COVID-19 epidemic has already been made as nearly every country has implemented vaccination policies, which has resulted in major reductions in the occurrence of severe disease, hospitalisations, and mortality [ 2 ].

Despite fewer severely diseased and fewer deaths worldwide today, there are concerns about reduced protection because of waning immunity and the appearance of newly emerging variants [ 3 ]. Currently, the Strategic Advisory Group of Experts on Immunisation recommends healthy adults over the age of 18 years are to receive one booster dose after primary vaccine series, whilst individuals with the greater risk of severe disease and death (older adults, pregnant persons, and people with immunocompromised conditions) are recommended an additional booster dose [ 4 ].

Using heterologous vaccine platforms can be an alternative strategy to homologous vaccine platforms to maximise booster vaccine impact in the event of limited supplies. It is unclear whether a heterologous boosting regimen may provide higher vaccine effectiveness than homologous booster vaccines. Two meta-analyses including randomised clinical trials and observational studies suggest that heterologous booster doses have a higher protection against symptomatic COVID-19 and severe COVID-19 compared with or to homologous booster doses [ 5 , 6 ] whilst a ‘living meta-analysis’ also including randomised clinical trials and observational studies does not [ 7 ].

The objective of this systematic review is to compare the vaccine benefits and harms between three or four dose heterologous boosters using different vaccine platforms or intra-platform variations versus homologous booster regimens in randomised trials only to help inform public health policies.

Recognising the needs of COVID-19 vaccine research and the identification of trials on heterologous versus homologous booster regimens as an area of public health interest necessitating evidence synthesis, we performed this specific review of pairwise comparison of heterologous versus homologous boosters in randomised clinical trials. This was performed within the framework of our living systematic review, the methodology of which is thoroughly discussed elsewhere [ 8 ], and the protocol registered in PROSPERO (CRD42020178787). This systematic review was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis [ 9 ] (Additional file: PRISMA checklist ) and the implementation of this review followed the recommended procedures as specified in the Cochrane Handbook of Systematic Reviews of Interventions [ 10 ].

Search strategy and trial inclusion criteria

This updated review follows a two-step approach. As for the first living systematic review, the literature searches were conducted on a biweekly basis, from 3 November 2022 to 21 December 2023 using Medline, Cochrane Central Register of Controlled Trials, Embase, Latin American and Caribbean Health Sciences Literature, and Science Citation Index Expanded to identify newly published trials following the initial search strategy and eligibility criteria (for more ample information on the search strategy and study inclusion, please refer to the protocol (Additional file: Additional search Strategy]). After identifying eligible randomised clinical trials for our original research on the efficacy of all COVID-19 vaccines in relation to all-cause mortality, safety, and vaccine efficacy, we employed a specific search strategy tailored to our present research question (Additional file: Additional search strategy). As a quality control measure, we also conducted a snowball search to identify any potential missed trials [ 11 ]. All randomised clinical trials reporting on a third or fourth heterologous booster vaccine versus either a third or fourth homologous booster vaccine were included. In instances where it was not possible to determine whether the intervention arm used a heterologous or homologous booster vaccine, and no clarification was provided by the authors, the trial was excluded. Also, only full booster doses between both arms were compared, in instances when boosters between both arms only compared half doses to full doses, the trial was excluded. Trials with mixed primary series in the heterologous arm were excluded. Furthermore, trials reporting exclusively on immunogenicity, along with trials comparing different types of heterologous booster vaccines or heterologous third booster to a placebo were also excluded. Trials that included open-label cohorts with no randomisation of the participants were excluded.

Data analysis

The vaccine efficacy outcomes included the primary outcomes, all-cause mortality, prevention of laboratory-confirmed symptomatic COVID-19, severe symptoms associated with COVID-19, and serious adverse events (SAE) [ 8 ]. Whenever participants were noted to have (laboratory-confirmed) COVID-19 symptoms, we classified it as symptomatic COVID-19. Conversely, if participants were hospitalised due to severe COVID-19 symptoms, we defined it as severe COVID-19. Secondary outcomes were health-related quality of life and adverse events (AE) considered not serious [ 8 ]. We used the trial results reported at maximum follow-up for each specific abovementioned outcome and used intention-to-treat data if provided by the trialist.

Data extraction and risk of bias assessment

Two independent authors conducted the screening, data extraction, quality assessment, and GRADE assessment for each eligible trial following the Cochrane risk of bias tool—version 2 and the procedure described in our protocol. If three domains were assigned a ‘some concern’ assessment, then the trial was graded at ‘high risk of bias’. Any discrepancies were resolved by consensus and authors were contacted to clarify uncertainties and provide additional context, including available data stratified by older adults.

Statistical synthesis

We performed meta-analysis using STATA 17 for Windows (StataCorp, College Station, TX, USA, 2021) and analysed data with the meta command for meta-analysis. For the trial sequential analysis (TSA), we used version 0.9.5.10 beta (TSA 2017) [ 12 ]. To quantify the strength of associations between booster vaccines and vaccine efficacy and safety outcomes, we employed relative risk (RR). The risk ratio was computed by dividing the risk observed in the heterologous vaccine regimen group by the risk in the homologous vaccine regimen group, and the 95% confidence intervals (CI) for the risk ratio was used to determine the precision of the estimated associations. With a view to avoiding attributing excessive weight to the control groups in the meta-analysis, we divided both the numerator and the denominator of the control group by the number of intervention groups whenever the same control group was used in a trial to compare different intervention groups. To account for potential heterogeneity amongst the trials, random-effects DerSimonian and Laird models were applied [ 13 , 14 ]. In addition, the fixed-effect meta-analysis (Mantel–Haenszel method) was assessed separately and the most conservative point estimate of the two reported [ 15 , 16 ]. We also post hoc applied Peto’s odds ratio (OR) due to very few outcomes in some comparisons.

Assessment of heterogeneity within and between study groups was conducted using the Cochrane Q test, with a significance level of p < 0.1 indicating the presence of heterogeneity [ 10 ]. The I 2 statistic, as described by Higgins and Thompson was employed to estimate the percentage of observed between-study variability due to heterogeneity, as opposed to chance [ 17 ]. This statistic ranges from 0 to 100%, with values of 0 to 40% representing moderate heterogeneity, 30 to 60% moderate heterogeneity, 50 to 90% substantial heterogeneity, and 75 to 100% considerable heterogeneity [ 10 ].

Furthermore, we performed a subgroup analysis based on the risk of biases to examine the effect of potential biases on the risk ratio. The variable was categorised as low risk of bias compared to some concerns/high risk of bias, allowing us to discern any differential effects on the overall results. Moreover, we conducted subgroup analyses based on the follow-up time: studies with follow-up periods of 3 months and under were compared to those with follow-up periods of above 3 months. Additionally, we compared vaccine regimens with three doses against those with four doses to explore differences in their risk ratios. As different vaccine booster platforms use distinct mechanisms to elicit immune responses [ 18 ], which may lead to varying efficacy and safety profiles [ 19 ], we also conducted a subgroup analysis to compare differences in risk ratios between boosters with different vaccine platforms, including inactivated, protein-based, viral vectored, and mRNA-based boosters. Furthermore, we investigated the variation in risk ratios for vaccine efficacy outcomes between trials from 2023 and those from 2022, thereby allowing us to consider the potential influence of the predominance of XBB subvariants towards the end of 2022 and 2023. Also, we conducted a subgroup analysis by immunocompromised status as immunocompromised individuals may not have a robust immune response to COVID-19 vaccines compared to those without an immunocompromised condition [ 20 ]. Initially, our plan was to conduct a subgroup analysis by categorising adults into younger and older age groups; however, we were constrained by the absence of disaggregated data. Additionally, as an increase in inoculation interval times may impact vaccine efficacy and possibly safety outcomes [ 21 ], we aimed to investigate the impact of different inoculation interval times on vaccine efficacy and safety outcomes using a 12-week cutoff [ 22 ]. Nevertheless, inconsistent reporting and a lack of interpretable data due to large ranges of inoculation intervals prevented us from conducting these planned subgroup analyses. To capture more recent trials comparing vaccine valency, monovalent vaccine boosters to multivalent vaccine boosters (bivalent and tetravalent vaccine boosters) using heterologous and homologous vaccine boosters, we have also conducted a subgroup analysis. By conducting these subgroup analyses, we aimed to assess the differential effect on risk ratios and their associated heterogeneity.

We conducted the TSAs to control risks of type I and type II errors [ 23 , 24 , 25 ]. To assess publication bias, a visual inspection of the funnel plots was conducted and the Egger statistical test performed when an outcome had at least 10 trials [ 10 ].

Summary of findings and assessment of certainty

We used the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) profiler Guideline Development Tool to create the summary of findings tables (GRADEpro GDT https://www.gradepro.org/ ). We created a summary of findings tables including each of the prespecified outcomes (all-cause mortality, vaccine efficacy, serious adverse events, health-related quality of life, and non-serious adverse events) (Table 1 : GRADE assessment). We used the five GRADE considerations (bias risk of the trials, consistency of effect, imprecision, indirectness, and publication bias). We assessed imprecision using trial sequential analysis [ 8 , 26 , 27 ].

Trial characteristics

Out of 29,145 abstracts screened by the initial search, 28,044 were excluded after abstract screening. Following a full-text review of 1,101 studies, 601 were excluded based on our inclusion and exclusion criteria. Ultimately, 500 trials met our criteria for the initial research question, of which 29 trials conducted in Europe, North America, Asia, and Latin America were retained in the final analysis of this specific research question. See the PRISMA flow diagram for more details about reasons for exclusion (Additional file: PRISMA flow chart).

In total, 12,538 participants provided data for our predefined meta-analyses. All participants were adults (≥ 18 years) and all trials included older adults (either ≥ 60 or ≥ 65 years) except for four trials [ 28 , 29 , 30 , 31 ] while five trials exclusively included immunocompromised participants [ 32 , 33 , 34 , 35 , 36 ]. None of the trials included pregnant women. One trial exclusively included healthy older adults (≥ 60 years) [ 37 ]. Most trials assessed a third dose heterologous booster vaccine compared with a third dose homologous booster vaccine [ 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 ] while four trials compared a fourth heterologous booster with a fourth homologous booster [ 47 , 54 , 55 , 56 ]. The included heterologous booster vaccines encompassed viral-vectored, mRNA, protein subunit, or inactivated virus platforms (Table 2 : Trials’ characteristics). Follow-up of participants varied from 7 to 365 days after randomisation for all outcomes. Inoculation intervals between the 2nd and 3rd dose, when reported, ranged from 8 to 43 weeks and 28 to 37 weeks between the 3rd dose and 4th dose (Table 2 : Trials’ characteristics).

Primary outcomes

All-cause mortality.

The 11 trials ( N = 5883) which reported on all-cause mortality observed one death in an immunocompromised participant in the heterologous group because of a SAE (myocardial infarction) (Fig. 1 ). Five trials (45%) were assessed as having some concerns regarding bias (Additional file: FigS 24) and 5 trials (45%) followed participants 90 days or more (Additional file: FigS 20).

Heterologous versus homologous vaccine booster regimens: all-cause mortality

The meta-analysis suggested that the heterologous booster vaccines may have no effect on reducing all-cause mortality compared with homologous booster vaccines (RR 0.86; 95% CI 0.33 to 2.26; I 2 0.0%; very low certainty evidence), with comparable fixed-model and Peto OR effect estimates (Additional file: Table S3).

The trial sequential analysis (Additional file: FigS1) showed that the cumulative Z -curve did not cross the conventional boundaries after inclusion of eleven trials, nor reached the futility boundaries, indicating a need for more trials. It is very uncertain that subgroup analyses across heterologous booster vaccine platforms (Additional file: FigS12), number of doses (Additional file: FigS17), follow-up time (Additional file: FigS20), risk of bias (Additional file: FigS24), health status (Additional file: FigS27), and trials published before and in 2023 (Additional file: FigS31) have no effect in reducing all-cause mortality.

Laboratory-confirmed symptomatic COVID-19

All trials either used reverse transcription polymerase chain reaction (RT-PCR) or similar laboratory tests for COVID-19 exclusively for those reporting symptoms. Thus, we were only able to report on symptomatic participants of COVID-19 and not all participants with confirmed COVID-19 as stated in our protocol. Fourteen trials ( N = 5677) reported on symptomatic COVID-19 with 13 trials (Fig. 2 ) assessed as having some concerns for Domain 4 (measurement of the outcome) and one being downgraded to high risk of bias due to three domains being attributed some concerns. Seven trials (50%) followed participants 90 days or more (Additional file: FigS21). The pooled RR suggested that the heterologous booster vaccines may not have effect on risk of confirmed symptomatic COVID-19 compared with homologous booster vaccines (RR 0.95; 95% CI 0.72 to 1.25; I 2 0.0%; very low certainty evidence), which was further supported by estimates from the fixed-effect model and the Peto OR (Additional file: Table S3). The TSA showed that the cumulative Z -curve did not cross the conventional boundaries after inclusion of the fourteen trials, nor reached the futility boundaries, indicating a need for more trials (Additional file: FigS2).

Heterologous versus homologous vaccine booster regimens: laboratory-confirmed symptomatic COVID-19

As authors did not report the methodology of how symptomatic COVID-19 participants were diagnosed, this was reflected by assigning some concerns in Domain 4 (measurement of the outcome), therefore precluding us from performing a subgroup analysis by risk of bias. It is uncertain that subgroup analyses according to heterologous booster vaccine platforms (Additional file: Fig S13), variations in follow-up duration (Additional file: Fig S21), health status (Additional file: Fig S28), by pre-2023 and in 2023 (Additional file: Fig S32), and according to vaccine booster valency (Additional file: Fig S34), have no effect in reducing laboratory-confirmed symptomatic COVID-19 events between the two intervention groups.

Laboratory-confirmed severe COVID-19

Ten trials ( N = 4494) assessed severe disease associated with laboratory-confirmed COVID-19 (Fig. 3 ), with all trials having some concerns for Domain 4 (measurement of the outcome). Only two participants with severe COVID-19 were reported, which occurred in the homologous booster group. Six trials (60%) followed participants 90 days or more ( Additional file: Fig S22).

Heterologous versus homologous vaccine booster regimens: severe COVID-19 disease

The pooled random-effects model estimates that heterologous booster doses may have no effect on reducing severe COVID-19 symptoms versus homologous booster doses (RR 0.51; 95% CI 0.20 to 1.33; I 2 0.0%; very low certainty), with comparable estimates from the fixed-effect model and Peto OR (Additional file: Table S3). The TSA underscored that the required meta-analytic sample size has not been met, thereby preventing the establishment of conclusive evidence (Additional file: FigS3). Therefore, additional trials are imperative to substantiate the impact of a heterologous vaccine regimen on laboratory-confirmed severe COVID-19 participants.

As trial authors did not report the methodology of how severe COVID participants were diagnosed, all trials measuring this outcome were assessed as having some concerns for Domain 4 (measurement of the outcome), therefore precluding us from performing a subgroup analysis by risk of bias. It is very uncertain that subgroup analyses across heterologous booster vaccine platforms (Additional file: FigS14), variations in follow-up duration (Additional file: FigS22), pre-2023 and in 2023 (Additional file: FigS33), and according to vaccine booster valency (Additional file: FigS35) have any effect in reducing laboratory-confirmed severe COVID-19 between the subgroups.

Serious adverse events

Twenty-seven trials ( N = 11,384) reported serious adverse events (SAE) when assessing the safety profile of the heterologous versus homologous booster vaccines (Fig. 4 ), of which 13 of trials (48%) were assessed as having one or more concerns across domains of which three trials at high risk of bias. Fourteen trials (52%) followed participants 90 days or longer .

Heterologous versus homologous vaccine booster regimens: serious adverse events

The overall estimates suggest that there may be no difference on the risk for serious adverse events between heterologous booster vaccines versus homologous booster vaccines (RR 1.15; 95% CI 0.68 to 1.95; I 2 0.0%; very low certainty evidence), with comparable estimates from the fixed-effect model and Peto OR (Additional file: Table S3). The TSA reveals that the cumulative number of participants remains suboptimal, indicating the insufficiency of the accrued sample size (Additional file: FigS4). Therefore, additional trials are necessary to ascertain the impact of a heterologous vaccine regimen on serious adverse events. It is very uncertain that subgroup analyses across heterologous booster vaccine platforms (Additional file: FigS15), different doses (Additional file: FigS18), variations in follow-up duration (Additional file: FigS23), risk of bias (Additional file: FigS25), health status (Additional file: FigS29), and according to vaccine booster valency (Additional file: FigS36) may have any effect on SAE between the subgroups.

Secondary outcomes

Quality of life

None of the included trials reported on health-related QoL.

Adverse events considered not serious

Twenty trials ( N = 10,008) reported on AE considered non-serious when assessing the safety profile for booster vaccines (Fig. 5 ), of which ten trials (50%) were considered as having one or more concerns across domains of which two were at high risks of bias. Follow-up for all trials was less than 90 days.

Heterologous versus homologous vaccine booster regimens: non-serious adverse events

Most common types of AE considered non serious were fatigue, fever, injection site pain, redness, muscle pain, and headache. The overall pooled RR suggested that there may be a higher risk of AE considered non-serious by 21% in the heterologous vaccination group versus the homologous vaccination group (RR 1.19; 95% CI 1.08 to 1.32; I 2 64.4%; very low certainty), with concurring estimates with the fixed-effect model and Peto OR (Additional file: Table S3). The TSA showed that the cumulative Z -curve did not intersect the threshold indicating potential harm nor potential benefit associated with heterologous vaccines after incorporating the 20 trials (Additional file: FigS5).

Subgroup analyses based on different doses (Additional file: Fig S19), risk of bias (Additional file: Fig S26), and health status (Additional file: Fig S30) did not impact the pooled relative risk (RR) or reduce heterogeneity. The lack of difference in effect due to different doses on adverse events (AEs) considered non-serious remains very uncertain across subgroups. Furthermore, the evidence for differential higher risks of non-serious AE with protein-based vaccine boosters, viral-vectored booster platforms, and mRNA vaccine booster platforms remain very uncertain due to an even higher risk of imprecision (RR 1.13; 95% CI 1.00 to 1.29; I 2 : 62.5%), (RR 1.51; 95% CI 1.16 to 1.97; I 2 : 56.2%,) and (RR 1.25; 95% CI 1.00 to 1.56), respectively (Additional file: FigS16).

Publication bias

No asymmetry for all-cause mortality, symptomatic COVID-19, severe COVID-19, and SAE (Additional file: Fig S37-40) were observed in the funnel plots, providing evidence against publication bias, which was further corroborated by Egger’s tests showing no significant evidence of publication bias. For adverse events considered non-serious, despite the presence of slight asymmetry in the funnel plot for the outcome (Additional file: FigS44), the significant result from the Egger’s test ( P : 0.02) suggests evidence of publication bias for non-serious adverse events. It is noteworthy that substantial heterogeneity among the included trials could potentially account for the observed asymmetry, introducing some uncertainty into our findings.

In this updated living vaccine project valid until the end of 2023, we focused on gathering evidence from 29 trials comparing heterologous-based booster versus homologous-based booster regimens, of which two compared multivalent versus bivalent boosters. We found no evidence of different effects on mortality, laboratory-confirmed symptomatic COVID-19, laboratory-confirmed severe COVID-19, or SAE. Our TSAs revealed that the accrued sample size was suboptimal to make any robust conclusions of any difference of effects on these outcomes. We found no data on QoL. Nevertheless, we found that heterologous booster regimens may increase the occurrence of AE considered non-serious, but more data will be required to confirm this finding.

Heterogeneity was only encountered assessing AE considered non-serious. Notably, for this outcome, subgroup analyses across vaccine platforms, doses, risk of bias, and health status of participants did not reduce the high level of heterogeneity, which remained above 50%. Due to limited sample sizes, we cannot confidently determine significant differences or lack thereof for all outcomes.

Thus, at this juncture, the very low certainty of evidence yielded from this systematic review does not allow an assessment of beneficial and harmful effects of combining the two different types of vaccine platform, thereby providing limited evidence supporting any firm conclusions. Thus, it would be premature to infer whether lack of statistical significance is due to insufficient sample size or due to no differences between heterologous and homologous booster regimens.

To our knowledge, no other systematic review comprising only randomised clinical trials exists, thus hindering direct comparisons to be made. Three meta-analyses were published between April and August 2022, with the bulk of evidence emanating from observational studies [ 5 , 6 , 7 ]. Deng et al. [ 6 ] reported higher vaccine effectiveness for symptomatic COVID-19 and severe symptoms associated with COVID-19 with heterologous boosters (56.8% compared to 17.3% and 97.4% compared to 93.4%, respectively) [ 6 ]. Conversely, Au et al. (2022) found comparable effectiveness between heterologous and homologous three-dose regimens in preventing COVID-19 symptomatic and severe infections [ 7 ]. Regarding safety outcomes, our findings align with Deng et al. [ 6 ], who reported higher odds for adverse events considered non-serious in the heterologous booster group, in disagreement with Cheng et al. [ 5 ] who reported a higher incidence of total adverse events in the homologous group booster group [ 5 ]. However, these discrepancies may be attributed to confounding factors, including location-based differences in vaccination strategies.

Strengths and limitations

Strengths related to our methodology include the use of five biomedical databases drawing from a combination of approaches to increase the likelihood of capturing all eligible trials. Second, we only included randomised clinical trials. Third, we employed our general search strategy as defined by the protocol followed by a specific search strategy tailored to our specific research question, which was later complemented with the use of the snowballing method. Fourth, we conducted TSAs to control type I and type II errors and strengthen our assessment of the imprecision domain in GRADE.

Our eligible trials have several strengths. Firstly, the inclusion of participants from diverse geographical regions supports the generalisation of results, increasing the applicability of our findings to broader populations. Furthermore, by utilising various vaccine regimen combinations in the heterologous arm, compared with different homologous vaccine regimens, we further enhance the generalisation of our results in addressing our broad research question, whether heterologous regimens are more likely to improve vaccine efficacy and safety.

However, interpretation of our findings warrants caution and cognisance of certain methodological limitations, as reflected in the very low certainty we have in the evidence, largely attributable to the non-negligible percentage of RCT not being free of potential biases, imprecision, and heterogeneity. Secondly, we were unable to adequately assess the quality of RCT reporting on vaccine efficiency as none of the eligible trials reporting on these outcomes described the methodology for assessing this efficiency. In addition, whilst including trials from different geographical regions with varying patterns of sublineage predominance, vaccination combinations, and intervals between prime and boost doses using different vaccine regimens may help generalise findings, this diversity may also lead to residual heterogeneity, as seen in the case of adverse events considered non-serious.

Whilst our study provides valuable insights into the efficacy and safety outcomes of homologous compared with heterologous vaccine regimens across various vaccine platforms, we acknowledge that the absence of trials involving recombinant protein boosters may have limited our exploration of the effect of protein-based heterologous boosters. Additionally, the majority of the trials had a follow-up time of less than 3 months, along with large inoculation time intervals between doses, potentially resulting in failure to adequately gauge benefits and harms. The absence of disaggregated data for older adults, who along with the immunocompromised population, are poised to benefit the most from a booster dose, further limits our analyses.

Hence, this systematic review underscores the imperative for more robust randomised clinical trials to corroborate either all non-significant differences observed or explore the possibility of a differential effect between heterologous versus homologous booster regimen, also among older adults.

Our living systematic review provides current insights into the comparative efficacy and safety of heterologous versus homologous COVID-19 booster regimens. Upon evaluating three vaccine efficacy outcomes, i.e., all-cause mortality, symptomatic COVID-19, and severe COVID-19, no adequate accrued sample size was reached to be able to conclude a lack of difference in prevention between the heterologous versus homologous booster vaccine regimens. In terms of safety outcomes, whilst heterologous vaccine regimens may lead to higher occurrences of AE considered non-serious in contrast to SAE which showed a pooled relative risk range that encompassed the line of no effect, our TSAs pointed to inadequate sample size for both outcomes. As multivalent vaccine heterologous boosters become more prominent, future randomised clinical trials should prioritise diverse populations, including older adults and immunocompromised people and ensure standardised assessment to optimise vaccination strategies and global pandemic control efforts.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

Adverse events

Confidence intervals

Coronavirus disease 2019

Grading of Recommendations, Assessment, Development, and Evaluation

Messenger RNA

Randomised clinical trials

Relative risk or risk ratio

Reverse transcription polymerase chain reaction

Severe respiratory syndrome coronavirus 2

Trial sequential analysis

Variants of interest

Variants under monitoring

World Health Organization

WHO. COVID-19 Weekly Epidemiological Update - Edition 158. Available from: https://www.who.int/publications/m/item/weekly-epidemiological-update-on-covid-19---1-september-2023 .

WHO. Global COVID-19 vaccination strategy in a changing world. July 2022 Update. Available from: https://www.who.int/publications/m/item/global-covid-19-vaccination-strategy-in-a-changing-world--july-2022-update .

Atmar RL, Lyke KE, Deming ME, Jackson LA, Branche AR, El Sahly HM, et al. Homologous and heterologous COVID-19 booster vaccinations. N Engl J Med. 2022;386(11):1046–57.

Article CAS PubMed Google Scholar

WHO. SAGE updates COVID-19 vaccination guidance. Available from: https://www.who.int/news/item/28-03-2023-sage-updates-covid-19-vaccination-guidance .

Cheng H, Peng Z, Si S, Alifu X, Zhou H, Chi P, et al. Immunogenicity and safety of homologous and heterologous prime–boost immunization with COVID-19 vaccine: systematic review and meta-analysis. Vaccines. 2022;10(5):798.

Article CAS PubMed PubMed Central Google Scholar

Deng J, Ma Y, Liu Q, Du M, Liu M, Liu J. Comparison of the effectiveness and safety of heterologous booster doses with homologous booster doses for SARS-CoV-2 vaccines: a systematic review and meta-analysis. Int J Environ Res Public Health. 2022;19(17):10752.

Au WY, Cheung PPH. Effectiveness of heterologous and homologous covid-19 vaccine regimens: living systematic review with network meta-analysis. BMJ. 2022;377:e069989.

Article PubMed Google Scholar

Korang SK, Juul S, Nielsen EE, Feinberg J, Siddiqui F, Ong G, et al. Vaccines to prevent COVID-19: a protocol for a living systematic review with network meta-analysis including individual patient data (The LIVING VACCINE Project). Syst Rev. 2020;9(1):262.

Article PubMed PubMed Central Google Scholar

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane, 2022. Available from www.training.cochrane.org/handbook .

Wohlin C, Kalinowski M, Romero Felizardo K, Mendes E. Successful combination of database search and snowballing for identification of primary studies in systematic literature studies. Inf Softw Technol. 2022;147:106908.

Article Google Scholar

Trial Sequential Analysis (TSA) [Computer program]. The copenhagen trial unit, centre for clinical intervention research, the capital region, Copenhagen University Hospital – Rigshospitalet. 2021. 2021.

Google Scholar

DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–88.

Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539–58.

Demets DL. Methods for combining randomized clinical trials: strengths and limitations. Stat Med. 1987;6(3):341–8.

Jakobsen JC, Wetterslev J, Winkel P, Lange T, Gluud C. Thresholds for statistical and clinical significance in systematic reviews with meta-analytic methods. BMC Med Res Methodol. 2014;14(1):120.

Higgins JPT. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60.

Hebel C, Thomsen AR. A survey of mechanisms underlying current and potential COVID-19 vaccines. APMIS. 2023;131(2):37–60.

Verdecia M, Kokai-Kun JF, Kibbey M, Acharya S, Venema J, Atouf F. COVID-19 vaccine platforms: delivering on a promise? Hum Vaccines Immunother. 2021;17(9):2873–93.

Article CAS Google Scholar

Lee ARYB, Wong SY, Chai LYA, Lee SC, Lee MX, Muthiah MD, et al. Efficacy of COVID-19 vaccines in immunocompromised patients: systematic review and meta-analysis. BMJ. 2022:e068632. https://doi.org/10.1136/bmj-2021-068632 .

Hall VG, Ferreira VH, Wood H, Ierullo M, Majchrzak-Kita B, Manguiat K, et al. Delayed-interval BNT162b2 mRNA COVID-19 vaccination enhances humoral immunity and induces robust T cell responses. Nat Immunol. 2022;23(3):380–5.

Català M, Li X, Prats C, Prieto-Alhambra D. The impact of prioritisation and dosing intervals on the effects of COVID-19 vaccination in Europe: an agent-based cohort model. Sci Rep. 2021;11(1):18812.

Wetterslev J, Thorlund K, Brok J, Gluud C. Trial sequential analysis may establish when firm evidence is reached in cumulative meta-analysis. J Clin Epidemiol. 2008;61(1):64–75.

Wetterslev J, Thorlund K, Brok J, Gluud C. Estimating required information size by quantifying diversity in random-effects model meta-analyses. BMC Med Res Methodol. 2009;9(1):86.

Wetterslev J, Jakobsen JC, Gluud C. Trial sequential analysis in systematic reviews with meta-analysis. BMC Med Res Methodol. 2017;17(1):39.

Brok J, Thorlund K, Gluud C, Wetterslev J. Trial sequential analysis reveals insufficient information size and potentially false positive results in many meta-analyses. J Clin Epidemiol. 2008;61(8):763–9.

Castellini G, Bruschettini M, Gianola S, Gluud C, Moja L. Assessing imprecision in Cochrane systematic reviews: a comparison of GRADE and trial sequential analysis. Syst Rev. 2018;7(1):110.

Zhang Y, Ma X, Yan G, Wu Y, Chen Y, Zhou Z, et al. Immunogenicity, durability, and safety of an mRNA and three platform-based COVID-19 vaccines as a third dose following two doses of CoronaVac in China: a randomised, double-blinded, placebo-controlled, phase 2 trial. eClinicalMedicine. 2022;54:101680.

Li J, Hou L, Guo X, Jin P, Wu S, Zhu J, et al. Heterologous AD5-nCOV plus CoronaVac versus homologous CoronaVac vaccination: a randomized phase 4 trial. Nat Med. 2022;28(2):401–9.

Omma A, Batirel A, Aydin M, Yilmaz Karadag F, Erden A, Kucuksahin O, et al. Safety and immunogenicity of inactive vaccines as booster doses for COVID-19 in Türkiye: a randomized trial. Hum Vaccines Immunother. 2022;18(6):2122503.

Yong X, Liu J, Zeng Y, Nie J, Cui X, Wang T, et al. Safety and immunogenicity of a heterologous booster with an RBD virus-like particle vaccine following two- or three-dose inactivated COVID-19 vaccine. Hum Vaccines Immunother. 2023;19(3):2267869.

Bonelli M, Mrak D, Tobudic S, Sieghart D, Koblischke M, Mandl P, et al. Additional heterologous versus homologous booster vaccination in immunosuppressed patients without SARS-CoV-2 antibody seroconversion after primary mRNA vaccination: a randomised controlled trial. Ann Rheum Dis. 2022;81(5):687–94.

Mrak D, Sieghart D, Simader E, Tobudic S, Radner H, Mandl P, et al. Heterologous vector versus homologous mRNA COVID-19 booster vaccination in non-seroconverted immunosuppressed patients: a randomized controlled trial. Nat Commun. 2022;13(1):5362.

Natori Y, Martin E, Mattiazzi A, Arosemena L, Ortigosa-Goggins M, Shobana S, et al. A pilot single-blinded, randomized, controlled trial comparing BNT162b2 vs. JNJ-78436735 vaccine as the third dose after two doses of BNT162b2 vaccine in solid organ transplant recipients. Transpl Int. 2023;36:10938.

Reindl-Schwaighofer R, Heinzel A, Mayrdorfer M, Jabbour R, Hofbauer TM, Merrelaar A, et al. Comparison of SARS-CoV-2 antibody response 4 weeks after homologous vs heterologous third vaccine dose in kidney transplant recipients: a randomized clinical trial. JAMA Intern Med. 2022;182(2):165.

Sharifi Aliabadi L, Karami M, Barkhordar M, Hashemi Nazari SS, Kavousi A, Ahmadvand M, et al. Homologous versus heterologous prime-boost COVID-19 vaccination in autologous hematopoietic stem cell transplantation recipients: a blinded randomized controlled trial. Front Immunol. 2023;14:1237916.

Jin PF, Guo XL, Gou JB, Hou LH, Song ZZ, Zhu T, et al. Immunogenicity and safety of heterologous immunisation with Ad5-nCOV in healthy adults aged 60 years and older primed with an inactivated SARS-CoV-2 vaccine (CoronaVac): a phase 4, randomised, observer-blind, non-inferiority trial. Lancet Reg Health - West Pac. 2023;38: 100829.

PubMed PubMed Central Google Scholar

Corominas J, Garriga C, Prenafeta A, Moros A, Cañete M, Barreiro A, et al. Safety and immunogenicity of the protein-based PHH-1V compared to BNT162b2 as a heterologous SARS-CoV-2 booster vaccine in adults vaccinated against COVID-19: a multicentre, randomised, double-blind, non-inferiority phase IIb trial. Lancet Reg Health - Eur. 2023;28:100613.

Munro APS, Janani L, Cornelius V, Aley PK, Babbage G, Baxter D, et al. Safety and immunogenicity of seven COVID-19 vaccines as a third dose (booster) following two doses of ChAdOx1 nCov-19 or BNT162b2 in the UK (COV-BOOST): a blinded, multicentre, randomised, controlled, phase 2 trial. Lancet. 2021;398(10318):2258–76.

Kaabi NA, Yang YK, Du LF, Xu K, Shao S, Liang Y, et al. Safety and immunogenicity of a hybrid-type vaccine booster in BBIBP-CorV recipients in a randomized phase 2 trial. Nat Commun. 2022;13(1):3654.

Rose W, Raju R, Babji S, George A, Madhavan R, Leander Xavier JV, et al. Immunogenicity and safety of homologous and heterologous booster vaccination of ChAdOx1 nCoV-19 (COVISHIELDTM) and BBV152 (COVAXIN®): a non-inferiority phase 4, participant and observer-blinded, randomised study. Lancet Reg Health - Southeast Asia. 2023;100141.

Shinkai M, Sonoyama T, Kamitani A, Shibata RY, Seki NM, Omoto S, et al. Immunogenicity and safety of booster dose of S-268019-b or BNT162b2 in Japanese participants: an interim report of phase 2/3, randomized, observer-blinded, noninferiority study. Vaccine. 2022;40(32):4328–33.

Launay O, Cachanado M, Luong Nguyen LB, Ninove L, Lachâtre M, Ben Ghezala I, et al. Immunogenicity and safety of beta-adjuvanted recombinant booster vaccine. N Engl J Med. 2022;387(4):374–6.

Fadlyana E, Setiabudi D, Kartasasmita CB, Putri ND, Rezeki Hadinegoro S, Mulholland K, et al. Immunogenicity and safety in healthy adults of full dose versus half doses of COVID-19 vaccine (ChAdOx1-S or BNT162b2) or full-dose CoronaVac administered as a booster dose after priming with CoronaVac: a randomised, observer-masked, controlled trial in Indonesia. Lancet Infect Dis. 2023;23(5):545–55.

Leung NHL, Cheng SMS, Cohen CA, Martín-Sánchez M, Au NYM, Luk LLH, et al. Comparative antibody and cell-mediated immune responses, reactogenicity, and efficacy of homologous and heterologous boosting with CoronaVac and BNT162b2 (Cobovax): an open-label, randomised trial. Lancet Microbe. 2023;4(9):e670–82.

Costa Clemens SA, Weckx L, Clemens R, Almeida Mendes AV, Ramos Souza A, Silveira MBV, et al. Heterologous versus homologous COVID-19 booster vaccination in previous recipients of two doses of CoronaVac COVID-19 vaccine in Brazil (RHH-001): a phase 4, non-inferiority, single blind, randomised study. Lancet. 2022;399(10324):521–9.

Roa CC, De Los Reyes MRA, Plennevaux E, Smolenov I, Hu B, Gao F, et al. Superior Boosting of Neutralizing Titers Against Omicron SARS-CoV-2 Variants by heterologous SCB-2019 vaccine vs a homologous booster in CoronaVac-primed adults. J Infect Dis. 2023;228(9):1253–62.

Ahi M, Hamidi Farahani R, Basiri P, Karimi Rahjerdi A, Sheidaei A, Gohari K, et al. Comparison of the safety and immunogenicity of FAKHRAVAC and BBIBP-CorV vaccines when administrated as booster dose: a parallel two arms, randomized, double blind clinical trial. Vaccines. 2022;10(11):1800.

Poh XY, Tan CW, Lee IR, Chavatte JM, Fong SW, Prince T, et al. Antibody response of heterologous vs homologous messenger RNA vaccine boosters against the severe acute respiratory syndrome coronavirus 2 omicron variant: interim results from the PRIBIVAC study, a randomized clinical trial. Clin Infect Dis. 2022;75(12):2088–96.

Kulkarni PS, Gunale B, Kohli S, Lalwani S, Tripathy S, Kar S, et al. A phase 3, randomized, non-inferiority study of a heterologous booster dose of SARS CoV-2 recombinant spike protein vaccine in adults. Sci Rep. 2023;13(1):16579.

Akahata W, Sekida T, Nogimori T, Ode H, Tamura T, Kono K, et al. Safety and immunogenicity of SARS-CoV-2 self-amplifying RNA vaccine expressing an anchored RBD: a randomized, observer-blind phase 1 study. Cell Rep Med. 2023;4(8): 101134.

Hannawi S, Yan L, Saf Eldin L, Abuquta A, Alamadi A, Mahmoud SA, et al. Safety and immunogenicity of multivalent SARS-CoV-2 protein vaccines: a randomized phase 3 trial. eClinicalMedicine. 2023;64:102195.

Hannawi S, Saf Eldin L, Abuquta A, Alamadi A, Mahmoud SA, Hassan A, et al. Safety and immunogenicity of a tetravalent and bivalent SARS-CoV-2 protein booster vaccine in men. Nat Commun. 2023;14(1):4043.

Toback S, Marchese AM, Warren B, Ayman S, Zarkovic S, ElTantawy I, et al. Safety and immunogenicity of the NVX-CoV2373 vaccine as a booster in adults previously vaccinated with the BBIBP-CorV vaccine: an interim analysis. Infectious Diseases (except HIV/AIDS). 2023. Available from: http://medrxiv.org/lookup/doi/10.1101/2023.03.24.23287658 .

Tang R, Zheng H, Wang BS, Gou JB, Guo XL, Chen XQ, et al. Safety and immunogenicity of aerosolised Ad5-nCoV, intramuscular Ad5-nCoV, or inactivated COVID-19 vaccine CoronaVac given as the second booster following three doses of CoronaVac: a multicentre, open-label, phase 4, randomised trial. Lancet Respir Med. 2023;S2213260023000498. https://doi.org/10.1136/bmj-2021-068632 .

Kaabi NA, Yang YK, Liang Y, Xu K, Zhang XF, Kang Y, et al. Safety and immunogenicity of a mosaic vaccine booster against Omicron and other SARS-CoV-2 variants: a randomized phase 2 trial. Signal Transduct Target Ther. 2023;8(1):20.

Download references

Acknowledgements

We would like to thank Sarah Klingenberg for her invaluable assistance as an information specialist at the Copenhagen Trial Unit, The Cochrane Hepato-Biliary Group, in developing and conducting the searches.

The Copenhagen Trial Unit provided support in the form of salaries for those affiliated with the centre.

Author information

Mark Aninakwah Asante and Martin Ekholm Michelsen shared first authors.

Authors and Affiliations

Copenhagen Trial Unit, Centre for Clinical Intervention Research, The Capital Region, Copenhagen University Hospital – Rigshospitalet, Copenhagen, Denmark

Mark Aninakwah Asante, Martin Ekholm Michelsen, Mithuna Mille Balakumar, Buddheera Kumburegama, Steven Kwasi Korang, Christian Gluud & Sonia Menon

Department of Pharmaceutical Care, Sina Hospital, Tehran University of Medical Sciences, Tehran, Iran

Amin Sharifan

Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

Allan Randrup Thomsen

Department of Pediatrics, Children’s Hospital Los Angeles, Los Angeles, CA, USA

Steven Kwasi Korang

Department of Regional Health Research, The Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark

Christian Gluud

Epitech Research, Brussels, Belgium

Sonia Menon

You can also search for this author in PubMed Google Scholar

Contributions

SM and MA conceived the specific research question and coordinated the systematic review. SKK designed the search strategy. MAA, MEM, MMB, BK, AS, and SM screened the abstracts and full texts. MAA, MEM, MMB, BK, and AS extracted the data. MAA, MEM, MMB, BK, and SM assessed the risk of bias. MAA, MEM, MMB, BK, AS, and SM were involved in the quality control of the extracted data. MAA and SM performed the data analysis. MAA, MEM, AS, and SM contributed to the first draft of the manuscript. All authors were involved in the interpretation of results and critical revision of manuscript. SM is the guarantor and attests that all authors mentioned meet authorship criteria. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sonia Menon .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

SM received consulting fees as a consultant for the P-95 consulting firm. ART reports leadership or fiduciary role in other board, society, committee or advocacy group, unpaid with the Danish Society of Immunology as chairman. AS reports leadership or fiduciary role in other board, society, committee or advocacy group, unpaid with Cochrane as a steering member of the Cochrane Early Career Professionals Network. All other authors declared no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: tables s1–s2 and figures s1–43., rights and permissions.

Reprints and permissions

About this article

Cite this article.

Asante, M.A., Michelsen, M.E., Balakumar, M.M. et al. Heterologous versus homologous COVID-19 booster vaccinations for adults: systematic review with meta-analysis and trial sequential analysis of randomised clinical trials. BMC Med 22 , 263 (2024). https://doi.org/10.1186/s12916-024-03471-3

Download citation

Received : 08 October 2023

Accepted : 06 June 2024

Published : 24 June 2024

DOI : https://doi.org/10.1186/s12916-024-03471-3

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

COVID-19 vaccines
Booster immunisation
Heterologous immunity
Homologous immunity
Vaccine efficacy
Vaccine safety

BMC Medicine

ISSN: 1741-7015

Submission enquiries: [email protected]
General enquiries: [email protected]

IMAGES

How to Conduct a Systematic Review
Systematic Literature Review Methodology
systematic literature review steps
Simple Systematic Review Using a 5-step
Step-by-step description of the systematic review process. Adapted from
(PDF) How to Write a Systematic Review

VIDEO

Introduction to Systematic Literature Review || Topic 10|| Perspectives by Ummara
Systematic Literature Review: An Introduction [Urdu/Hindi]
Example of Systematic Review Poster
Introduction to systematic review and meta analysis: an example
Systematic Literature Review
SYSTEMATIC LITERATURE REVIEW- Methodology to Proceed

COMMENTS

Systematic Review
A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer. Example: Systematic review. In 2008, Dr. Robert Boyle and his colleagues published a systematic review in ...
How to write the methods section of a systematic review
Keep it brief. The methods section should be succinct but include all the noteworthy information. This can be a difficult balance to achieve. A useful strategy is to aim for a brief description that signposts the reader to a separate section or sections of supporting information. This could include datasets, a flowchart to show what happened to ...
Systematic Review
A systematic review is a type of review that uses repeatable methods to find, select, and synthesise all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer. Example: Systematic review. In 2008, Dr Robert Boyle and his colleagues published a systematic review in ...
How to Write a Systematic Review: A Narrative Review
Background. A systematic review, as its name suggests, is a systematic way of collecting, evaluating, integrating, and presenting findings from several studies on a specific question or topic.[] A systematic review is a research that, by identifying and combining evidence, is tailored to and answers the research question, based on an assessment of all relevant studies.[2,3] To identify assess ...
An overview of methodological approaches in systematic reviews
1. INTRODUCTION. Evidence synthesis is a prerequisite for knowledge translation. 1 A well conducted systematic review (SR), often in conjunction with meta‐analyses (MA) when appropriate, is considered the "gold standard" of methods for synthesizing evidence related to a topic of interest. 2 The central strength of an SR is the transparency of the methods used to systematically search ...
A step by step guide for conducting a systematic review and meta
A systematic review, on the other hand, is defined as a review using a systematic method to summarize evidence on questions with a detailed and comprehensive plan of study. Furthermore, ... An example of a research question for SR/MA based on PICO for this issue is as follows: How is the safety and immunogenicity of Ebola vaccine in human? ...
Guidelines for writing a systematic review
Example; Systematic review: The most robust review method, usually with the involvement of more than one author, intends to systematically search for and appraise literature with pre-existing inclusion criteria. (Salem et al., 2023) Rapid review: Utilises Systematic Review methods but may be time limited. (Randles and Finnegan, 2022) Meta-analysis
How to do a systematic review
Systematic reviews can address any deﬁned research question. Table 1 provides examples of questions that have been addressed in published reviews relating to stroke, and examples of resources relating to diﬀerent types of reviews. The table illustrates that there are dif-ferent types and methods of systematic review for dif-ferent types of ...
How to Do a Systematic Review: A Best Practice Guide ...
Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question. The best reviews synthesize studies to ...
How to Do a Systematic Review: A Best Practice Guide for ...
Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question.
(PDF) How to Do a Systematic Review: A Best Practice Guide for
Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a ...
Systematic Reviews: Step 8: Write the Review
The Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) is a 27-item checklist used to improve transparency in systematic reviews. These items cover all aspects of the manuscript, including title, abstract, introduction, methods, results, discussion, and funding. The PRISMA checklist can be downloaded in PDF or Word files.
PDF Conducting a Systematic Review: Methodology and Steps
TABLE OF CONTENTS. CO. TEMATIC REVIEW:METHODOLOGY AND STEPS1.INTRODUCTIONSystematic reviews have gained momentum as a key method of evidence syn. hesis in global development research in recent times. As defined in the Cochrane Handbook on Systematic reviews "Systematic reviews seek to collate evidence that fits pre-specified eligibility cri.
Steps of a Systematic Review
Image: https://pixabay.com Steps to conducting a systematic review: PIECES. P: Planning - the methods of the systematic review are generally decided before conducting it. I: Identifying - searching for studies which match the preset criteria in a systematic manner E: Evaluating - sort all retrieved articles (included or excluded) and assess the risk of bias for each included study
Easy guide to conducting a systematic review
A systematic review is a type of study that synthesises research that has been conducted on a particular topic. Systematic reviews are considered to provide the highest level of evidence on the hierarchy of evidence pyramid. Systematic reviews are conducted following rigorous research methodology. To minimise bias, systematic reviews utilise a ...
Home
A systematic review is a literature review that gathers all of the available evidence matching pre-specified eligibility criteria to answer a specific research question. It uses explicit, systematic methods, documented in a protocol, to minimize bias, provide reliable findings, and inform decision-making.
Methodology of a systematic review
A systematic review involves a critical and reproducible summary of the results of the available publications on a particular topic or clinical question. To improve scientific writing, the methodology is shown in a structured manner to implement a systematic review.
Systematic review
A systematic review is a scholarly synthesis of the evidence on a clearly presented topic using critical methods to identify, define and assess research on the topic. A systematic review extracts and interprets data from published studies on the topic (in the scientific literature), then analyzes, describes, critically appraises and summarizes interpretations into a refined evidence-based ...
Types of Reviews
This site explores different review methodologies such as, systematic, scoping, realist, narrative, state of the art, meta-ethnography, critical, and integrative reviews. The LITR-EX site has a health professions education focus, but the advice and information is widely applicable. Types of Reviews. Review the table to peruse review types and ...
Introduction to systematic review and meta-analysis
It is easy to confuse systematic reviews and meta-analyses. A systematic review is an objective, reproducible method to find answers to a certain research question, by collecting all available studies related to that question and reviewing and analyzing their results. A meta-analysis differs from a systematic review in that it uses statistical ...
A Research Guide for Systematic Literature Reviews
Rapid Review: Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research. Completeness of searching determined by time constraints. Time-limited formal quality assessment. Typically narrative and tabular.
A step by step guide for conducting a systematic review and meta
A systematic review, on the other hand, is defined as a review using a systematic method to summarize evidence on questions with a detailed and comprehensive plan of study. ... We proposed our methods according to a valid explanatory simulation example choosing the topic of "evaluating safety of Ebola vaccine," as it is known that Ebola is ...
Examples of systematic reviews
Please choose the tab below for your discipline to see relevant examples. For more information about how to conduct and write reviews, please see the Guidelines section of this guide. Vibration and bubbles: a systematic review of the effects of helicopter retrieval on injured divers. (2018). Nicotine effects on exercise performance and ...
Types of Review Articles
Mixed studies review/mixed methods review: Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches for example combining quantitative with qualitative research or outcome with process studies. Overview
A systematic review of experimentally tested implementation strategies
Methods. We conducted a systematic review of studies examining implementation strategies from 2010-2022 and registered with PROSPERO (CRD42021235592). ... For example, the search terms may not have captured tests of policies, financial strategies, community health promotion initiatives, or electronic medical record reminders, due to differences ...
Assessing the impact of evidence-based mental health guidance during
A systematic review (protocol registered on Open Science Framework) identified summaries or syntheses of guidelines for mental health care during and after the COVID-19 pandemic and assessed the accuracy of the methods used in the OxPPL guidance by identifying any resources that the guidance had not included.
Guidance to best tools and practices for systematic reviews
However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs often are expected to reveal greater information about harms or real-word evidence [126, 140, 141].
Diagnostics
Background: This systematic review examines and evaluates the relationship between salivary cortisol levels and temporomandibular disorder (TMD) in young adult patients. Method: Six databases—PubMed, Scopus, Web of Science, Google Scholar, ProQuest, and Cochrane Library—were utilized to screen eligible studies. A systematic search was performed based on PECO questions and eligibility criteria.
Heterologous versus homologous COVID-19 booster vaccinations for adults
Background To combat coronavirus disease 2019 (COVID-19), booster vaccination strategies are important. However, the optimal administration of booster vaccine platforms remains unclear. Herein, we aimed to assess the benefits and harms of three or four heterologous versus homologous booster regimens. Methods From November 3 2022 to December 21, 2023, we searched five databases for randomised ...

Have a language expert improve your writing

Systematic Review | Definition, Example & Guide

Table of contents

Here's why students love Scribbr's proofreading services

Receive feedback on language, structure, and formatting

Step 1: Formulate a research question

Step 2: Develop a protocol

Step 3: Search for all relevant studies

Step 4: Apply the selection criteria

Step 5: Extract the data

Step 6: Synthesize the data

Step 7: Write and publish a report

Cite this Scribbr article

Is this article helpful?

Shaun Turney

How to write the methods section of a systematic review

Covidence breaks down how to write a methods section

Describe what happened

Keep it brief

Follow a structure

1. Selection criteria ⭕

2. Search 🕵🏾‍♀️

3. Data collection and analysis 👩‍💻

4. Study quality and risk of bias ⚖️

Laura Mellor. Portsmouth, UK

Top 5 Tips for High-Quality Systematic Review Data Extraction

How to get through study quality assessment Systematic Review

How to extract study data for your systematic review

Better systematic review management

Have a language expert improve your writing

Systematic Review | Definition, Examples & Guide

Table of contents

Prevent plagiarism, run a free check.

Step 1: Formulate a research question

Step 2: Develop a protocol

Step 3: Search for all relevant studies

Step 4: Apply the selection criteria

Step 5: Extract the data

Step 6: Synthesise the data

Step 7: Write and publish a report

Cite this Scribbr article

Is this article helpful?

Shaun Turney

Annual Review of Psychology

Most Read This Month

Systematic Review

Steps of a Systematic Review

Forms and templates

Systematic Reviews: Home

What is a Systematic Review?

There are many types of literature reviews.

The average systematic review takes 1,168 hours to complete. ¹ A librarian can help you speed up the process.

Save citation to file

Add to My Bibliography

Methodology of a systematic review

Similar articles

LinkOut - more resources

Other Literature Sources

Research Materials

Conducting Research

Systematic Reviews

Using Databases

Finding & Accessing

Writing & Citing

Meet a Librarian

Search Services

Citation Mgmt

Scholarly Communications

Library Updates

Types of Reviews

Review Typologies

A step by step guide for conducting a systematic review and meta-analysis with simulation data

Introduction

Methods and results

Research question and objectives

Preliminary research and idea validation

Inclusion and exclusion criteria

Search strategy

Search databases, import all results to a library, and exporting to an excel sheet

Protocol writing and registration