J Grad Med Educ. 2016 Jul;8(3).
The Literature Review: A Foundation for High-Quality Medical Education Research

a  These are subscription resources. Researchers should check with their librarian to determine their access rights.

Despite a surge in published scholarship in medical education1 and rapid growth in journals that publish educational research, manuscript acceptance rates continue to fall.2 Failure to conduct a thorough, accurate, and up-to-date literature review identifying an important problem and placing the study in context is consistently identified as one of the top reasons for rejection.3,4 The purpose of this editorial is to provide a road map and practical recommendations for planning a literature review. By understanding the goals of a literature review and following a few basic processes, authors can enhance both the quality of their educational research and the likelihood of publication in the Journal of Graduate Medical Education (JGME) and in other journals.

The Literature Review Defined

In medical education, no organization has articulated a formal definition of a literature review for a research paper; thus, a literature review can take a number of forms. Depending on the type of article, target journal, and specific topic, these forms will vary in methodology, rigor, and depth. Several organizations have published guidelines for conducting an intensive literature search intended for formal systematic reviews, both broadly (eg, PRISMA)5 and within medical education,6 and there are excellent commentaries to guide authors of systematic reviews.7,8

  • A literature review forms the basis for high-quality medical education research and helps maximize relevance, originality, generalizability, and impact.
  • A literature review provides context, informs methodology, maximizes innovation, avoids duplicative research, and ensures that professional standards are met.
  • Literature reviews take time, are iterative, and should continue throughout the research process.
  • Researchers should maximize the use of human resources (librarians, colleagues), search tools (databases/search engines), and existing literature (related articles).
  • Keeping organized is critical.

Such work is outside the scope of this article, which focuses on literature reviews to inform reports of original medical education research. We define such a literature review as a synthetic review and summary of what is known and unknown regarding the topic of a scholarly body of work, including the current work's place within the existing knowledge . While this type of literature review may not require the intensive search processes mandated by systematic reviews, it merits a thoughtful and rigorous approach.

Purpose and Importance of the Literature Review

An understanding of the current literature is critical for all phases of a research study. Lingard9 recently invoked the “journal-as-conversation” metaphor as a way of understanding how one's research fits into the larger medical education conversation. As she described it: “Imagine yourself joining a conversation at a social event. After you hang about eavesdropping to get the drift of what's being said (the conversational equivalent of the literature review), you join the conversation with a contribution that signals your shared interest in the topic, your knowledge of what's already been said, and your intention.”9

The literature review helps any researcher “join the conversation” by providing context, informing methodology, identifying innovation, minimizing duplicative research, and ensuring that professional standards are met. Understanding the current literature also promotes scholarship, as proposed by Boyer,10 by contributing to 5 of the 6 standards by which scholarly work should be evaluated.11 Specifically, the review helps the researcher (1) articulate clear goals, (2) show evidence of adequate preparation, (3) select appropriate methods, (4) communicate relevant results, and (5) engage in reflective critique.

Failure to conduct a high-quality literature review is associated with several problems identified in the medical education literature, including studies that are repetitive, are not grounded in theory, are methodologically weak, and fail to expand knowledge beyond a single setting.12 Indeed, medical education scholars complain that many studies repeat work already published and contribute little new knowledge; a likely cause is failure to conduct a proper literature review.3,4

Likewise, studies that lack theoretical grounding or a conceptual framework make study design and interpretation difficult.13 When theory is used in medical education studies, it is often invoked at a superficial level. As Norman14 noted, when theory is used appropriately, it helps articulate variables that might be linked together and why, and it allows the researcher to make hypotheses and define a study's context and scope. Ultimately, a proper literature review is a first critical step toward identifying relevant conceptual frameworks.

Another problem is that many medical education studies are methodologically weak.12 Good research requires trained investigators who can articulate relevant research questions, operationally define variables of interest, and choose the best method for specific research questions. Conducting a proper literature review helps both novice and experienced researchers select rigorous research methodologies.

Finally, many studies in medical education are “one-offs,” that is, single studies undertaken because the opportunity presented itself locally. Such studies frequently are not oriented toward progressive knowledge building and generalization to other settings. A firm grasp of the literature can encourage a programmatic approach to research.

Approaching the Literature Review

Considering these issues, journals have a responsibility to demand from authors a thoughtful synthesis of their study's position within the field, and it is the authors' responsibility to provide such a synthesis, based on a literature review. The aforementioned purposes of the literature review mandate that the review occurs throughout all phases of a study, from conception and design, to implementation and analysis, to manuscript preparation and submission.

Planning the literature review requires understanding of journal requirements, which vary greatly by journal (table 1). Authors are advised to take note of common problems with reporting results of the literature review. Table 2 lists the most common problems that we have encountered as authors, reviewers, and editors.

Table 1. Sample of Journals' Author Instructions for Literature Reviews Conducted as Part of Original Research Article a


Table 2. Common Problem Areas for Reporting Literature Reviews in the Context of Scholarly Articles


Locating and Organizing the Literature

Three resources may facilitate identifying relevant literature: human resources, search tools, and related literature. As the process requires time, it is important to begin searching for literature early in the process (ie, the study design phase). Identifying and understanding relevant studies will increase the likelihood of designing a relevant, adaptable, generalizable, and novel study that is based on educational or learning theory and can maximize impact.

Human Resources

A medical librarian can help translate research interests into an effective search strategy, familiarize researchers with available information resources, provide information on organizing information, and introduce strategies for keeping current with emerging research. Often, librarians are also aware of research across their institutions and may be able to connect researchers with similar interests. Reaching out to colleagues for suggestions may help researchers quickly locate resources that would not otherwise be on their radar.

During this process, researchers will likely identify other researchers writing on aspects of their topic. Researchers should consider searching for the publications of these relevant researchers (see table 3 for search strategies). Additionally, institutional websites may post the curricula vitae of relevant faculty, providing access to their entire publication record, including difficult-to-locate publications such as book chapters, dissertations, and technical reports.

Table 3. Strategies for Finding Related Researcher Publications in Databases and Search Engines


Search Tools and Related Literature

Researchers will locate the majority of needed information using databases and search engines. Excellent resources are available to guide researchers in the mechanics of literature searches.15,16

Because medical education research draws on a variety of disciplines, researchers should include search tools with coverage beyond medicine (eg, psychology, nursing, education, and anthropology) and that cover several publication types, such as reports, standards, conference abstracts, and book chapters (see the box for several information resources). Many search tools include options for viewing citations of selected articles. Examining cited references provides additional articles for review and a sense of the influence of the selected article on its field.

Box Information Resources

  • Web of Science a
  • Education Resource Information Center (ERIC)
  • Cumulative Index of Nursing & Allied Health (CINAHL) a
  • Google Scholar

Once relevant articles are located, it is useful to mine those articles for additional citations. One strategy is to examine references of key articles, especially review articles, for relevant citations.
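As an illustration only, this reference-mining strategy (sometimes called backward "snowballing") amounts to a breadth-first walk over reference lists, starting from a few key articles. The citation data below is a hypothetical in-memory dictionary; in practice it would come from a database export or a citation index.

```python
from collections import deque

def snowball(references: dict[str, list[str]], seeds: list[str], max_depth: int = 2) -> set[str]:
    """Collect articles reachable by following reference lists out from seed articles."""
    found = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        article, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding past the chosen depth
        for cited in references.get(article, []):
            if cited not in found:
                found.add(cited)
                queue.append((cited, depth + 1))
    return found

# Hypothetical reference lists keyed by article ID
refs = {
    "review_A": ["study_B", "study_C"],
    "study_B": ["study_D"],
    "study_C": ["study_D", "study_E"],
}
print(sorted(snowball(refs, ["review_A"])))
```

Limiting the depth mirrors the practical advice above: review articles one or two citation "hops" away usually capture the core of a literature base without the search exploding.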

Getting Organized

As the aforementioned resources will likely provide a tremendous amount of information, organization is crucial. Researchers should determine which details are most important to their study (eg, participants, setting, methods, and outcomes) and generate a strategy for keeping those details organized and accessible. Increasingly, researchers use digital tools, such as Evernote, to capture such information; these tools make notes searchable and accessible across devices and workspaces. Citation managers can also be helpful, as they store citations and, in some cases, can generate bibliographies (table 4).
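Whatever tool is used, the underlying idea is a structured record per reviewed study. A minimal sketch of such a record in Python (the field names and sample entries are illustrative, not a standard):

```python
from dataclasses import dataclass, field

@dataclass
class StudyRecord:
    """One reviewed article, capturing the details most relevant to the planned study."""
    citation: str
    participants: str
    setting: str
    methods: str
    outcomes: str
    tags: list[str] = field(default_factory=list)

# Hypothetical entries collected during a review
library = [
    StudyRecord("Smith 2015", "PGY-1 residents", "single academic center",
                "pre/post survey", "improved confidence", tags=["survey"]),
    StudyRecord("Jones 2014", "medical students", "multi-site",
                "randomized comparison", "higher exam scores", tags=["rct"]),
]

# Example query: pull the multi-site studies when assessing generalizability
multi_site = [s.citation for s in library if "multi" in s.setting]
print(multi_site)
```

Keeping the same fields for every study makes later comparison and synthesis straightforward, regardless of whether the records live in code, a spreadsheet, or a note-taking app.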

Table 4. Citation Managers


Knowing When to Say When

Researchers often ask how to know when they have located enough citations. Unfortunately, there is no magic or ideal number of citations to collect. One strategy for checking coverage of the literature is to inspect the references of relevant articles. As researchers review references, they will start noticing the same articles appearing repeatedly, with few new articles surfacing; this saturation can indicate that the researcher has covered the literature base on a particular topic.
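This saturation check can be made concrete: for each successive article reviewed, compute what fraction of its references has never been seen before, and watch that fraction shrink toward zero. A toy sketch, with hypothetical reference lists:

```python
def new_citation_rate(reference_lists: list[list[str]]) -> list[float]:
    """For each article reviewed (in order), the fraction of its references not seen before."""
    seen: set[str] = set()
    rates = []
    for refs in reference_lists:
        new = [r for r in refs if r not in seen]
        rates.append(len(new) / len(refs) if refs else 0.0)
        seen.update(refs)
    return rates

# Hypothetical reference lists from four articles, reviewed in order
batches = [
    ["a", "b", "c", "d"],
    ["a", "b", "e", "f"],
    ["a", "c", "e", "b"],
    ["b", "c", "d", "a"],
]
print(new_citation_rate(batches))
```

A run of near-zero rates at the end of the list is the numerical analogue of "the same articles keep coming up" and suggests the search has covered the topic's core literature.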

Putting It All Together

In preparing to write a research paper, it is important to consider which citations to include and how they will inform the introduction and discussion sections. The “Instructions to Authors” for the targeted journal will often provide guidance on structuring the literature review (or introduction) and the number of total citations permitted for each article category. Reviewing articles of similar type published in the targeted journal can also provide guidance regarding structure and average lengths of the introduction and discussion sections.

When selecting references for the introduction, consider those that illustrate core background theoretical and methodological concepts, as well as recent relevant studies. The introduction should be brief and should present references not as a laundry list or narrative of available literature, but rather as a synthesized summary that provides context for the current study and identifies the gap in the literature that the study intends to fill. For the discussion, citations should be thoughtfully selected to compare and contrast the present study's findings with the current literature and to indicate how the present study moves the field forward.

To facilitate writing a literature review, journals are increasingly providing helpful features to guide authors. For example, the resources available through JGME include several articles on writing.17 The journal Perspectives on Medical Education recently launched “The Writer's Craft,” which is intended to help medical educators improve their writing. Additionally, many institutions have writing centers that provide web-based materials on writing a literature review, and some even have writing coaches.

The literature review is a vital part of medical education research and should occur throughout the research process to help researchers design a strong study and effectively communicate study results and importance. To achieve these goals, researchers are advised to plan and execute the literature review carefully. The guidance in this editorial provides considerations and recommendations that may improve the quality of literature reviews.

University of North Florida, Thomas G. Carpenter Library
Conducting a Literature Review

Benefits of Conducting a Literature Review

While there might be many reasons for conducting a literature review, the following are four key outcomes of doing the review.

Assessment of the current state of research on a topic. This is probably the most obvious value of the literature review. Once a researcher has determined an area to work with for a research project, a search of relevant information sources will help determine what is already known about the topic and how extensively the topic has already been researched.

Identification of the experts on a particular topic. One of the additional benefits derived from doing the literature review is that it will quickly reveal which researchers have written the most on a particular topic and are, therefore, probably the experts on the topic. Someone who has written twenty articles on a topic or on related topics is likely more knowledgeable than someone who has written a single article. This same writer will likely turn up as a reference in most of the other articles written on the same topic. From the number of articles written by the author and the number of times the writer has been cited by other authors, a researcher can reasonably conclude that the particular author is an expert in the area and, thus, a key resource for consultation in the current research to be undertaken.

Identification of key questions about a topic that need further research. In many cases a researcher may discover new angles that need further exploration by reviewing what has already been written on a topic. For example, research may suggest that listening to music while studying might lead to better retention of ideas, but the research might not have assessed whether a particular style of music is more beneficial than another. A researcher who is interested in pursuing this topic would then do well to follow up existing studies with a new study, based on previous research, that tries to identify which styles of music are most beneficial to retention.

Determination of methodologies used in past studies of the same or similar topics. It is often useful to review the types of studies that previous researchers have launched as a means of determining what approaches might be of most benefit in further developing a topic. By the same token, a review of previously conducted studies might help researchers determine a new angle for approaching the research.

Upon completion of the literature review, a researcher should have a solid foundation of knowledge in the area and a good feel for the direction any new research should take. Should any additional questions arise during the course of the research, the researcher will know which experts to consult in order to quickly clear up those questions.

Source: https://libguides.unf.edu/litreview (last updated Aug 29, 2022)

UConn Library
Literature Review: The What, Why and How-to Guide — Introduction


What are Literature Reviews?

So, what is a literature review? "A literature review is an account of what has been published on a topic by accredited scholars and researchers. In writing the literature review, your purpose is to convey to your reader what knowledge and ideas have been established on a topic, and what their strengths and weaknesses are. As a piece of writing, the literature review must be defined by a guiding concept (e.g., your research objective, the problem or issue you are discussing, or your argumentative thesis). It is not just a descriptive list of the material available, or a set of summaries." Taylor, D. The literature review: A few tips on conducting it. University of Toronto Health Sciences Writing Centre.

Goals of Literature Reviews

What are the goals of creating a Literature Review? A literature review could be written to accomplish different aims:

  • To develop a theory or evaluate an existing theory
  • To summarize the historical or existing state of a research topic
  • To identify a problem in a field of research

Baumeister, R. F., & Leary, M. R. (1997). Writing narrative literature reviews. Review of General Psychology, 1(3), 311-320.

What kinds of sources require a Literature Review?

  • A research paper assigned in a course
  • A thesis or dissertation
  • A grant proposal
  • An article intended for publication in a journal

All these instances require you to collect what has been written about your research topic so that you can demonstrate how your own research sheds new light on the topic.

Types of Literature Reviews

What kinds of literature reviews are written?

Narrative review: The purpose of this type of review is to describe the current state of the research on a specific topic and to offer a critical analysis of the literature reviewed. Studies are grouped by research or theoretical category, and themes, trends, strengths, weaknesses, and gaps are identified. The review ends with a conclusion section that summarizes the state of the research on the topic, identifies the gaps, and, if applicable, explains how the author's research will address those gaps and expand knowledge on the topic reviewed.

  • Example: Predictors and Outcomes of U.S. Quality Maternity Leave: A Review and Conceptual Framework: 10.1177/08948453211037398

Systematic review: "The authors of a systematic review use a specific procedure to search the research literature, select the studies to include in their review, and critically evaluate the studies they find." (p. 139). Nelson, L. K. (2013). Research in Communication Sciences and Disorders. Plural Publishing.

  • Example: The effect of leave policies on increasing fertility: a systematic review: 10.1057/s41599-022-01270-w

Meta-analysis: "Meta-analysis is a method of reviewing research findings in a quantitative fashion by transforming the data from individual studies into what is called an effect size and then pooling and analyzing this information. The basic goal in meta-analysis is to explain why different outcomes have occurred in different studies." (p. 197). Roberts, M. C., & Ilardi, S. S. (2003). Handbook of Research Methods in Clinical Psychology. Blackwell Publishing.

  • Example: Employment Instability and Fertility in Europe: A Meta-Analysis: 10.1215/00703370-9164737

Meta-synthesis: "Qualitative meta-synthesis is a type of qualitative study that uses as data the findings from other qualitative studies linked by the same or related topic." (p. 312). Zimmer, L. (2006). Qualitative meta-synthesis: A question of dialoguing with texts. Journal of Advanced Nursing, 53(3), 311-318.

  • Example: Women’s perspectives on career successes and barriers: A qualitative meta-synthesis: 10.1177/05390184221113735

Literature Reviews in the Health Sciences

  • UConn Health subject guide on systematic reviews Explanation of the different review types used in health sciences literature as well as tools to help you find the right review type
Source: https://guides.lib.uconn.edu/literaturereview (last updated Sep 21, 2022)


Conducting a Literature Review: Why Do a Literature Review?


Besides the obvious reason for students -- because it is assigned! -- a literature review helps you explore the research that has come before you, to see how your research question has (or has not) already been addressed.

You identify:

  • core research in the field
  • experts in the subject area
  • methodology you may want to use (or avoid)
  • gaps in knowledge -- or where your research would fit in

It Also Helps You:

  • Publish and share your findings
  • Justify requests for grants and other funding
  • Identify best practices to inform practice
  • Set wider context for a program evaluation
  • Compile information to support community organizing

Source: https://guides.lib.berkeley.edu/litreview (last updated Apr 25, 2024)

Harvey Cushing/John Hay Whitney Medical Library


YSN Doctoral Programs: Steps in Conducting a Literature Review


What is a literature review?

A literature review is an integrated analysis, not just a summary, of scholarly writings and other relevant evidence related directly to your research question. That is, it represents a synthesis of the evidence that provides background information on your topic and shows an association between the evidence and your research question.

A literature review may be a stand-alone work or the introduction to a larger research paper, depending on the assignment. Rely heavily on the guidelines your instructor has given you.

Why is it important?

A literature review is important because it:

  • Explains the background of research on a topic.
  • Demonstrates why a topic is significant to a subject area.
  • Discovers relationships between research studies/ideas.
  • Identifies major themes, concepts, and researchers on a topic.
  • Identifies critical gaps and points of disagreement.
  • Discusses further research questions that logically come out of the previous studies.


1. Choose a topic. Define your research question.

Your literature review should be guided by your central research question.  The literature represents background and research developments related to a specific research question, interpreted and analyzed by you in a synthesized way.

  • Make sure your research question is not too broad or too narrow.  Is it manageable?
  • Begin writing down terms that are related to your question. These will be useful for searches later.
  • If you have the opportunity, discuss your topic with your professor and your classmates.

2. Decide on the scope of your review

How many studies do you need to look at? How comprehensive should it be? How many years should it cover? 

  • This may depend on your assignment.  How many sources does the assignment require?

3. Select the databases you will use to conduct your searches.

Make a list of the databases you will search. 

Where to find databases:

  • use the tabs on this guide
  • Find other databases in the Nursing Information Resources web page
  • More on the Medical Library web page
  • ... and more on the Yale University Library web page

4. Conduct your searches to find the evidence. Keep track of your searches.

  • Use the key words in your question, as well as synonyms for those words, as terms in your search. Use the database tutorials for help.
  • Save the searches in the databases. This saves time when you want to redo, or modify, the searches. It is also helpful to use them as a guide if the searches are not finding any useful results.
  • Review the abstracts of research studies carefully. This will save you time.
  • Use the bibliographies and references of research studies you find to locate others.
  • Check with your professor, or a subject expert in the field, if you are missing any key works in the field.
  • Ask your librarian for help at any time.
  • Use a citation manager, such as EndNote, as the repository for your citations. See the EndNote tutorials for help.
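The advice above to combine key words with their synonyms translates directly into boolean search strings: OR together the synonyms within each concept, then AND the concept groups together. A sketch of building such a string (the concept groups are hypothetical, and exact syntax varies by database, so treat the output as a starting point):

```python
def build_search(concepts: list[list[str]]) -> str:
    """OR together synonyms within each concept group, AND the groups together."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word terms so databases treat them as phrases
        quoted = [f'"{term}"' if " " in term else term for term in synonyms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

# Hypothetical concept groups for an example question
concepts = [
    ["nurse", "nursing staff"],          # population
    ["burnout", "occupational stress"],  # problem
]
print(build_search(concepts))
```

Saving both the concept groups and the generated strings keeps the search reproducible, which is exactly what "keep track of your searches" asks for.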

Review the literature

Some questions to help you analyze the research:

  • What was the research question of the study you are reviewing? What were the authors trying to discover?
  • Was the research funded by a source that could influence the findings?
  • What were the research methodologies? Analyze the study's literature review, the samples and variables used, the results, and the conclusions.
  • Does the research seem to be complete? Could it have been conducted more soundly? What further questions does it raise?
  • If there are conflicting studies, why do you think that is?
  • How are the authors viewed in the field? Has this study been cited? If so, how has it been analyzed?

Tips: 

  • Review the abstracts carefully.  
  • Keep careful notes so that you may track your thought processes during the research process.
  • Create a matrix of the studies for easy analysis, and synthesis, across all of the studies.
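The study matrix mentioned in the tips is, in practice, a table with one row per study and one column per attribute you want to compare. A minimal sketch that writes such a matrix to CSV for use in a spreadsheet (the column names and entries are illustrative):

```python
import csv
import io

# One row per reviewed study; columns chosen to support cross-study comparison
rows = [
    {"study": "Smith 2015", "sample": "n=42", "design": "pre/post", "key finding": "confidence up"},
    {"study": "Jones 2014", "sample": "n=310", "design": "RCT", "key finding": "scores up"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["study", "sample", "design", "key finding"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Replacing `io.StringIO` with an opened file saves the matrix to disk; the fixed column set is what makes synthesis across all of the studies easy.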
Source: https://guides.library.yale.edu/YSNDoctoral (last updated Jan 4, 2024)

University of Derby

Dissertations - Skills Guide


Literature Review


What is it?

A literature review collects information from literature that is already available and presents it in a form similar to a long essay: a written argument that builds a case from previous research (Machi and McEvoy, 2012). Every dissertation should include a literature review, but a dissertation as a whole can be a literature review. In this section we discuss literature reviews for the whole dissertation.

What are the benefits of a literature review?

There are advantages and disadvantages to any approach. The advantages of conducting a literature review include accessibility, a deeper understanding of your chosen topic, identification of experts and current research within that area, and answers to key questions about current research. The disadvantages might include not providing new information on the subject and, depending on the subject area, having to include information that is out of date.

How do I write it?

A literature review is often split into chapters. You can choose whether these chapters have titles that represent the information within them, or simply call them chapter 1, chapter 2, etc. A common format for a literature review is:

Introduction (including methodology)

This particular example is split into six sections; however, there may be more or fewer depending on your topic.

Source: https://libguides.derby.ac.uk/c.php?g=690330 (last updated Oct 18, 2023)

Home

Get Started

Take the first step and invest in your future.

colonnade and university hall

Online Programs

Offering flexibility & convenience in 51 online degrees & programs.

student at laptop

Prairie Stars

Featuring 15 intercollegiate NCAA Div II athletic teams.

campus in spring

Find your Fit

UIS has over 85 student and 10 greek life organizations, and many volunteer opportunities.

campus in spring

Arts & Culture

Celebrating the arts to create rich cultural experiences on campus.

campus in spring

Give Like a Star

Your generosity helps fuel fundraising for scholarships, programs and new initiatives.

alumni at gala

Bragging Rights

Literature Review


The purpose of a literature review is to collect relevant, timely research on your chosen topic, and synthesize it into a cohesive summary of existing knowledge in the field. This then prepares you for making your own argument on that topic, or for conducting your own original research.

Depending on your field of study, literature reviews can take different forms. Some disciplines require that you synthesize your sources topically, organizing your paragraphs according to how your different sources discuss similar topics. Other disciplines require that you discuss each source in individual paragraphs, covering various aspects in that single article, chapter, or book.

Within your review of a given source, you can cover many different aspects, including (if a research study) the purpose, scope, methods, results, any discussion points, limitations, and implications for future research. Make sure you know which model your professor expects you to follow when writing your own literature reviews.

Tip: Literature reviews may or may not be a graded component of your class or major assignment, but even if yours is not graded, it is a good idea to draft one so that you know the current conversations taking place on your chosen topic. It can better prepare you to write your own, unique argument.

Benefits of Literature Reviews

  • Literature reviews allow you to gain familiarity with the current knowledge in your chosen field, as well as the boundaries and limitations of that field.
  • Literature reviews also help you to gain an understanding of the theory(ies) driving the field, allowing you to place your research question into context.
  • Literature reviews provide an opportunity for you to see and even evaluate successful and unsuccessful assessment and research methods in your field.
  • Literature reviews prevent you from duplicating work that others in your field have already done, pushing you to find your own, unique approach to your topic.
  • Literature reviews give you a command of the knowledge in your field, giving you the chance to assess the significance of your own contribution.

Choosing Your Sources

When selecting your sources to compile your literature review, make sure you follow these guidelines to ensure you are working with the strongest, most appropriate sources possible.

Topically Relevant

Find sources within the scope of your topic

Appropriately Aged

Find sources that are not too old for your assignment

Appropriately Authored

Find sources whose authors have authority on your topic

Appropriately “Published”

Find sources that meet your instructor’s guidelines (academic, professional, print, etc.)

Tip: Treat your professors and librarians as experts you can turn to for advice on how to locate sources. They are a valuable asset to you, so take advantage of them!

Organizing Your Literature Review

Synthesizing Topically

Some assignments require discussing your sources together, in paragraphs organized according to shared topics between them.

For example, in a literature review covering current conversations on Alison Bechdel’s  Fun Home , authors may discuss various topics including:

  • her graphic style
  • her allusions to various literary texts
  • her story’s implications regarding LGBT experiences in 20th-century America.

In this case, you would cluster your sources on these three topics. One paragraph would cover how the sources you collected dealt with Bechdel’s graphic style. Another, her allusions. A third, her implications.

Each of these paragraphs would discuss how the sources you found treated these topics in connection to one another. Basically, you compare and contrast how your sources discuss similar issues and points.

To determine these shared topics, examine aspects including:

  • Definition of terms
  • Common ground
  • Issues that divide
  • Rhetorical context

Summarizing Individually

Depending on the assignment, your professor may prefer that you discuss each source in your literature review individually (in its own, separate paragraph or section). Your professor may give you specific guidelines about what to cover in these paragraphs/sections.

If, for instance, your sources are all primary research studies, here are some aspects to consider covering:

  • Participants
  • Limitations
  • Implications
  • Significance

Each section of your literature review, in this case, will identify all of these elements for each individual article.

You may or may not need to separate your information into multiple paragraphs for each source. If you do, using proper headings in the appropriate citation style (APA, MLA, etc.) will help keep you organized.

If you are writing a literature review as part of a larger assignment, you generally do not need an introduction and/or conclusion, because the review is embedded within the context of your larger paper.

If, however, your literature review is a standalone assignment, it is a good idea to include some sort of introduction and conclusion to provide your reader with context regarding your topic, purpose, and any relevant implications or further questions. Make sure you know what your professor is expecting for your literature review’s content.

Typically, a literature review concludes with a full bibliography of your included sources. Make sure you use the style guide required by your professor for this assignment.

Reviewing the literature

Volume 19, Issue 1

  • Joanna Smith 1 ,
  • Helen Noble 2
  • 1 School of Healthcare, University of Leeds , Leeds , UK
  • 2 School of Nursing and Midwifery, Queens's University Belfast , Belfast , UK
  • Correspondence to Dr Joanna Smith , School of Healthcare, University of Leeds, Leeds LS2 9JT, UK; j.e.smith1{at}leeds.ac.uk

https://doi.org/10.1136/eb-2015-102252


Implementing evidence into practice requires nurses to identify, critically appraise and synthesise research. This may require a comprehensive literature review: this article aims to outline the approaches and stages required and provides a working example of a published review.

Are there different approaches to undertaking a literature review?

What stages are required to undertake a literature review?

The rationale for the review should be established; consider why the review is important and relevant to patient care/safety or service delivery. For example, Noble et al's 4 review sought to understand and make recommendations for practice and research in relation to dialysis refusal and withdrawal in patients with end-stage renal disease, an area of care previously poorly described. If appropriate, highlight relevant policies and theoretical perspectives that might guide the review. Once the key issues related to the topic, including the challenges encountered in clinical practice, have been identified, formulate a clear question and/or develop an aim and specific objectives. The type of review undertaken is influenced by the purpose of the review and the resources available. However, the stages or methods used to undertake a review are similar across approaches and include:

Formulating clear inclusion and exclusion criteria, for example, patient groups, ages, conditions/treatments, sources of evidence/research designs;

Justifying the databases and years searched, and whether strategies including hand searching of journals, conference proceedings and research not indexed in databases (grey literature) will be undertaken;

Developing search terms; the PICO (P: patient, problem or population; I: intervention; C: comparison; O: outcome) framework is a useful guide when developing search terms;

Developing search skills (eg, understanding Boolean operators, in particular the use of AND/OR) and knowledge of how databases index topics (eg, MeSH headings). Working with a librarian experienced in undertaking health searches is invaluable when developing a search.
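The search-term stage above can be illustrated with a short sketch. The Python snippet below assembles a Boolean search string from PICO components; the concepts and terms are invented placeholders, not a validated search strategy.

```python
# Hypothetical PICO concepts; each concept lists a few synonym search terms.
pico = {
    "P": ["end-stage renal disease", "ESRD"],          # patient/population
    "I": ["dialysis withdrawal", "dialysis refusal"],  # intervention
    "C": [],                                           # comparison (none here)
    "O": ["quality of life", "patient experience"],    # outcome
}

def build_query(pico):
    """OR the synonyms within each concept, then AND the concepts together."""
    groups = []
    for terms in pico.values():
        if terms:  # skip empty concepts, eg a missing comparator
            groups.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    return " AND ".join(groups)

print(build_query(pico))
# → ("end-stage renal disease" OR "ESRD") AND ("dialysis withdrawal" OR
#   "dialysis refusal") AND ("quality of life" OR "patient experience")
```

A librarian would then adapt such a string to each database's own syntax (eg, MeSH expansion and field tags in MEDLINE).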

Once studies are selected, the quality of the research/evidence requires evaluation. Using a quality appraisal tool, such as the Critical Appraisal Skills Programme (CASP) tools, 5 results in a structured approach to assessing the rigour of studies being reviewed. 3 Approaches to data synthesis for quantitative studies may include a meta-analysis (statistical analysis of data from multiple studies of similar designs that have addressed the same question), or findings can be reported descriptively. 6 Methods applicable for synthesising qualitative studies include meta-ethnography (themes and concepts from different studies are explored and brought together using approaches similar to qualitative data analysis methods), narrative summary, thematic analysis and content analysis. 7 Table 1 outlines the stages undertaken for a published review that summarised research about parents’ experiences of living with a child with a long-term condition. 8

Table 1: An example of rapid evidence assessment review
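The statistical pooling behind a meta-analysis, mentioned in the synthesis step above, can be sketched in a few lines. This is a toy fixed-effect, inverse-variance sketch, not code from the article, and the effect sizes and variances are made up for illustration.

```python
# Hypothetical per-study effect estimates and their sampling variances.
effects   = [0.30, 0.55, 0.42]
variances = [0.04, 0.09, 0.02]

# Inverse-variance weights: more precise studies count for more.
weights = [1 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
print(round(pooled, 3))  # → 0.402
```

In practice, reviewers use dedicated packages that also report confidence intervals and heterogeneity statistics rather than hand-rolled arithmetic.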

In summary, the type of literature review depends on the review's purpose. For the novice reviewer, undertaking a review can be a daunting and complex process; by following the stages outlined and being systematic, a robust review is achievable. The importance of literature reviews should not be underestimated: they help summarise and make sense of an increasingly vast body of research, promoting best evidence-based practice.

  • Centre for Reviews and Dissemination. Guidance for undertaking reviews in health care. 3rd edn. York: CRD, York University, 2009.
  • Canadian Best Practices Portal. http://cbpp-pcpe.phac-aspc.gc.ca/interventions/selected-systematic-review-sites/ (accessed 7 Aug 2015).
  • Critical Appraisal Skills Programme (CASP). http://www.casp-uk.net/ (accessed 7 Aug 2015).

Twitter Follow Joanna Smith at @josmith175

Competing interests None declared.


A systematic literature review of empirical research on ChatGPT in education

  • Open access
  • Published: 26 May 2024
  • Volume 3 , article number  60 , ( 2024 )


  • Yazid Albadarin   ORCID: orcid.org/0009-0005-8068-8902 1 ,
  • Mohammed Saqr 1 ,
  • Nicolas Pope 1 &
  • Markku Tukiainen 1  


Over the last four decades, studies have investigated the incorporation of Artificial Intelligence (AI) into education. A recent prominent AI-powered technology that has impacted the education sector is ChatGPT. This article provides a systematic review of 14 empirical studies incorporating ChatGPT into various educational settings, published in 2022 and before the 10th of April 2023—the date of conducting the search process. It carefully followed the essential steps outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) guidelines, as well as Okoli’s (Okoli in Commun Assoc Inf Syst, 2015) steps for conducting a rigorous and transparent systematic review. In this review, we aimed to explore how students and teachers have utilized ChatGPT in various educational settings, as well as the primary findings of those studies. By employing Creswell’s (Creswell in Educational research: planning, conducting, and evaluating quantitative and qualitative research [Ebook], Pearson Education, London, 2015) coding techniques for data extraction and interpretation, we sought to gain insight into their initial attempts at ChatGPT incorporation into education. This approach also enabled us to extract insights and considerations that can facilitate its effective and responsible use in future educational contexts. The results of this review show that learners have utilized ChatGPT as a virtual intelligent assistant, where it offered instant feedback, on-demand answers, and explanations of complex topics. Additionally, learners have used it to enhance their writing and language skills by generating ideas, composing essays, summarizing, translating, paraphrasing texts, or checking grammar. Moreover, learners turned to it as an aiding tool to facilitate their directed and personalized learning by assisting in understanding concepts and homework, providing structured learning plans, and clarifying assignments and tasks. 
However, the results of specific studies (n = 3, 21.4%) show that overuse of ChatGPT may negatively impact innovative capacities and collaborative learning competencies among learners. Educators, on the other hand, have utilized ChatGPT to create lesson plans, generate quizzes, and provide additional resources, which helped them enhance their productivity and efficiency and promote different teaching methodologies. Despite these benefits, the majority of the reviewed studies recommend the importance of conducting structured training, support, and clear guidelines for both learners and educators to mitigate the drawbacks. This includes developing critical evaluation skills to assess the accuracy and relevance of information provided by ChatGPT, as well as strategies for integrating human interaction and collaboration into learning activities that involve AI tools. Furthermore, they also recommend ongoing research and proactive dialogue with policymakers, stakeholders, and educational practitioners to refine and enhance the use of AI in learning environments. This review could serve as an insightful resource for practitioners who seek to integrate ChatGPT into education and stimulate further research in the field.


1 Introduction

Educational technology, a rapidly evolving field, plays a crucial role in reshaping the landscape of teaching and learning [ 82 ]. One of the most transformative technological innovations of our era that has influenced the field of education is Artificial Intelligence (AI) [ 50 ]. Over the last four decades, AI in education (AIEd) has gained remarkable attention for its potential to make significant advancements in learning, instructional methods, and administrative tasks within educational settings [ 11 ]. In particular, a large language model (LLM), a type of AI algorithm that applies artificial neural networks (ANNs) and uses massively large data sets to understand, summarize, generate, and predict new content that is almost difficult to differentiate from human creations [ 79 ], has opened up novel possibilities for enhancing various aspects of education, from content creation to personalized instruction [ 35 ]. Chatbots that leverage the capabilities of LLMs to understand and generate human-like responses have also presented the capacity to enhance student learning and educational outcomes by engaging students, offering timely support, and fostering interactive learning experiences [ 46 ].

The ongoing and remarkable technological advancements in chatbots have made their use more convenient, increasingly natural and effortless, and have expanded their potential for deployment across various domains [ 70 ]. One prominent example of chatbot applications is the Chat Generative Pre-Trained Transformer, known as ChatGPT, which was introduced by OpenAI, a leading AI research lab, on November 30th, 2022. ChatGPT employs a variety of deep learning techniques to generate human-like text, with a particular focus on recurrent neural networks (RNNs). Long short-term memory (LSTM) allows it to grasp the context of the text being processed and retain information from previous inputs. Also, the transformer architecture, a neural network architecture based on the self-attention mechanism, allows it to analyze specific parts of the input, thereby enabling it to produce more natural-sounding and coherent output. Additionally, the unsupervised generative pre-training and the fine-tuning methods allow ChatGPT to generate more relevant and accurate text for specific tasks [ 31 , 62 ]. Furthermore, reinforcement learning from human feedback (RLHF), a machine learning approach that combines reinforcement learning techniques with human-provided feedback, has helped improve ChatGPT’s model by accelerating the learning process and making it significantly more efficient.

This cutting-edge natural language processing (NLP) tool is widely recognized as one of today's most advanced LLM-based chatbots [ 70 ], allowing users to ask questions and receive detailed, coherent, systematic, personalized, convincing, and informative human-like responses [ 55 ], even within complex and ambiguous contexts [ 63 , 77 ]. ChatGPT is considered the fastest-growing technology in history: in just three months following its public launch, it amassed an estimated 120 million monthly active users [ 16 ] with an estimated 13 million daily queries [ 49 ], surpassing all other applications [ 64 ]. This remarkable growth can be attributed to the unique features and user-friendly interface that ChatGPT offers. Its intuitive design allows users to interact seamlessly with the technology, making it accessible to a diverse range of individuals, regardless of their technical expertise [ 78 ]. Additionally, its exceptional performance, the result of a combination of advanced algorithms, continuous enhancements, and extensive training on a diverse dataset that includes various text sources such as books, articles, websites, and online forums [ 63 ], has contributed to a more engaging and satisfying user experience [ 62 ]. These factors collectively explain its remarkable global growth and set it apart from predecessors like Bard, Bing Chat, ERNIE, and others.

In this context, several studies have explored the technological advancements of chatbots. One noteworthy recent research effort, conducted by Schöbel et al. [ 70 ], stands out for its comprehensive analysis of more than 5,000 studies on communication agents. This study offered a comprehensive overview of the historical progression and future prospects of communication agents, including ChatGPT. Moreover, other studies have focused on making comparisons, particularly between ChatGPT and alternative chatbots like Bard, Bing Chat, ERNIE, LaMDA, BlenderBot, and various others. For example, O’Leary [ 53 ] compared two chatbots, LaMDA and BlenderBot, with ChatGPT and revealed that ChatGPT outperformed both. This superiority arises from ChatGPT’s capacity to handle a wider range of questions and generate slightly varied perspectives within specific contexts. Similarly, ChatGPT exhibited an impressive ability to formulate interpretable responses that were easily understood when compared with Google's feature snippet [ 34 ]. Additionally, ChatGPT was compared to other LLMs-based chatbots, including Bard and BERT, as well as ERNIE. The findings indicated that ChatGPT exhibited strong performance in the given tasks, often outperforming the other models [ 59 ].

Furthermore, in the education context, a comprehensive study systematically compared a range of the most promising chatbots, including Bard, Bing Chat, ChatGPT, and Ernie across a multidisciplinary test that required higher-order thinking. The study revealed that ChatGPT achieved the highest score, surpassing Bing Chat and Bard [ 64 ]. Similarly, a comparative analysis was conducted to compare ChatGPT with Bard in answering a set of 30 mathematical questions and logic problems, grouped into two question sets. Set (A) is unavailable online, while Set (B) is available online. The results revealed ChatGPT's superiority in Set (A) over Bard. Nevertheless, Bard's advantage emerged in Set (B) due to its capacity to access the internet directly and retrieve answers, a capability that ChatGPT does not possess [ 57 ]. However, through these varied assessments, ChatGPT consistently highlights its exceptional prowess compared to various alternatives in the ever-evolving chatbot technology.

The widespread adoption of chatbots, especially ChatGPT, by millions of students and educators has sparked extensive discussions regarding its incorporation into the education sector [ 64 ]. Accordingly, many scholars have contributed to the discourse, expressing both optimism and pessimism regarding the incorporation of ChatGPT into education. For example, ChatGPT has been highlighted for its capabilities in enriching the learning and teaching experience through its ability to support different learning approaches, including adaptive learning, personalized learning, and self-directed learning [ 58 , 60 , 91 ], deliver summative and formative feedback to students and provide real-time responses to questions, increase the accessibility of information [ 22 , 40 , 43 ], foster students’ performance, engagement and motivation [ 14 , 44 , 58 ], and enhance teaching practices [ 17 , 18 , 64 , 74 ].

On the other hand, concerns have been also raised regarding its potential negative effects on learning and teaching. These include the dissemination of false information and references [ 12 , 23 , 61 , 85 ], biased reinforcement [ 47 , 50 ], compromised academic integrity [ 18 , 40 , 66 , 74 ], and the potential decline in students' skills [ 43 , 61 , 64 , 74 ]. As a result, ChatGPT has been banned in multiple countries, including Russia, China, Venezuela, Belarus, and Iran, as well as in various educational institutions in India, Italy, Western Australia, France, and the United States [ 52 , 90 ].

Clearly, the advent of chatbots, especially ChatGPT, has provoked significant controversy due to their potential impact on learning and teaching. This indicates the necessity for further exploration to gain a deeper understanding of this technology and carefully evaluate its potential benefits, limitations, challenges, and threats to education [ 79 ]. Therefore, conducting a systematic literature review will provide valuable insights into the potential prospects and obstacles linked to its incorporation into education. This systematic literature review will primarily focus on ChatGPT, driven by the key factors outlined above.

However, the existing literature lacks a systematic literature review of empirical studies. Thus, this systematic literature review aims to address this gap by synthesizing the existing empirical studies conducted on chatbots, particularly ChatGPT, in the field of education, highlighting how ChatGPT has been utilized in educational settings, and identifying any existing gaps. This review may be particularly useful for researchers in the field and educators who are contemplating the integration of ChatGPT or any chatbot into education. The following research questions will guide this study:

What are students' and teachers' initial attempts at utilizing ChatGPT in education?

What are the main findings derived from empirical studies that have incorporated ChatGPT into learning and teaching?

2 Methodology

To conduct this study, the authors followed the essential steps of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) and Okoli’s [ 54 ] steps for conducting a systematic review. These included identifying the study’s purpose, drafting a protocol, applying a practical screening process, searching the literature, extracting relevant data, evaluating the quality of the included studies, synthesizing the studies, and ultimately writing the review. The subsequent section provides an extensive explanation of how these steps were carried out in this study.

2.1 Identify the purpose

Given the widespread adoption of ChatGPT by students and teachers for various educational purposes, often without a thorough understanding of responsible and effective use or a clear recognition of its potential impact on learning and teaching, the authors recognized the need for further exploration of ChatGPT's impact on education in this early stage. Therefore, they have chosen to conduct a systematic literature review of existing empirical studies that incorporate ChatGPT into educational settings. Despite the limited number of empirical studies due to the novelty of the topic, their goal is to gain a deeper understanding of this technology and proactively evaluate its potential benefits, limitations, challenges, and threats to education. This effort could help to understand initial reactions and attempts at incorporating ChatGPT into education and bring out insights and considerations that can inform the future development of education.

2.2 Draft the protocol

The next step is formulating the protocol. This protocol serves to outline the study process in a rigorous and transparent manner, mitigating researcher bias in study selection and data extraction [ 88 ]. The protocol will include the following steps: generating the research question, predefining a literature search strategy, identifying search locations, establishing selection criteria, assessing the studies, developing a data extraction strategy, and creating a timeline.

2.3 Apply practical screen

The screening step aims to accurately filter the articles resulting from the searching step and select the empirical studies that have incorporated ChatGPT into educational contexts, which will guide us in answering the research questions and achieving the objectives of this study. To ensure the rigorous execution of this step, our inclusion and exclusion criteria were determined based on the authors' experience and informed by previous successful systematic reviews [ 21 ]. Table 1 summarizes the inclusion and exclusion criteria for study selection.

2.4 Literature search

We conducted a thorough literature search to identify articles that explored, examined, and addressed the use of ChatGPT in educational contexts. We utilized two research databases: Dimensions.ai, which provides access to a large number of research publications, and lens.org, which offers access to over 300 million articles, patents, and other research outputs from diverse sources. Additionally, we included three databases, Scopus, Web of Science, and ERIC, which contain relevant research on the topic that addresses our research questions. To browse and identify relevant articles, we used the following search formula: ("ChatGPT" AND "Education"), which included the Boolean operator "AND" to get more specific results. The subject area in the Scopus and ERIC databases was narrowed to the "ChatGPT" and "Education" keywords, and in the WoS database it was limited to the "Education" category. The search was conducted between the 3rd and 10th of April 2023, which resulted in 276 articles from all selected databases (111 articles from Dimensions.ai, 65 from Scopus, 28 from Web of Science, 14 from ERIC, and 58 from Lens.org). These articles were imported into the Rayyan web-based system for analysis. The duplicates were identified automatically by the system. Subsequently, the first author manually reviewed the duplicated articles, confirmed that they had the same content, and then removed them, leaving us with 135 unique articles. Afterward, the titles, abstracts, and keywords of the first 40 manuscripts were scanned and reviewed by the first author and discussed with the second and third authors to resolve any disagreements. Subsequently, the first author proceeded with the filtering process for all articles and carefully applied the inclusion and exclusion criteria as presented in Table 1. Articles that met any one of the exclusion criteria were eliminated, resulting in 26 articles. Afterward, the authors met to carefully scan and discuss them.
The authors agreed to eliminate any empirical studies solely focused on checking ChatGPT capabilities, as these studies do not guide us in addressing the research questions and achieving the study's objectives. This resulted in 14 articles eligible for analysis.
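The deduplication step performed in Rayyan can be approximated in a few lines of Python. The records, titles, and sources below are invented examples, not the review's actual data.

```python
def normalize(title):
    """Case-fold and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())

def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        key = normalize(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)  # keep the first copy of each article
    return unique

records = [
    {"title": "ChatGPT in Education", "source": "Scopus"},
    {"title": "chatgpt  in education", "source": "Lens.org"},  # duplicate
    {"title": "AI Chatbots for Learning", "source": "ERIC"},
]
print(len(deduplicate(records)))  # → 2
```

A real pipeline would also match on DOI and apply fuzzy title matching, which is why the authors still reviewed flagged duplicates manually.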

2.5 Quality appraisal

The examination and evaluation of the quality of the extracted articles is a vital step [ 9 ]. Therefore, the extracted articles were carefully evaluated for quality using Fink’s [ 24 ] standards, which emphasize the necessity for detailed descriptions of methodology, results, conclusions, strengths, and limitations. The process began with a thorough assessment of each study's design, data collection, and analysis methods to ensure their appropriateness and comprehensive execution. The clarity, consistency, and logical progression from data to results and conclusions were also critically examined. Potential biases and recognized limitations within the studies were also scrutinized. Ultimately, two articles were excluded for failing to meet Fink’s criteria, particularly in providing sufficient detail on methodology, results, conclusions, strengths, or limitations. The review process is illustrated in Fig.  1 .

figure 1

The study selection process

2.6 Data extraction

The next step is data extraction, the process of capturing the key information and categories from the included studies. To improve efficiency, reduce variation among authors, and minimize errors in data analysis, the coding categories were constructed using Creswell's [ 15 ] coding techniques for data extraction and interpretation. The coding process involves three sequential steps. The initial stage encompasses open coding , where the researcher examines the data, generates codes to describe and categorize it, and gains a deeper understanding without preconceived ideas. Following open coding is axial coding , where the interrelationships between codes from open coding are analyzed to establish more comprehensive categories or themes. The process concludes with selective coding , refining and integrating categories or themes to identify core concepts emerging from the data. The first coder performed the coding process, then engaged in discussions with the second and third authors to finalize the coding categories for the first five articles. The first coder then proceeded to code all studies and engaged again in discussions with the other authors to ensure the finalization of the coding process. After a comprehensive analysis and capturing of the key information from the included studies, the data extraction and interpretation process yielded several themes. These themes have been categorized and are presented in Table  2 . It is important to note that open coding results were removed from Table  2 for aesthetic reasons, as it included many generic aspects, such as words, short phrases, or sentences mentioned in the studies.
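As a rough illustration of the move from open codes to axial categories described above, the snippet below groups code labels under broader themes. The labels and groupings are invented for illustration and are not the study's actual codes.

```python
# Hypothetical open codes, each mapped to the broader category it belongs to.
open_codes = {
    "instant feedback":  "virtual assistant",
    "on-demand answers": "virtual assistant",
    "essay drafting":    "writing support",
    "grammar checking":  "writing support",
    "homework help":     "personalized learning",
}

def axial_coding(open_codes):
    """Invert the mapping: collect the open codes under each category."""
    themes = {}
    for code, category in open_codes.items():
        themes.setdefault(category, []).append(code)
    return themes

print(sorted(axial_coding(open_codes)))
# → ['personalized learning', 'virtual assistant', 'writing support']
```

Selective coding would then refine these categories into the core themes reported in Table 2.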

2.7 Synthesize studies

In this stage, we gathered, discussed, and analyzed the key findings that emerged from the selected studies. The synthesis stage is considered a transition from an author-centric to a concept-centric focus, enabling us to map all the provided information to achieve the most effective evaluation of the data [ 87 ]. Initially, the authors extracted data that included general information about the selected studies, including the author(s)' names, study titles, years of publication, educational levels, research methodologies, sample sizes, participants, main aims or objectives, raw data sources, and analysis methods. Following that, all key information and significant results from the selected studies were compiled using Creswell's [ 15 ] coding techniques for data extraction and interpretation to identify core concepts and themes emerging from the data, focusing on those that directly contributed to our research questions and objectives, such as the initial utilization of ChatGPT in learning and teaching, learners' and educators' familiarity with ChatGPT, and the main findings of each study. Finally, the data related to each selected study were extracted into an Excel spreadsheet for data processing. The Excel spreadsheet was reviewed by the authors, including a series of discussions to ensure the finalization of this process and prepare it for further analysis. Afterward, the final results were analyzed and presented in various types of charts and graphs. Table 4 presents the extracted data from the selected studies, with each study labeled with a capital 'S' followed by a number.
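An extraction sheet of the kind described above can be mimicked with the standard library. The study labels follow the 'S' + number convention mentioned in the text, but the field names and values are placeholders, not the actual extracted data.

```python
import csv

# Hypothetical extraction rows, one per reviewed study.
studies = [
    {"label": "S1", "year": 2022, "level": "Higher education", "method": "Survey"},
    {"label": "S2", "year": 2023, "level": "Higher education", "method": "Case study"},
]

with open("extraction.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["label", "year", "level", "method"])
    writer.writeheader()  # column names become the first row
    writer.writerows(studies)
```

Keeping the extraction in a flat, machine-readable table like this makes the later descriptive analysis (counts per year, per level, per method) straightforward to script.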

This section consists of two main parts. The first part provides a descriptive analysis of the data compiled from the reviewed studies. The second part presents the answers to the research questions and the main findings of these studies.

3.1 Part 1: descriptive analysis

This section provides a descriptive analysis of the reviewed studies, including educational levels and fields, participant distribution, country contributions, research methodologies, sample sizes, study populations, publication years, the list of journals, familiarity with ChatGPT, sources of data, and the main aims and objectives of the studies. Table 4 presents a comprehensive overview of the extracted data from the selected studies.

3.1.1 The number of the reviewed studies and publication years

The total number of reviewed studies was 14. All were empirical studies published in different journals focusing on education and technology. One study was published in 2022 [S1], while the remaining 13 were published in 2023 [S2]-[S14]. Table 3 lists the year of publication, the journal names, and the number of reviewed studies published in each journal.

3.1.2 Educational levels and fields

The majority of the reviewed studies (11) were conducted in higher education institutions [S1]-[S10] and [S13]. Two studies did not specify the educational level of the population [S12] and [S14], while one study focused on elementary education [S11]. The reviewed studies covered various fields of education. Three studies focused on Arts and Humanities Education [S8], [S11], and [S14], specifically English Education. Two focused on Engineering Education, one in Computer Engineering [S2] and the other in Construction Education [S3]. Two focused on Mathematics Education [S5] and [S12], one on Social Science Education [S13], one on Early Education [S4], and one on Journalism Education [S9]. Finally, three studies did not specify the field of education [S1], [S6], and [S7]. Figure  2 represents the educational levels in the reviewed studies, while Fig.  3 represents the contexts of the reviewed studies.

Figure 2: Educational levels in the reviewed studies

Figure 3: Context of the reviewed studies

3.1.3 Participant distribution and country contributions

The reviewed studies were conducted across different geographic regions, providing diverse representation. The majority, 10 studies [S1]-[S3], [S5]-[S9], [S11], and [S14], focused on participants from a single country, such as Pakistan, the United Arab Emirates, China, Indonesia, Poland, Saudi Arabia, South Korea, Spain, Tajikistan, or the United States. In contrast, four studies [S4], [S10], [S12], and [S13] involved participants from multiple countries: China and the United States [S4]; China, the United Kingdom, and the United States [S10]; the United Arab Emirates, Oman, Saudi Arabia, and Jordan [S12]; and Turkey, Sweden, Canada, and Australia [S13]. Figures  4 and 5 illustrate the distribution of participants, whether from single or multiple countries, and the contribution of each country in the reviewed studies, respectively.

Figure 4: The reviewed studies conducted in single or multiple countries

Figure 5: The contribution of each country in the studies

3.1.4 Study population and sample size

Four study populations were included: university students, university teachers, university teachers and students together, and elementary school teachers. Six studies involved university students [S2], [S3], and [S5]-[S8]. Three studies focused on university teachers [S1], [S4], and [S6], while one study specifically targeted elementary school teachers [S11]. Additionally, four studies included both university teachers and students [S10] and [S12]-[S14]; among them, study [S13] specifically included postgraduate students. In terms of sample size, nine studies had a small sample of fewer than 50 participants [S1], [S3], [S6], [S8], and [S10]-[S13]. Three studies had 50–100 participants [S2], [S9], and [S14]. Only one study had more than 100 participants [S7]. It is worth mentioning that study [S4] adopted a mixed methods approach, with 10 participants for the qualitative analysis and 110 for the quantitative analysis.
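The sample-size grouping used above (fewer than 50, 50–100, more than 100 participants) can be expressed as a simple binning function; the sample sizes in the example are hypothetical, not figures from the reviewed studies:

```python
# A small sketch of the sample-size binning used in the descriptive analysis.

def size_bin(n: int) -> str:
    """Assign a participant count to one of the three reported size bands."""
    if n < 50:
        return "<50"
    if n <= 100:
        return "50-100"
    return ">100"

samples = [30, 45, 72, 120]  # hypothetical sample sizes
print([size_bin(n) for n in samples])  # ['<50', '<50', '50-100', '>100']
```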

3.1.5 Participants’ familiarity with using ChatGPT

The reviewed studies recruited a diverse range of participants with varying levels of familiarity with ChatGPT. Five studies [S2], [S4], [S6], [S8], and [S12] involved participants already familiar with ChatGPT, while eight studies [S1], [S3], [S5], [S7], [S9], [S10], [S13] and [S14] included individuals with differing levels of familiarity. Notably, one study [S11] had participants who were entirely unfamiliar with ChatGPT. It is important to note that four studies [S3], [S5], [S9], and [S11] provided training or guidance to their participants before conducting their studies, while ten studies [S1], [S2], [S4], [S6]-[S8], [S10], and [S12]-[S14] did not provide training due to the participants' existing familiarity with ChatGPT.

3.1.6 Research methodology approaches and source(s) of data

The reviewed studies adopted various research methodology approaches. Seven studies adopted a qualitative research methodology [S1], [S4], [S6], [S8], [S10], [S11], and [S12]; three studies adopted a quantitative research methodology [S3], [S7], and [S14]; and four studies employed mixed methods, combining the strengths of qualitative and quantitative approaches [S2], [S5], [S9], and [S13].

In terms of the source(s) of data, the reviewed studies obtained their data from various sources, such as interviews, questionnaires, and pre- and post-tests. Six studies relied on interviews as their primary source of data collection [S1], [S4], [S6], [S10], [S11], and [S12]; four studies relied on questionnaires [S2], [S7], [S13], and [S14]; two studies combined pre- and post-tests with questionnaires for data collection [S3] and [S9]; and two studies combined questionnaires and interviews [S5] and [S8]. It is important to note that six of the reviewed studies were quasi-experimental [S3], [S5], [S8], [S9], [S12], and [S14], while the remaining ones were experimental [S1], [S2], [S4], [S6], [S7], [S10], [S11], and [S13]. Figures  6 and 7 illustrate the research methodologies and the source(s) of data used in the reviewed studies, respectively.
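As an illustration, the methodology counts behind Figure 6 can be tallied directly from the per-study labels reported above (seven qualitative, three quantitative, four mixed methods); this sketch is not the authors' code, but the labels reproduce the distribution stated in this section:

```python
# Tally research methodologies across the 14 reviewed studies, using the
# per-study assignments reported in the text (S1-S14).

from collections import Counter

methods = {
    "S1": "qualitative", "S2": "mixed", "S3": "quantitative",
    "S4": "qualitative", "S5": "mixed", "S6": "qualitative",
    "S7": "quantitative", "S8": "qualitative", "S9": "mixed",
    "S10": "qualitative", "S11": "qualitative", "S12": "qualitative",
    "S13": "mixed", "S14": "quantitative",
}

counts = Counter(methods.values())
print(counts["qualitative"], counts["quantitative"], counts["mixed"])  # 7 3 4
```

The same pattern (a label per study plus a `Counter`) covers the other descriptive breakdowns in this section, such as data sources or educational levels.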

Figure 6: Research methodologies in the reviewed studies

Figure 7: Source of data in the reviewed studies

3.1.7 The aim and objectives of the studies

The reviewed studies encompassed a diverse set of aims, with several of them incorporating multiple primary objectives. Six studies [S3], [S6], [S7], [S8], [S11], and [S12] examined the integration of ChatGPT in educational contexts, and four studies [S4], [S5], [S13], and [S14] investigated the various implications of its use in education, while three studies [S2], [S9], and [S10] aimed to explore both its integration and implications in education. Additionally, seven studies explicitly explored attitudes and perceptions of students [S2] and [S3], educators [S1] and [S6], or both [S10], [S12], and [S13] regarding the utilization of ChatGPT in educational settings.

3.2 Part 2: research questions and main findings of the reviewed studies

This part presents the answers to the research questions and the main findings of the reviewed studies, classified into two main categories (learning and teaching) according to the AI Education classification of [ 36 ]. Figure  8 summarizes the main findings of the reviewed studies in a visually informative diagram. Table 4 provides a detailed list of the key information extracted from the selected studies that led to generating these themes.

Figure 8: The main findings in the reviewed studies

4 Students' initial attempts at utilizing ChatGPT in learning and main findings from students' perspective

4.1 Virtual intelligent assistant

Nine studies demonstrated that students utilized ChatGPT as an intelligent assistant to enhance and support their learning. Students employed it for various purposes, such as answering on-demand questions [S2]-[S5], [S8], [S10], and [S12], providing valuable information and learning resources [S2]-[S5], [S6], and [S8], and receiving immediate feedback [S2], [S4], [S9], [S10], and [S12]. In this regard, students were generally confident in the accuracy of ChatGPT's responses, considering them relevant, reliable, and detailed [S3], [S4], [S5], and [S8]. However, some students indicated the need for improvement, finding that answers were not always accurate [S2] and that misleading information could be provided or might not align with their expectations [S6] and [S10]. Students also observed that the accuracy of ChatGPT depends on several factors, including the quality and specificity of the user's input, the complexity of the question or topic, and the scope and relevance of its training data [S12]. Many students felt that ChatGPT's answers were not always accurate, and most believed that working with it requires good background knowledge.

4.2 Writing and language proficiency assistant

Six of the reviewed studies highlighted that students utilized ChatGPT as a valuable assistant to improve their academic writing skills and language proficiency. Among these studies, three focused mainly on English education, demonstrating that students showed sufficient mastery in using ChatGPT for generating ideas, summarizing, paraphrasing texts, and composing essays [S8], [S11], and [S14]. Furthermore, ChatGPT helped them in writing by making students active investigators rather than passive knowledge recipients and facilitated the development of their writing skills [S11] and [S14]. Similarly, ChatGPT allowed students to generate unique ideas and perspectives, leading to deeper analysis and reflection in their journalism writing [S9]. In terms of language proficiency, ChatGPT allowed participants to translate content into their home languages, making it more accessible and relevant to their context [S4]. It also enabled them to request changes in linguistic tone or flavor [S8]. Moreover, participants used it to check grammar or as a dictionary [S11].

4.3 Valuable resource for learning approaches

Five studies demonstrated that students used ChatGPT as a valuable complementary resource for self-directed learning. It provided learning resources and guidance on diverse educational topics and created a supportive home learning environment [S2] and [S4]. Moreover, it offered step-by-step guidance to help students grasp concepts at their own pace and enhance their understanding [S5], streamlined the completion of tasks and projects carried out independently [S7], provided comprehensive and easy-to-understand explanations on various subjects [S10], and assisted in studying geometry operations, empowering students to explore such operations at their own pace [S12]. Three studies showed that students used ChatGPT as a valuable resource for personalized learning. It delivered age-appropriate conversations and tailored teaching based on a child's interests [S4], acted as a personalized learning assistant that adapted to their needs and pace, assisting them in understanding mathematical concepts [S12], and enabled personalized learning experiences in the social sciences by adapting to students' needs and learning styles [S13]. On the other hand, it is important to note that, according to one study [S5], students suggested that using ChatGPT may negatively affect collaborative learning competencies between students.

4.4 Enhancing students' competencies

Six of the reviewed studies have shown that ChatGPT is a valuable tool for improving a wide range of skills among students. Two studies provided evidence that ChatGPT led to improvements in students' critical thinking, reasoning skills, and hazard recognition competencies by engaging them in interactive conversations or activities and providing responses related to their disciplines in journalism [S5] and construction education [S9]. Furthermore, two studies focused on mathematics education showed the positive impact of ChatGPT on students' problem-solving abilities in unraveling problem-solving questions [S12] and enhancing students' understanding of the problem-solving process [S5]. Lastly, one study indicated that ChatGPT effectively contributed to the enhancement of conversational social skills [S4].

4.5 Supporting students' academic success

Seven of the reviewed studies highlighted that students found ChatGPT beneficial for learning, as it enhanced learning efficiency and improved the learning experience. It improved students' efficiency in computer engineering studies by providing well-structured responses and good explanations [S2]. Students also found it extremely useful for hazard reporting [S3], and it enhanced their efficiency and capabilities in solving mathematics problems [S5] and [S12]. Furthermore, by finding information, generating ideas, translating texts, and providing alternative questions, ChatGPT helped students deepen their understanding of various subjects [S6]. It contributed to an increase in students' overall productivity [S7] and improved efficiency in composing written tasks [S8]. Regarding learning experiences, ChatGPT was instrumental in helping students identify hazards that they might otherwise have overlooked [S3]. It also improved students' learning experiences in solving mathematics problems and developing their abilities [S5] and [S12]. Moreover, it increased students' successful completion of important tasks in their studies [S7], particularly writing tasks of average difficulty [S8]. Additionally, ChatGPT increased the chances of educational success by providing students with baseline knowledge on various topics [S10].

5 Teachers' initial attempts at utilizing ChatGPT in teaching and main findings from teachers' perspective

5.1 Valuable resource for teaching

The reviewed studies showed that teachers have employed ChatGPT to recommend, modify, and generate diverse, creative, organized, and engaging educational content, teaching materials, and testing resources more rapidly [S4], [S6], [S10], and [S11]. Additionally, teachers experienced increased productivity, as ChatGPT facilitated quick and accurate responses to questions, fact-checking, and information searches [S1]. It also proved valuable in constructing new knowledge [S6] and providing timely answers to students' questions in classrooms [S11]. Moreover, ChatGPT enhanced teachers' efficiency by generating new ideas for activities and preplanning activities for their students [S4] and [S6], including serving as an interactive language game partner [S11].

5.2 Improving productivity and efficiency

The reviewed studies showed that participants' productivity and work efficiency were significantly enhanced by using ChatGPT, as it enabled them to allocate more time to other tasks and reduce their overall workloads [S6], [S10], [S11], [S13], and [S14]. However, three studies [S1], [S4], and [S11] indicated negative perceptions and attitudes among teachers toward using ChatGPT. This negativity stemmed from a lack of the necessary skills to use it effectively [S1], limited familiarity with it [S4], and occasional inaccuracies in the content it provided [S10].

5.3 Catalyzing new teaching methodologies

Five of the reviewed studies highlighted that educators found the necessity of redefining their teaching profession with the assistance of ChatGPT [S11], developing new effective learning strategies [S4], and adapting teaching strategies and methodologies to ensure the development of essential skills for future engineers [S5]. They also emphasized the importance of adopting new educational philosophies and approaches that can evolve with the introduction of ChatGPT into the classroom [S12]. Furthermore, updating curricula to focus on improving human-specific features, such as emotional intelligence, creativity, and philosophical perspectives [S13], was found to be essential.

5.4 Effective utilization of ChatGPT in teaching

According to the reviewed studies, effective utilization of ChatGPT in education requires providing teachers with well-structured training, support, and an adequate background on how to use ChatGPT responsibly [S1], [S3], [S11], and [S12]. Establishing clear rules and regulations regarding its usage is essential to ensure that it positively impacts the teaching and learning processes, including students' skills [S1], [S4], [S5], [S8], [S9], and [S11]-[S14]. Moreover, conducting further research and engaging in discussions with policymakers and stakeholders are crucial for the successful integration of ChatGPT in education and for maximizing the benefits for both educators and students [S1], [S6]-[S10], and [S12]-[S14].

6 Discussion

The purpose of this review was to systematically examine empirical studies that have explored the utilization of ChatGPT, one of today’s most advanced LLM-based chatbots, in education. The findings of the reviewed studies revealed several ways of utilizing ChatGPT in different learning and teaching practices and provided insights and considerations that can facilitate its effective and responsible use in future educational contexts. The reviewed studies came from diverse fields of education, which helped us avoid a review biased toward a specific field. Similarly, the reviewed studies were conducted across different geographic regions, and this variety in geographic representation enriched the findings of this review.

In response to RQ1, "What are students' and teachers' initial attempts at utilizing ChatGPT in education?", the findings from this review provide comprehensive insights. Chatbots, including ChatGPT, play a crucial role in supporting student learning, enhancing learning experiences, and facilitating diverse learning approaches [ 42 , 43 ]. This review found that ChatGPT has been instrumental in enhancing students' learning experiences by serving as a virtual intelligent assistant, providing immediate feedback and on-demand answers, and engaging in educational conversations. Additionally, students have benefited from ChatGPT’s ability to generate ideas, compose essays, and perform tasks like summarizing, translating, paraphrasing texts, or checking grammar, thereby enhancing their writing and language competencies. Furthermore, students have turned to ChatGPT for assistance in understanding concepts and homework, structured learning plans, and clarification of assignments and tasks, which fosters a supportive home learning environment, allowing them to take responsibility for their own learning and cultivate the skills and approaches such learning requires [ 26 , 27 , 28 ]. This finding aligns with the studies of Saqr et al. [ 68 , 69 ], which highlighted that when students actively engage in their own learning process, it yields additional advantages, such as heightened motivation, enhanced achievement, and the cultivation of enthusiasm, turning them into advocates for their own learning.

Moreover, students have utilized ChatGPT for tailored teaching and step-by-step guidance on diverse educational topics, streamlining task and project completion, and generating and recommending educational content. This personalization enhances the learning environment, leading to increased academic success. This finding aligns with other recent studies [ 26 , 27 , 28 , 60 , 66 ] which revealed that ChatGPT has the potential to offer personalized learning experiences and support an effective learning process by providing students with customized feedback and explanations tailored to their needs and abilities, ultimately fostering students' performance, engagement, and motivation and leading to increased academic success [ 14 , 44 , 58 ]. This outcome is in line with the findings of Saqr et al. [ 68 , 69 ], which emphasized that learning strategies are important catalysts of students' learning, as students who utilize effective learning strategies are more likely to achieve better academic results.

Teachers, too, have capitalized on ChatGPT's capabilities to enhance productivity and efficiency, using it for creating lesson plans, generating quizzes, providing additional resources, generating and preplanning new ideas for activities, and aiding in answering students’ questions. This adoption of technology introduces new opportunities to support teaching and learning practices, enhancing teacher productivity. This finding aligns with those of Day [ 17 ], De Castro [ 18 ], and Su and Yang [ 74 ] as well as with those of Valtonen et al. [ 82 ], who revealed that emerging technological advancements have opened up novel opportunities and means to support teaching and learning practices, and enhance teachers’ productivity.

In response to RQ2, "What are the main findings derived from empirical studies that have incorporated ChatGPT into learning and teaching?", the findings from this review provide profound insights and raise significant concerns. Starting with the insights, chatbots, including ChatGPT, have demonstrated the potential to reshape and revolutionize education, creating novel opportunities for enhancing the learning process and outcomes [ 83 ], facilitating different learning approaches, and offering a range of pedagogical benefits [ 19 , 43 , 72 ]. In this context, this review found that ChatGPT could open avenues for educators to adopt or develop new effective learning and teaching strategies that can evolve with the introduction of ChatGPT into the classroom. Nonetheless, there is an evident lack of research understanding regarding the potential impact of generative machine learning models within diverse educational settings [ 83 ]. This requires teachers to attain a high level of proficiency in incorporating chatbots, such as ChatGPT, into their classrooms to create inventive, well-structured, and captivating learning strategies. In the same vein, the review also found that teachers who lacked the requisite skills to utilize ChatGPT found that it did not contribute positively to their work and could potentially have adverse effects [ 37 ]. This concern could lead to inequity of access to the benefits of chatbots, including ChatGPT, as individuals who lack the necessary expertise may not be able to harness their full potential, resulting in disparities in educational outcomes and opportunities. Therefore, immediate action is needed to address these potential issues.
A potential solution is offering training, support, and competency development for teachers to ensure that all of them can leverage chatbots, including ChatGPT, effectively and equitably in their educational practices [ 5 , 28 , 80 ], which could enhance accessibility and inclusivity, and potentially result in innovative outcomes [ 82 , 83 ].

Additionally, chatbots, including ChatGPT, have the potential to significantly impact students' thinking abilities, including retention, reasoning, and analysis skills [ 19 , 45 ], and to foster innovation and creativity [ 83 ]. This review found that ChatGPT could contribute to improving a wide range of skills among students. However, it also found that frequent use of ChatGPT may decrease innovative capacities, collaborative skills, cognitive capacities, and students' motivation to attend classes, and could reduce higher-order thinking skills among students [ 22 , 29 ]. Therefore, immediate action is needed to carefully examine the long-term impact of chatbots, such as ChatGPT, on learning outcomes, as well as to explore their incorporation into educational settings as supportive tools without compromising students' cognitive development and critical thinking abilities. In the same vein, the review also found it challenging to draw a consistent conclusion regarding the potential of ChatGPT to aid the self-directed learning approach, a finding that aligns with the recent study of Baskara [ 8 ]. Therefore, further research is needed to explore the potential of ChatGPT for self-directed learning. One potential solution involves utilizing learning analytics as a novel approach to examine various aspects of students' learning and support them in their individual endeavors [ 32 ]. This approach can bridge the gap by facilitating an in-depth analysis of how learners engage with ChatGPT, identifying trends in self-directed learning behavior, and assessing its influence on their outcomes.

Turning to the significant concerns, a fundamental challenge with LLM-based chatbots, including ChatGPT, is the accuracy and quality of the information and responses they provide, as they can present false information as truth, a phenomenon often referred to as "hallucination" [ 3 , 49 ]. In this context, this review found that the information provided was not entirely satisfactory. Consequently, the utilization of chatbots raises potential concerns, such as the generation of inaccurate or misleading information, especially for students who rely on it to support their learning. This finding aligns with other findings [ 6 , 30 , 35 , 40 ] which revealed that incorporating chatbots such as ChatGPT into education presents challenges related to accuracy and reliability, both because they are trained on a large corpus of data that may contain inaccuracies and because of the way users formulate their questions. Therefore, immediate action is needed to address these potential issues. One possible solution is to equip students with the necessary skills and competencies, including a background understanding of how to use ChatGPT effectively and the ability to assess and evaluate the information it generates, as the accuracy and quality of the provided information depend on the input, its complexity, the topic, and the relevance of its training data [ 28 , 49 , 86 ]. It is also essential to examine how learners can be educated about how these models operate, the data used in their training, and how to recognize their limitations, challenges, and issues [ 79 ].

Furthermore, chatbots present a substantial challenge to maintaining academic integrity [ 20 , 56 ] and avoiding copyright violations [ 83 ], both significant concerns in education. The review found that the potential misuse of ChatGPT might foster cheating, facilitate plagiarism, and threaten academic integrity. This issue is also affirmed by the research of Basic et al. [ 7 ], who presented evidence that students who utilized ChatGPT in their writing assignments had more plagiarism cases than those who did not. These findings align with the conclusions drawn by Cotton et al. [ 13 ], Hisan and Amri [ 33 ], and Sullivan et al. [ 75 ], who revealed that the integration of chatbots such as ChatGPT into education poses a significant challenge to the preservation of academic integrity. Moreover, chatbots, including ChatGPT, have made plagiarism more difficult to identify [ 47 , 67 , 76 ]. Findings from previous studies [ 1 , 84 ] indicate that AI-generated text often went undetected by plagiarism software, such as Turnitin. Turnitin and similar plagiarism detection tools, such as ZeroGPT, GPTZero, and Copyleaks, have since evolved, incorporating enhanced techniques to detect AI-generated text; however, studies have found that these tools are still not fully ready to accurately and reliably identify AI-generated text and can produce false positives [ 10 , 51 ], and novel detection methods may need to be created and implemented [ 4 ]. This issue leads to a further concern: the difficulty of accurately evaluating student performance when students use chatbots such as ChatGPT to assist with their assignments. Consequently, most LLM-driven chatbots present a substantial challenge to traditional assessments [ 64 ].
The findings from previous studies indicate the importance of rethinking, improving, and redesigning innovative assessment methods in the era of chatbots [ 14 , 20 , 64 , 75 ]. These methods should prioritize the process of evaluating students' ability to apply knowledge to complex cases and demonstrate comprehension, rather than solely focusing on the final product for assessment. Therefore, immediate action is needed to address these potential issues. One possible solution would be the development of clear guidelines, regulatory policies, and pedagogical guidance. These measures would help regulate the proper and ethical utilization of chatbots, such as ChatGPT, and must be established before their introduction to students [ 35 , 38 , 39 , 41 , 89 ].

In summary, our review has delved into the utilization of ChatGPT, a prominent example of chatbots, in education, addressing the question of how ChatGPT has been utilized in education. However, there remain significant gaps, which necessitate further research to shed light on this area.

7 Conclusions

This systematic review has shed light on learners' and educators' varied initial attempts at incorporating ChatGPT into education, while also offering insights and considerations that can facilitate its effective and responsible use in future educational contexts. The analysis of the 14 selected studies revealed the dual-edged impact of ChatGPT in educational settings. On the positive side, ChatGPT significantly aided the learning process in various ways. Learners used it as a virtual intelligent assistant, benefiting from its ability to provide immediate feedback, on-demand answers, and easy access to educational resources. Additionally, learners used it to enhance their writing and language skills, engaging in practices such as generating ideas, composing essays, and performing tasks like summarizing, translating, paraphrasing texts, or checking grammar. Importantly, other learners utilized it to support and facilitate their self-directed and personalized learning on a broad range of educational topics, assisting in understanding concepts and homework, providing structured learning plans, and clarifying assignments and tasks. Educators, on the other hand, found ChatGPT beneficial for enhancing productivity and efficiency. They used it to create lesson plans, generate quizzes, provide additional resources, and answer learners' questions, which saved time and allowed for more dynamic and engaging teaching strategies and methodologies.

However, the review also pointed out negative impacts. The results revealed that overuse of ChatGPT could decrease innovative capacities and collaborative learning among learners. Specifically, relying too much on ChatGPT for quick answers can inhibit learners' critical thinking and problem-solving skills. Learners might not engage deeply with the material or consider multiple solutions to a problem. This tendency was particularly evident in group projects, where learners preferred consulting ChatGPT individually for solutions over brainstorming and collaborating with peers, which negatively affected their teamwork abilities. On a broader level, integrating ChatGPT into education has also raised several concerns, including the potential for providing inaccurate or misleading information, issues of inequity in access, challenges related to academic integrity, and the possibility of misusing the technology.

Accordingly, this review emphasizes the urgency of developing clear rules, policies, and regulations to ensure the effective and responsible use of ChatGPT, alongside other chatbots, by both learners and educators in educational settings. This requires well-structured training that educates users on responsible usage and the technology's limitations, along with sufficient background information. Moreover, the review highlights the importance of rethinking, improving, and redesigning innovative teaching and assessment methods for the era of ChatGPT. Finally, conducting further research and engaging in discussions with policymakers and stakeholders are essential steps to maximize the benefits for both educators and learners and to ensure academic integrity.

It is important to acknowledge that this review has certain limitations. First, the small number of included studies can be attributed to several factors: the novelty of the technology, as new technologies often face initial skepticism and cautious adoption; the lack of clear guidelines or best practices for leveraging this technology for educational purposes; and institutional or governmental policies affecting the utilization of this technology in educational contexts. These factors, in turn, limited the number of studies available for review. Second, the included studies used the original version of ChatGPT, based on GPT-3 or GPT-3.5, so new studies utilizing the updated version, GPT-4, may lead to different findings. Follow-up systematic reviews are therefore essential once more empirical studies on ChatGPT are published. Additionally, long-term studies are necessary to thoroughly examine and assess the impact of ChatGPT on various educational practices.

Despite these limitations, this systematic review has highlighted the transformative potential of ChatGPT in education, revealing its diverse utilization by learners and educators alike, summarizing the benefits of incorporating it into education, and bringing to the forefront the critical concerns and challenges that must be addressed to facilitate its effective and responsible use in future educational contexts. The review can serve as an insightful resource for practitioners who seek to integrate ChatGPT into education and may stimulate further research in the field.

Data availability

The data supporting our findings are available upon request.

Abbreviations

  • AI: Artificial intelligence
  • AIEd: AI in education
  • LLM: Large language model
  • ANN: Artificial neural networks
  • ChatGPT: Chat Generative Pre-Trained Transformer
  • RNN: Recurrent neural networks
  • LSTM: Long short-term memory
  • RLHF: Reinforcement learning from human feedback
  • NLP: Natural language processing
  • PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

References

AlAfnan MA, Dishari S, Jovic M, Lomidze K. ChatGPT as an educational tool: opportunities, challenges, and recommendations for communication, business writing, and composition courses. J Artif Intell Technol. 2023. https://doi.org/10.37965/jait.2023.0184.


Ali JKM, Shamsan MAA, Hezam TA, Mohammed AAQ. Impact of ChatGPT on learning motivation. J Engl Stud Arabia Felix. 2023;2(1):41–9. https://doi.org/10.56540/jesaf.v2i1.51 .

Alkaissi H, McFarlane SI. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023. https://doi.org/10.7759/cureus.35179 .

Anderson N, Belavý DL, Perle SM, Hendricks S, Hespanhol L, Verhagen E, Memon AR. AI did not write this manuscript, or did it? Can we trick the AI text detector into generated texts? The potential future of ChatGPT and AI in sports & exercise medicine manuscript generation. BMJ Open Sport Exerc Med. 2023;9(1): e001568. https://doi.org/10.1136/bmjsem-2023-001568 .

Ausat AMA, Massang B, Efendi M, Nofirman N, Riady Y. Can chat GPT replace the role of the teacher in the classroom: a fundamental analysis. J Educ. 2023;5(4):16100–6.


Baidoo-Anu D, Ansah L. Education in the Era of generative artificial intelligence (AI): understanding the potential benefits of ChatGPT in promoting teaching and learning. Soc Sci Res Netw. 2023. https://doi.org/10.2139/ssrn.4337484 .

Basic Z, Banovac A, Kruzic I, Jerkovic I. Better by you, better than me: ChatGPT-3 as writing assistance in students' essays. 2023. arXiv preprint arXiv:2302.04536.

Baskara FR. The promises and pitfalls of using chat GPT for self-determined learning in higher education: an argumentative review. Prosiding Seminar Nasional Fakultas Tarbiyah dan Ilmu Keguruan IAIM Sinjai. 2023;2:95–101. https://doi.org/10.47435/sentikjar.v2i0.1825 .

Behera RK, Bala PK, Dhir A. The emerging role of cognitive computing in healthcare: a systematic literature review. Int J Med Inform. 2019;129:154–66. https://doi.org/10.1016/j.ijmedinf.2019.04.024 .

Chaka C. Detecting AI content in responses generated by ChatGPT, YouChat, and Chatsonic: the case of five AI content detection tools. J Appl Learn Teach. 2023. https://doi.org/10.37074/jalt.2023.6.2.12 .

Chiu TKF, Xia Q, Zhou X, Chai CS, Cheng M. Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education. Comput Educ Artif Intell. 2023;4:100118. https://doi.org/10.1016/j.caeai.2022.100118 .

Choi EPH, Lee JJ, Ho M, Kwok JYY, Lok KYW. Chatting or cheating? The impacts of ChatGPT and other artificial intelligence language models on nurse education. Nurse Educ Today. 2023;125:105796. https://doi.org/10.1016/j.nedt.2023.105796 .

Cotton D, Cotton PA, Shipway JR. Chatting and cheating: ensuring academic integrity in the era of ChatGPT. Innov Educ Teach Int. 2023. https://doi.org/10.1080/14703297.2023.2190148 .

Crawford J, Cowling M, Allen K. Leadership is needed for ethical ChatGPT: Character, assessment, and learning using artificial intelligence (AI). J Univ Teach Learn Pract. 2023. https://doi.org/10.53761/1.20.3.02 .

Creswell JW. Educational research: planning, conducting, and evaluating quantitative and qualitative research [Ebook]. 4th ed. London: Pearson Education; 2015.

Curry D. ChatGPT Revenue and Usage Statistics (2023)—Business of Apps. 2023. https://www.businessofapps.com/data/chatgpt-statistics/

Day T. A preliminary investigation of fake peer-reviewed citations and references generated by ChatGPT. Prof Geogr. 2023. https://doi.org/10.1080/00330124.2023.2190373 .

De Castro CA. A Discussion about the Impact of ChatGPT in education: benefits and concerns. J Bus Theor Pract. 2023;11(2):p28. https://doi.org/10.22158/jbtp.v11n2p28 .

Deng X, Yu Z. A meta-analysis and systematic review of the effect of Chatbot technology use in sustainable education. Sustainability. 2023;15(4):2940. https://doi.org/10.3390/su15042940 .

Eke DO. ChatGPT and the rise of generative AI: threat to academic integrity? J Responsib Technol. 2023;13:100060. https://doi.org/10.1016/j.jrt.2023.100060 .

Elmoazen R, Saqr M, Tedre M, Hirsto L. A systematic literature review of empirical research on epistemic network analysis in education. IEEE Access. 2022;10:17330–48. https://doi.org/10.1109/access.2022.3149812 .

Farrokhnia M, Banihashem SK, Noroozi O, Wals AEJ. A SWOT analysis of ChatGPT: implications for educational practice and research. Innov Educ Teach Int. 2023. https://doi.org/10.1080/14703297.2023.2195846 .

Fergus S, Botha M, Ostovar M. Evaluating academic answers generated using ChatGPT. J Chem Educ. 2023;100(4):1672–5. https://doi.org/10.1021/acs.jchemed.3c00087 .

Fink A. Conducting research literature reviews: from the Internet to paper. SAGE Publications; 2010.

Firaina R, Sulisworo D. Exploring the usage of ChatGPT in higher education: frequency and impact on productivity. Buletin Edukasi Indonesia (BEI). 2023;2(01):39–46. https://doi.org/10.56741/bei.v2i01.310 .

Firat M. How chat GPT can transform autodidactic experiences and open education. Department of Distance Education, Open Education Faculty, Anadolu University; 2023. https://orcid.org/0000-0001-8707-5918.

Firat M. What ChatGPT means for universities: perceptions of scholars and students. J Appl Learn Teach. 2023. https://doi.org/10.37074/jalt.2023.6.1.22 .

Fuchs K. Exploring the opportunities and challenges of NLP models in higher education: is Chat GPT a blessing or a curse? Front Educ. 2023. https://doi.org/10.3389/feduc.2023.1166682 .

García-Peñalvo FJ. La percepción de la inteligencia artificial en contextos educativos tras el lanzamiento de ChatGPT: disrupción o pánico [The perception of artificial intelligence in educational contexts after the launch of ChatGPT: disruption or panic]. Educ Knowl Soc. 2023;24:e31279. https://doi.org/10.14201/eks.31279.

Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor A, Chartash D. How does ChatGPT perform on the United States medical Licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9: e45312. https://doi.org/10.2196/45312 .

Hashana AJ, Brundha P, Ayoobkhan MUA, Fazila S. Deep Learning in ChatGPT—A Survey. In   2023 7th international conference on trends in electronics and informatics (ICOEI) . 2023. (pp. 1001–1005). IEEE. https://doi.org/10.1109/icoei56765.2023.10125852

Hirsto L, Saqr M, López-Pernas S, Valtonen T. A systematic narrative review of learning analytics research in K-12 and schools. Proceedings. 2022. https://ceur-ws.org/Vol-3383/FLAIEC22_paper_9536.pdf.

Hisan UK, Amri MM. ChatGPT and medical education: a double-edged sword. J Pedag Educ Sci. 2023;2(01):71–89. https://doi.org/10.13140/RG.2.2.31280.23043/1 .

Hopkins AM, Logan JM, Kichenadasse G, Sorich MJ. Artificial intelligence chatbots will revolutionize how cancer patients access information: ChatGPT represents a paradigm-shift. JNCI Cancer Spectr. 2023. https://doi.org/10.1093/jncics/pkad010 .

Househ M, AlSaad R, Alhuwail D, Ahmed A, Healy MG, Latifi S, Sheikh J. Large Language models in medical education: opportunities, challenges, and future directions. JMIR Med Educ. 2023;9: e48291. https://doi.org/10.2196/48291 .

Ilkka T. The impact of artificial intelligence on learning, teaching, and education. Minist de Educ. 2018. https://doi.org/10.2760/12297 .

Iqbal N, Ahmed H, Azhar KA. Exploring teachers’ attitudes towards using CHATGPT. Globa J Manag Adm Sci. 2022;3(4):97–111. https://doi.org/10.46568/gjmas.v3i4.163 .

Irfan M, Murray L, Ali S. Integration of Artificial intelligence in academia: a case study of critical teaching and learning in Higher education. Globa Soc Sci Rev. 2023;8(1):352–64. https://doi.org/10.31703/gssr.2023(viii-i).32 .

Jeon JH, Lee S. Large language models in education: a focus on the complementary relationship between human teachers and ChatGPT. Educ Inf Technol. 2023. https://doi.org/10.1007/s10639-023-11834-1 .

Khan RA, Jawaid M, Khan AR, Sajjad M. ChatGPT—Reshaping medical education and clinical management. Pak J Med Sci. 2023. https://doi.org/10.12669/pjms.39.2.7653 .

King MR. A conversation on artificial intelligence, Chatbots, and plagiarism in higher education. Cell Mol Bioeng. 2023;16(1):1–2. https://doi.org/10.1007/s12195-022-00754-8 .

Kooli C. Chatbots in education and research: a critical examination of ethical implications and solutions. Sustainability. 2023;15(7):5614. https://doi.org/10.3390/su15075614 .

Kuhail MA, Alturki N, Alramlawi S, Alhejori K. Interacting with educational chatbots: a systematic review. Educ Inf Technol. 2022;28(1):973–1018. https://doi.org/10.1007/s10639-022-11177-3 .

Lee H. The rise of ChatGPT: exploring its potential in medical education. Anat Sci Educ. 2023. https://doi.org/10.1002/ase.2270 .

Li L, Subbareddy R, Raghavendra CG. AI intelligence Chatbot to improve students learning in the higher education platform. J Interconnect Netw. 2022. https://doi.org/10.1142/s0219265921430325 .

Limna P. A Review of Artificial Intelligence (AI) in Education during the Digital Era. 2022. https://ssrn.com/abstract=4160798

Lo CK. What is the impact of ChatGPT on education? A rapid review of the literature. Educ Sci. 2023;13(4):410. https://doi.org/10.3390/educsci13040410 .

Luo W, He H, Liu J, Berson IR, Berson MJ, Zhou Y, Li H. Aladdin’s genie or pandora’s box For early childhood education? Experts chat on the roles, challenges, and developments of ChatGPT. Early Educ Dev. 2023. https://doi.org/10.1080/10409289.2023.2214181 .

Meyer JG, Urbanowicz RJ, Martin P, O’Connor K, Li R, Peng P, Moore JH. ChatGPT and large language models in academia: opportunities and challenges. Biodata Min. 2023. https://doi.org/10.1186/s13040-023-00339-9 .

Mhlanga D. Open AI in education, the responsible and ethical use of ChatGPT towards lifelong learning. Soc Sci Res Netw. 2023. https://doi.org/10.2139/ssrn.4354422 .

Neumann M, Rauschenberger M, Schön EM. "We need to talk about ChatGPT": the future of AI and higher education. 2023. https://doi.org/10.1109/seeng59157.2023.00010.

Nolan B. Here are the schools and colleges that have banned the use of ChatGPT over plagiarism and misinformation fears. Business Insider . 2023. https://www.businessinsider.com

O’Leary DE. An analysis of three chatbots: BlenderBot, ChatGPT and LaMDA. Int J Intell Syst Account, Financ Manag. 2023;30(1):41–54. https://doi.org/10.1002/isaf.1531 .

Okoli C. A guide to conducting a standalone systematic literature review. Commun Assoc Inf Syst. 2015. https://doi.org/10.17705/1cais.03743 .

OpenAI. (2023). https://openai.com/blog/chatgpt

Perkins M. Academic integrity considerations of AI large language models in the post-pandemic era: ChatGPT and beyond. J Univ Teach Learn Pract. 2023. https://doi.org/10.53761/1.20.02.07 .

Plevris V, Papazafeiropoulos G, Rios AJ. Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard. arXiv (Cornell University) . 2023. https://doi.org/10.48550/arxiv.2305.18618

Rahman MM, Watanobe Y. ChatGPT for education and research: opportunities, threats, and strategies. Appl Sci. 2023;13(9):5783. https://doi.org/10.3390/app13095783.

Ram B, Verma P. Artificial intelligence AI-based Chatbot study of ChatGPT, google AI bard and baidu AI. World J Adv Eng Technol Sci. 2023;8(1):258–61. https://doi.org/10.30574/wjaets.2023.8.1.0045 .

Rasul T, Nair S, Kalendra D, Robin M, de Oliveira Santini F, Ladeira WJ, Heathcote L. The role of ChatGPT in higher education: benefits, challenges, and future research directions. J Appl Learn Teach. 2023. https://doi.org/10.37074/jalt.2023.6.1.29 .

Ratnam M, Sharm B, Tomer A. ChatGPT: educational artificial intelligence. Int J Adv Trends Comput Sci Eng. 2023;12(2):84–91. https://doi.org/10.30534/ijatcse/2023/091222023 .

Ray PP. ChatGPT: a comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet Things Cyber-Phys Syst. 2023;3:121–54. https://doi.org/10.1016/j.iotcps.2023.04.003 .

Roumeliotis KI, Tselikas ND. ChatGPT and Open-AI models: a preliminary review. Future Internet. 2023;15(6):192. https://doi.org/10.3390/fi15060192 .

Rudolph J, Tan S, Tan S. War of the chatbots: Bard, Bing Chat, ChatGPT, Ernie and beyond. The new AI gold rush and its impact on higher education. J Appl Learn Teach. 2023. https://doi.org/10.37074/jalt.2023.6.1.23 .

Ruiz LMS, Moll-López S, Nuñez-Pérez A, Moraño J, Vega-Fleitas E. ChatGPT challenges blended learning methodologies in engineering education: a case study in mathematics. Appl Sci. 2023;13(10):6039. https://doi.org/10.3390/app13106039 .

Sallam M, Salim NA, Barakat M, Al-Tammemi AB. ChatGPT applications in medical, dental, pharmacy, and public health education: a descriptive study highlighting the advantages and limitations. Narra J. 2023;3(1): e103. https://doi.org/10.52225/narra.v3i1.103 .

Salvagno M, Taccone FS, Gerli AG. Can artificial intelligence help for scientific writing? Crit Care. 2023. https://doi.org/10.1186/s13054-023-04380-2 .

Saqr M, López-Pernas S, Helske S, Hrastinski S. The longitudinal association between engagement and achievement varies by time, students’ profiles, and achievement state: a full program study. Comput Educ. 2023;199:104787. https://doi.org/10.1016/j.compedu.2023.104787 .

Saqr M, Matcha W, Uzir N, Jovanović J, Gašević D, López-Pernas S. Transferring effective learning strategies across learning contexts matters: a study in problem-based learning. Australas J Educ Technol. 2023;39(3):9.

Schöbel S, Schmitt A, Benner D, Saqr M, Janson A, Leimeister JM. Charting the evolution and future of conversational agents: a research agenda along five waves and new frontiers. Inf Syst Front. 2023. https://doi.org/10.1007/s10796-023-10375-9 .

Shoufan A. Exploring students’ perceptions of CHATGPT: thematic analysis and follow-up survey. IEEE Access. 2023. https://doi.org/10.1109/access.2023.3268224 .

Sonderegger S, Seufert S. Chatbot-mediated learning: conceptual framework for the design of chatbot use cases in education. St. Gallen: Institute for Educational Management and Technologies, University of St. Gallen; 2022. https://doi.org/10.5220/0010999200003182.

Strzelecki A. To use or not to use ChatGPT in higher education? A study of students’ acceptance and use of technology. Interact Learn Environ. 2023. https://doi.org/10.1080/10494820.2023.2209881 .

Su J, Yang W. Unlocking the power of ChatGPT: a framework for applying generative AI in education. ECNU Rev Educ. 2023. https://doi.org/10.1177/20965311231168423 .

Sullivan M, Kelly A, McLaughlan P. ChatGPT in higher education: considerations for academic integrity and student learning. J Appl Learn Teach. 2023;6(1):1–10. https://doi.org/10.37074/jalt.2023.6.1.17.

Szabo A. ChatGPT is a breakthrough in science and education but fails a test in sports and exercise psychology. Balt J Sport Health Sci. 2023;1(128):25–40. https://doi.org/10.33607/bjshs.v127i4.1233 .

Taecharungroj V. “What can ChatGPT do?” analyzing early reactions to the innovative AI chatbot on Twitter. Big Data Cognit Comput. 2023;7(1):35. https://doi.org/10.3390/bdcc7010035 .

Tam S, Said RB. User preferences for ChatGPT-powered conversational interfaces versus traditional methods. Biomed Eng Soc. 2023. https://doi.org/10.58496/mjcsc/2023/004 .

Tedre M, Kahila J, Vartiainen H. Exploration on how co-designing with AI facilitates critical evaluation of ethics of AI in craft education. In: Langran E, Christensen P, Sanson J, editors. Proceedings of Society for Information Technology and Teacher Education International Conference. 2023. pp. 2289–96.

Tlili A, Shehata B, Adarkwah MA, Bozkurt A, Hickey DT, Huang R, Agyemang B. What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education. Smart Learn Environ. 2023. https://doi.org/10.1186/s40561-023-00237-x .

Uddin SMJ, Albert A, Ovid A, Alsharef A. Leveraging CHATGPT to aid construction hazard recognition and support safety education and training. Sustainability. 2023;15(9):7121. https://doi.org/10.3390/su15097121 .

Valtonen T, López-Pernas S, Saqr M, Vartiainen H, Sointu E, Tedre M. The nature and building blocks of educational technology research. Comput Hum Behav. 2022;128:107123. https://doi.org/10.1016/j.chb.2021.107123 .

Vartiainen H, Tedre M. Using artificial intelligence in craft education: crafting with text-to-image generative models. Digit Creat. 2023;34(1):1–21. https://doi.org/10.1080/14626268.2023.2174557 .

Ventayen RJM. OpenAI ChatGPT generated results: similarity index of artificial intelligence-based contents. Soc Sci Res Netw. 2023. https://doi.org/10.2139/ssrn.4332664 .

Wagner MW, Ertl-Wagner BB. Accuracy of information and references using ChatGPT-3 for retrieval of clinical radiological information. Can Assoc Radiol J. 2023. https://doi.org/10.1177/08465371231171125 .

Wardat Y, Tashtoush MA, AlAli R, Jarrah AM. ChatGPT: a revolutionary tool for teaching and learning mathematics. Eurasia J Math, Sci Technol Educ. 2023;19(7):em2286. https://doi.org/10.29333/ejmste/13272 .

Webster J, Watson RT. Analyzing the past to prepare for the future: writing a literature review. Manag Inf Syst Quart. 2002;26(2):3.

Xiao Y, Watson ME. Guidance on conducting a systematic literature review. J Plan Educ Res. 2017;39(1):93–112. https://doi.org/10.1177/0739456x17723971 .

Yan D. Impact of ChatGPT on learners in a L2 writing practicum: an exploratory investigation. Educ Inf Technol. 2023. https://doi.org/10.1007/s10639-023-11742-4 .

Yu H. Reflection on whether Chat GPT should be banned by academia from the perspective of education and teaching. Front Psychol. 2023;14:1181712. https://doi.org/10.3389/fpsyg.2023.1181712 .

Zhu C, Sun M, Luo J, Li T, Wang M. How to harness the potential of ChatGPT in education? Knowl Manag ELearn. 2023;15(2):133–52. https://doi.org/10.34105/j.kmel.2023.15.008 .

Funding

The paper is co-funded by the Academy of Finland (Suomen Akatemia) Research Council for Natural Sciences and Engineering for the project Towards precision education: Idiographic learning analytics (TOPEILA), Decision Number 350560.

Author information

Authors and affiliations.

School of Computing, University of Eastern Finland, 80100, Joensuu, Finland

Yazid Albadarin, Mohammed Saqr, Nicolas Pope & Markku Tukiainen


Contributions

YA contributed to the literature search, data analysis, discussion, and conclusion. Additionally, YA contributed to the manuscript’s writing, editing, and finalization. MS contributed to the study’s design, conceptualization, acquisition of funding, project administration, allocation of resources, supervision, validation, literature search, and analysis of results. Furthermore, MS contributed to the manuscript's writing, revising, and approving it in its finalized state. NP contributed to the results, and discussions, and provided supervision. NP also contributed to the writing process, revisions, and the final approval of the manuscript in its finalized state. MT contributed to the study's conceptualization, resource management, supervision, writing, revising the manuscript, and approving it.

Corresponding author

Correspondence to Yazid Albadarin .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

See Table 4

The data presented in Table 4 were synthesized by first identifying relevant studies through a search of the databases ERIC, Scopus, Web of Knowledge, Dimensions.ai, and lens.org using the keywords "ChatGPT" and "education". Inclusion/exclusion criteria were then applied, and data extraction was performed using Creswell's [15] coding techniques to capture key information and identify common themes across the included studies.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Albadarin, Y., Saqr, M., Pope, N. et al. A systematic literature review of empirical research on ChatGPT in education. Discov Educ 3 , 60 (2024). https://doi.org/10.1007/s44217-024-00138-2

Download citation

Received : 22 October 2023

Accepted : 10 May 2024

Published : 26 May 2024

DOI : https://doi.org/10.1007/s44217-024-00138-2

Keywords

  • Large language models
  • Educational technology
  • Systematic review

  • Open access
  • Published: 04 June 2024

Sharpening the lens to evaluate interprofessional education and interprofessional collaboration by improving the conceptual framework: a critical discussion

  • Florian B. Neubauer 1 ,
  • Felicitas L. Wagner 1 ,
  • Andrea Lörwald 1 &
  • Sören Huwendiek 1  

BMC Medical Education volume 24, Article number: 615 (2024)


It has been difficult to demonstrate that interprofessional education (IPE) and interprofessional collaboration (IPC) have positive effects on patient care quality, cost effectiveness of patient care, and healthcare provider satisfaction. Here we propose a detailed explanation for this difficulty based on an adjusted theory about cause and effect in the field of IPE and IPC by asking: 1) What are the critical weaknesses of the causal models predominantly used which link IPE with IPC, and IPE and IPC with final outcomes? 2) What would a more precise causal model look like? 3) Can the proposed novel model help us better understand the challenges of IPE and IPC outcome evaluations? In the format of a critical theoretical discussion, based on a critical appraisal of the literature, we first reason that a monocausal, IPE-biased view on IPC and IPC outcomes does not form a sufficient foundation for proper IPE and IPC outcome evaluations; rather, interprofessional organization (IPO) has to be considered an additional necessary cause for IPC; and factors outside of IPC additional causes for final outcomes. Second, we present an adjusted model representing the “multi-stage multi-causality” of patient, healthcare provider, and system outcomes. Third, we demonstrate the model’s explanatory power by employing it to deduce why misuse of the modified Kirkpatrick classification as a causal model in IPE and IPC outcome evaluations might have led to inconclusive results in the past. We conclude by applying the derived theoretical clarification to formulate recommendations for enhancing future evaluations of IPE, IPO, and IPC. Our main recommendations: 1) Focus should be placed on a comprehensive evaluation of factual IPC as the fundamental metric and 2) A step-by-step approach should be used that separates the outcome evaluation of IPE from that of IPC in the overarching quest for proving the benefits of IPE, IPO and IPC for patients, healthcare providers, and health systems. 
With this critical discussion we hope to enable more effective evaluations of IPE, IPO and IPC in the future.

Peer Review reports

There is scant knowledge on the extent to which the quality of interprofessional education (IPE) and interprofessional collaboration (IPC) at healthcare institutions influences patient care quality [1, 2, 3], the cost effectiveness of patient care, the job satisfaction of healthcare professionals [1] and, as a result, their retention [4, 5]. Patients, those who organize and finance healthcare, policy makers, taxpayers, and arguably societies as a whole have a reasonable interest in an answer to this question.

According to the peer-reviewed literature, relevant knowledge gaps persist about the benefits of IPE and IPC despite multiple studies on IPE and IPC outcomes covering a period of almost 50 years [2, 3, 6, 7, 8, 9, 10, 11, 12, 13]. Several explanations have been proposed: the number of evaluation studies is still too low [10]; the time periods typically covered by evaluations are too short to detect final outcomes of IPE/IPC interventions [2, 8, 11, 14, 15]; too much focus is placed on immediate results without including measures for final outcomes from the outset [10]; or, ultimately, positive effects of IPE and IPC simply might not exist [6, 9, 10]. Another frequent and non-contradictory explanation proposes that a lack of clarity in the theory and terminology of IPE and IPC and an insufficient use of conceptual frameworks are major deficits which obscure evaluation results [8, 12, 13, 16, 17, 18, 19, 20, 21].

In this article, we argue the latter: that an insufficient use of conceptual frameworks has obscured evaluation results. We propose that the persistence of the knowledge gap relating to patient outcomes, satisfaction of healthcare professionals, and cost effectiveness of IPE and IPC activities (briefly, "patient, healthcare provider, and system outcomes") is rooted in a lack of accuracy in the theoretical models used for mapping causes and effects in IPE and IPC. Our objective is to contribute to overcoming the inconclusiveness of IPE and IPC outcome evaluations by achieving the missing accuracy through the lens of a novel "multi-stage multi-causality" model. Specifically, our research questions are: 1) What are the critical weaknesses of the causal models predominantly used which link IPE with IPC, and IPE and IPC with final outcomes? 2) What would a more precise causal model look like? 3) Can the proposed novel model help us better understand the challenges of IPE and IPC outcome evaluations?

In answering these questions, we first show evidence from the literature that the existing causal models of IPE and IPC exhibit a crucial imprecision. Second, we present the "multi-stage multi-causality model of patient, healthcare provider, and system outcomes" which fixes this imprecision by making a small but important modification to the causal role of IPO. Third, we demonstrate the explanatory power of the multi-stage multi-causality model by showing why evaluations using the modified Kirkpatrick classification of interprofessional outcomes (MKC) [11, 22, 23] — a tool commonly used to evaluate outcomes of IPE activities — have failed to substantiate positive outcomes of IPE and IPC; namely, we show how the misuse of the MKC leads to inconclusiveness and difficulties in evaluating final patient, healthcare provider, and system outcomes. We conclude with recommendations for future evaluations in the field of IPE, IPO and IPC.

With this theoretical investigation, we hope to contribute to a deeper understanding of the causal factors in IPE, IPO and IPC and to enable more precise evaluations in the future.

Based on our research questions, we performed iterative literature searches (detailed below) followed by critical appraisal by the authors, and transformed the resulting insights into the critical discussion presented in the main section of the present article by applying the 6 quality criteria of the SANRA scale [24]:

Justification of the article's importance for the readership: Our target audience consists of researchers whose goal is to evaluate whether IPE, IPO or IPC improve patient, healthcare provider, and system outcomes. For our target audience the present study is meaningful because it advances the understanding of the theoretical foundations of evaluations in this field. Further, in local contexts where the potential of IPE, IPO, and IPC is still neglected, clear evidence demonstrating substantial benefits would help to foster programs aimed at implementing better IPE, IPO, or IPC.

Statement of concrete/specific aims or formulation of questions : We set out to explore the following questions: 1) What are the critical weaknesses of the causal models predominantly used which link IPE with IPC, and IPE and IPC with final outcomes? 2) What would a more precise causal model look like? 3) Can the proposed novel model help us better understand the challenges of IPE and IPC outcome evaluations?

Description of literature searches: We searched for existing definitions, causal models, relevant indicators, and evaluation instruments for IPE, IPO, and IPC using PubMed, Google and Google Scholar with the following search terms in different combinations: “interprofessional education”, “interprofessional collaboration”, “interprofessional organization”, “interprofessional team work”, “evaluation”, “outcome evaluation”, “process evaluation”, “modified Kirkpatrick”, “conceptual framework”, “theory”, “model”, “instrument”, “assessment scale”, “survey”, “review”. We conducted all searches in English, covering the time period from 1950 to 2023. We augmented the initial body of literature found by this strategy with citation tracking: for backward tracking, we followed the references provided in articles which we deemed relevant for our research questions; for forward searches, we used the "cited by" feature of PubMed and Google Scholar. The subchapter-specific literature search used in the development of our definition of IPC is described under “Definition of factual IPC”.

Referencing: We consistently back key statements with references.

Scientific reasoning: We enable the reader to easily follow our narrative by structuring the present article around the three research questions stated above, following a logical flow of arguments.

Appropriate presentation of data: We present the data by distinguishing which findings were taken from the literature and which novel arguments for answering the research questions were derived by us.

Definitions

Definition of IPE

Occasions when two or more healthcare/social care professions learn with, from and about each other to improve collaboration and the quality of care for patients/clients [ 2 ] (slightly refining the CAIPE definition [ 25 ]).

These occasions can happen formally or informally, in dedicated educational settings or at the workplace of healthcare/social care professions, and at any stage along the learning continuum, i.e. foundational education, graduate education, and post-licensure continuing professional development [ 8 , 26 ]. The central concept in IPE is learning [ 13 ], the gain of knowledge, skills, and attitudes, or — from a constructivist’s perspective — changes in the brains of individuals.

Definition of factual IPC

Presence of activities in the following 7 dimensions:

Patient-centered care, including a shared treatment plan and effective error management;

Shared creation of the treatment plan and coordination of its execution;

Mutual respect between professions;

Communication, including shared decision-making, sharing of information, appropriate communication tools, and accessibility of team members;

Shared definition and acceptance of roles and responsibilities;

Effective conflict management; and

Leadership, including outcome orientation.

How did we arrive at this definition? IPC has to be distinguished from traditional “multiprofessional collaboration”. In multiprofessional collaboration, patient care is organized in a discipline-oriented way, affecting its organization, leadership, communication, and decision-making. Different professions work separately, each with their own treatment goals; the physician delegates treatment options to the other healthcare professionals in one-way, mostly bilateral communication [ 27 , 28 ]. IPC, in contrast, is defined as the occasions “when multiple health workers from different professional backgrounds work together with patients, families, carers and communities to deliver the highest quality of care” [ 29 ]. This definition by the WHO remains in use today [ 30 ]. However, we found that, in order to talk about specific effects of IPE on IPC and to tailor evaluations towards less ambiguous results, an operationalized definition of IPC is required which provides a higher level of applicability.

To create such a definition, we searched the literature to collect a comprehensive list of IPC dimensions which covers all possible settings of IPC. In an iterative process of content-based thematic clustering, reviews, original articles and preexisting questionnaires on the evaluation of IPE and IPC were added until there was agreement between the authors that saturation was reached with regard to all relevant IPC dimensions. This resulted in the following list of publications: [ 3 , 7 , 9 , 19 , 26 , 28 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 ]. Next, we clustered the terms for IPC dimensions found in this body of literature by consensus agreement on sufficient equivalence between three of the authors (FBN, FLW, SH). Clustering was required due to a lack of consistent terminology in the literature and resulted in the comprehensive set of 7 IPC dimensions used in our definition of IPC provided above.
Finally, we needed to differentiate IPC from IPE and learning: At the workplace, informal learning happens all the time. As a result, interprofessional work processes can comprise both IPC and IPE at the same time; however, interprofessional learning is only a possible , not a necessary element of IPC and hence was not included in our definition of IPC. For example, a healthcare professional who is fully equipped with all competencies required for factual IPC could proficiently work in an established team in an interprofessional way without having to learn any additional IPC-related skills.

In order to stress that our definition of IPC includes all the healthcare-related interprofessional work processes actually taking place but excludes the activities required to create them (those fall in the domains of IPE or IPO), we use the term “factual IPC” throughout the present article. Factual IPC not only happens in formal interprofessional work processes like regular, scheduled meetings but also “on the fly”, i.e. during informal and low-threshold communication and collaboration.

Definition of IPO

All activities at a healthcare institution which create, improve, or maintain regular work processes of factual IPC or create, improve, or maintain institutional conditions supporting formal and informal parts of factual IPC, but excluding activities related to IPE.

There is no agreed-upon definition of IPO in the literature, so we propose the refined definition above, which is broad enough to encompass the full variety of IPC-supporting activities at a healthcare institution while, at the same time, being narrow enough to exclude all manifestations of IPE.

According to this definition, IPO complements IPE within the set of jointly sufficient causes of factual IPC. IPO comprises all conditions required for the realization of factual IPC which are not related to interprofessional learning. It includes the actions of healthcare managers to implement work processes for IPC and to create supportive conditions for IPC (cf. the definitions of IPO in [ 6 , 8 , 13 , 17 , 23 , 26 , 30 , 31 , 33 , 40 , 41 , 42 ]). All interventions which establish or improve interprofessional work processes, i.e. which change how things are done in patient care, or which improve the conditions for factual IPC at an institution, belong in the domain of IPO. IPO is also the continued support for factual IPC by management like encouragement, clarification of areas of responsibility, incentives, staffing, room allocation, other resources, or funding. In contrast, established and regular interprofessional tasks themselves, after they have become part of the day-to-day work life of healthcare teams, without requiring further actions by management, would be categorized as factual IPC, not IPO.

Taken together, IPE is the umbrella term for planning, organizing, conducting, being subject to, and the results of interprofessional learning activities, whereas IPO is the umbrella term for all other activities that, in addition to individual competencies of team members, are necessary to cause factual IPC of high quality.

Critical discussion

What are the critical weaknesses of the causal models predominantly used which link IPE with IPC, and IPE and IPC with final outcomes?

We start by exploring the models in the literature that describe causes and effects of interprofessional activities in the context of patient care. We will derive evidence from the literature that the existing models exhibit a crucial imprecision regarding the causal role of IPO.

The causal model of IPC proposed by the WHO [ 29 , 43 ] (Fig.  1 ) was the model predominantly used in past evaluations of IPE and IPC. The WHO model suggests that IPE-related learning leads to IPC competence (knowledge, skills, and attitudes) in the “health workforce” that is “IPC-ready” post-IPE. This readiness “automatically” leads (as the long diagonal arrow in Fig.  1 suggests) to factual IPC. “The World Health Organization and its partners acknowledge that there is sufficient evidence to indicate that effective interprofessional education enables effective collaborative practice” [ 29 ]. Factual IPC, in turn “strengthens health systems and improves health outcomes” [ 29 ]. As a result, this model suggests a kind of “transitivity” between first causes and last effects: effective IPE activities are expected to ultimately yield positive patient, healthcare provider, and system outcomes on their own.

figure 1

The WHO model of causes and effects in IPE and IPC (from [ 29 ], with permission)

After its publication, the WHO model was regularly cited and endorsed by IPE experts and continues to exert broad influence today. As of October 6, 2023, the “Google Scholar” search engine showed the original publication [ 29 ] to have 4393 citations, 367 of them in 2023 alone.

It is important to note that the WHO model is monocausal with respect to IPC, i.e. IPE is the sole necessary cause for factual IPC. While the model acknowledges that, next to IPE, there are further “mechanisms that shape how collaborative practice is introduced and executed”, it only ranks them as supportive: “Once a collaborative practice-ready health workforce is in place [emphasis added], these [additional] mechanisms will help them [policy-makers] determine the actions they might take to support [emphasis added] collaborative practice” [ 29 ]. The following quotes by Reeves and colleagues further illustrate the strong emphasis that causal models used to place on IPE: “It is commonly argued that IPE can promote the skills and behaviours required for effective IPC, which in turn can improve quality of health care and patient outcomes” [ 17 ] and “National organisations have created core competencies for interprofessional collaborative practice, positioning IPE as fundamental to practice improvement [emphasis added]” [ 10 ]. A couple of years later, Paradis and colleagues even state: “During this wave [of IPE; 1999–2015], advocates suggested IPE as the solution to nearly every health care problem that arose (…)” [ 6 ].

However, a scoping review by Reeves and colleagues, aimed at improving “conceptualization of the interprofessional field” and published soon after the WHO model, already acknowledged that the monocausal picture of factual IPC is incomplete [ 17 ]. Based on a broad analysis of the literature, their review offers a theoretical “Interprofessional framework” that includes the notion of IPO as an additional and different possible cause for desired interprofessional outcomes (Fig.  2 ). They define IPO interventions as “changes at the organizational level (e.g. space, staffing, policy) to enhance collaboration and the quality of care”. The explicit inclusion of IPO in this causal model of IPC was a very important step forward. The authors position IPO interventions parallel to IPE interventions, clearly indicating that IPO is an additional possible cause for desired interprofessional objectives and outcomes. However, in their framework, the capacity of IPO to be a second necessary cause in addition to IPE had not yet been clearly worked out.

figure 2

The Interprofessional Framework (from [ 17 ], reprinted by permission of the publisher, Taylor & Francis Ltd [ 44 ]). Note that, next to IPE, IPO is listed as a different, additional cause for desired interprofessional objectives and outcomes, but the crucial concept that it is also a necessary cause has not yet been worked out here

Side note: This model and publications using it (e.g. [ 45 ]) specify “Interprofessional Practice” (IPP) as a fourth domain, different from IPO (Fig.  2 , middle column). However, the IPP elements describe interventions that support work processes of factual IPC, and support for work processes of factual IPC is fully included in our definition of IPO. As a result, we see no necessity to set IPP apart from IPO and do not include IPP as an additional domain in our model below.

For completeness’ sake, we want to mention another explicit model by D’Amour and Oandasan [ 26 ] with a comparable level of causal clarity which similarly claims that “there are many factors that act as determinants for collaborative practice to be realized”. As this model does not alter our line of argument, it is not shown here.

The ongoing imprecision about the causal role of IPO naturally led to the next iteration of models. The authors of a 2015 review, commissioned by the Institute of Medicine of the National Academy of Sciences (IOM), provide the most recent influential model of causes and effects in IPC which they call “Interprofessional learning continuum model” [ 8 ] (Fig.  3 ).

figure 3

The Interprofessional learning continuum model (from [ 8 ], with permission). Under the labels of “Institutional culture”, “Workforce policy”, and “Financing policy” it not only comprises IPO but assigns to IPO the crucial property of being an “enabling” factor, i.e. being co-causal for factual IPC (here labeled as “Collaborative behavior” and “Performance in practice”, lower left row). Despite this important improvement, the hierarchy of causes and effects remains partially vague: a The green arrow seems to imply direct effects of IPO on health and system outcomes without acknowledging that if IPO is supposed to have an effect on those at all, it necessarily must improve factual IPC first. b The impression remains that factual IPC mainly belongs on the left-hand side, being primarily an effect of IPE. IPO seems less effective on IPC, depending on how one interprets the influence of the green arrow on the larger red box which groups learner, health and system outcomes. c The left tip of the red double arrow in the center, indicating an effect of health and system outcomes on learning outcomes, is not discussed in the publication

In comparison to the WHO model (Fig.  1 ) and the Interprofessional Framework (Fig.  2 ), this causal model acknowledges that IPO is not just an additional but also a necessary cause of IPC and thus provides the most elaborate description of the causal relationships between IPE, IPO and IPC in the literature so far. The authors state, “Diverse and often opaque payment structures and differences in professional and organizational cultures generate obstacles to innovative workforce arrangements, thereby impeding interprofessional work. On the other hand, positive changes in workforce and financing policies could enable [emphasis added] more effective collaboration (…)” [ 8 ]. The word “enable” implies causal necessity: if an enabling factor is absent, the effect is disabled, hence the enabling factor is necessary. The key insight that IPO is a further necessary cause of IPC next to IPE can be found in several other, partly less recent publications, with the only difference that these publications do not embed this insight in a formal model [ 6 , 7 , 13 , 23 , 31 , 33 , 41 , 42 ].

The causal necessity of IPO becomes evident if one considers the extreme case: imagine a healthcare team whose individual members have all learned through IPE the skill set necessary for high-quality IPC, i.e. they are optimally trained for IPC. However, they work at an institution that does not support proper IPC work processes, e.g. there is no dedicated time for team discussions of treatment plans and no electronic tools that allow all team members equal access to patient data. Consequently, there effectively cannot be an optimal manifestation of factual IPC, and it is impossible to expect that the IPE that the team members experienced during their training will significantly affect the quality of patient care in this setting.

What would a more precise causal model look like?

As we have seen, the notion of IPO in causal models of interprofessionality in the literature progressed from “IPO supportive” (Fig.  1 ) to “IPO possible but optional” (Fig.  2 ) to “IPO enabling, i.e. necessary” (Fig.  3 ). The key result of our study is a refinement missing from the existing causal models of IPE/IPO/IPC: the explicit statement that IPO is a factor just as necessary as IPE in the causation of factual IPC. Only jointly are IPE and IPO sufficient to cause factual IPC of high quality. We deem this small modification crucial to reach the conceptual resolution required to fully understand the causes of factual IPC. The fully adjusted causal model is presented in Fig.  4 . In this “multi-stage multi-causality model of patient, healthcare provider, and system outcomes”, IPO is now unequivocally labeled as co-necessary for factual IPC alongside IPE-caused individual competencies.

figure 4

Multi-stage multi-causality model of patient, healthcare provider, and system outcomes. Key ideas: IPO is an equally necessary co-factor in the causation of high-quality factual IPC, in addition to IPE. And the entire realm of interprofessional activities (red-outlined box), of which factual IPC is the final and active ingredient, is in turn only one of several causes leading to final outcomes of interest. Orange boxes: Domain of IPE, the domain of acquisition of competencies for IPC by an individual person through learning. Blue boxes: Domain of IPO, defined as the institutional domain of implementation, improvement, and maintenance of work processes of factual IPC and of IPC-supportive institutional conditions. Green box: Domain of factual IPC at a healthcare institution. Green-gray box, bottom row: Final outcomes of interest, i.e. patient care quality, job satisfaction of healthcare professionals, and cost effectiveness of patient care

Much more explicitly than previous ones, the multi-stage multi-causality model further shows that there are additional necessary causes for beneficial patient, healthcare provider, and system outcomes that lie entirely outside of the realm of IPC-related activities (i.e. outside of IPE/IPO/IPC). It is important to understand that not only factual IPC, but also the final patient, healthcare provider, and system outcomes have more than only one necessary cause, as reflected in the concept of “multi-causality on multiple stages”. This means that optimizing factual IPC is necessary but still not sufficient to optimize patient, healthcare provider, and system outcomes. Examples of necessary co-factors on the same level as factual IPC but from outside the realm of IPE/IPO/IPC are a) profession-specific (“uniprofessional”) competencies for aspects of a task that can only be accomplished by members of a specific healthcare profession (task work vs. team work in [ 41 ]); b) details of health insurance policies, which can affect the cost effectiveness of patient care [ 46 ]; c) salaries paid to health professionals by a healthcare institution, a factor which can influence job satisfaction [ 47 ]; and d) good management decisions at an institution of patient care in general, which comprise much more than just full support for factual IPC [ 46 ].

It should be noted that the co-causality in this conceptual framework is not compatible with the transitivity of the WHO model, where IPE ultimately leads to patient and healthcare provider outcomes via a predefined chain of “self-sustaining” secondary effects.

In sum, the adjusted causal model proposes that patient, healthcare provider, and system outcomes depend on multi-stage multi-causality. Stage 1: IPE + IPO = factual IPC: competencies for IPC in the workforce, the final result of interprofessional learning (IPE), plus creating and maintaining IPC work processes and supportive institutional conditions (IPO) together cause factual IPC. Stage 2: Factual IPC + non-interprofessional factors = patient, healthcare provider, and system outcomes: Factual IPC of high quality plus additional necessary but interprofessionality-independent factors together cause the final outcomes of interest.
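
The two stages can be condensed into a compact logical sketch. This notation is our own illustrative shorthand, not part of any cited model; “NIF” is an abbreviation we introduce here for the non-interprofessional factors, and the conjunction sign joins individually necessary, jointly sufficient causes:

```latex
\begin{align*}
\text{Stage 1:} \quad & \text{IPE} \;\wedge\; \text{IPO} \;\Longrightarrow\; \text{factual IPC} \\
\text{Stage 2:} \quad & \text{factual IPC} \;\wedge\; \text{NIF} \;\Longrightarrow\; \text{patient, healthcare provider, and system outcomes}
\end{align*}
```

Removing any single conjunct on a left-hand side disables the corresponding consequent; this is precisely what distinguishes the multi-stage multi-causality model from the transitive chain IPE ⇒ IPC ⇒ outcomes suggested by the WHO model.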

The intention of our notion of “multi-stage multi-causality” is not to devalue the arrow-less “causal halos” of contextual factors in other models but rather to emphasize that even in “complex” systems (systems with multiple interacting elements) the actual sequence of causes and effects should be understood as precisely as possible for optimizing evaluations.

Brandt and colleagues, after reviewing the impact of IPE and IPC, note in their outlook on IPE, “given the complexity of the healthcare world, training learners in effective team work may not ultimately lead to improved health outcomes or reduce the cost of care” [ 9 ]. We do not share this degree of pessimism: above, we have shown that a monocausal, IPE-biased view of IPC might simply be insufficient for proper outcome evaluation of IPE and IPC. There is hope that by considering IPO, evaluations will become more conclusive. Wei and colleagues state in a systematic meta-review of systematic reviews about IPC, “Effective IPC is not linear; it does not occur naturally when people come together but takes a whole system’s efforts, including organizations, teams, and individuals” [ 30 ]. As we have explained, IPO has to be factored in as an additional necessary cause for IPC, and factors from outside the realm of IPE/IPO/IPC contribute to the “hard” outcomes of interest as well. We presented an adjusted causal model which explicitly acknowledges this multi-stage multi-causality of patient, healthcare provider, and system outcomes.

Can the proposed novel model help us better understand the challenges of IPE and IPC outcome evaluations?

We claim that the multi-stage multi-causality model exhibits strong explanatory power with regards to the difficulties of showing positive consequences of IPE and IPC in outcome evaluations in the past. To illustrate this, we must first describe the prominent role the modified Kirkpatrick classification of interprofessional outcomes [ 11 , 22 , 23 ] plays in outcome evaluations of IPE and IPC.

The modified Kirkpatrick classification (MKC)

MKC is regularly used to classify outcomes of IPE learning activities, curricula and programs [ 2 , 8 , 14 , 20 , 42 , 45 , 48 , 49 ]. It is a derivative of the original Kirkpatrick model for evaluating training results, named after its author, Donald L. Kirkpatrick, which distinguishes four categories of learning outcomes (Level 1: Reaction, Level 2: Learning, Level 3: Behavior, Level 4: Results) [ 50 , 51 ]. Expanding the original model, MKC assigns outcomes of IPE activities to six categories [ 11 ]:

Level 1: Reaction

Level 2a: Modification of perceptions & attitudes

Level 2b: Acquisition of knowledge & skills

Level 3: Behavioural change

Level 4a: Change in organisational practice (wider changes in the organization and delivery of care)

Level 4b: Benefits to patients/clients
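
As an illustrative sketch only (not an official instrument), the six MKC categories listed above can be represented as a small enumeration that an evaluator might use to tag reported outcomes consistently; the class and function names are our own invention:

```python
from enum import Enum


class MKCLevel(Enum):
    """The six outcome categories of the modified Kirkpatrick
    classification (MKC), keyed by their conventional labels."""
    REACTION = "1"
    PERCEPTIONS_ATTITUDES = "2a"
    KNOWLEDGE_SKILLS = "2b"
    BEHAVIOURAL_CHANGE = "3"
    ORGANISATIONAL_PRACTICE = "4a"
    PATIENT_BENEFITS = "4b"


# Note: MKC classifies outcomes only; it encodes no causal links
# between the categories -- the central point of this section.
def is_final_outcome(level: MKCLevel) -> bool:
    """True for the 'hard' outcome categories (levels 4a and 4b)."""
    return level in (MKCLevel.ORGANISATIONAL_PRACTICE,
                     MKCLevel.PATIENT_BENEFITS)
```

Such a tagging scheme makes explicit that MKC is a flat set of categories: nothing in the data structure orders level 1 before level 4b causally, which mirrors the argument developed below.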

In 2007, the authors of MKC claimed, “We have used these categories since 2000. They have proved useful and, contrary to our initial expectations, sufficient to encompass all outcomes in the hundreds of studies reviewed to date” [ 11 ]. This completeness has made MKC a useful tool for authors of review articles as it allows a retrospective classification of IPE outcomes not labeled in the original literature. As a result, MKC was quickly adopted by IPE evaluators around the world to describe the effectiveness of IPE interventions. As Thistlethwaite and colleagues put it in 2015, “This (…) model is now ubiquitous for health professional education evaluation” [ 42 ].

At first glance, the existence of such a clear and simple classification of IPE outcomes, which not only covers all possible IPE outcomes but is also widely embraced in the literature, seems to be good news. What exactly is the problem then? Why did the introduction of MKC more than twenty years ago, plus the conceptual clarification provided by it, not resolve the difficulty in demonstrating IPE-caused patient, healthcare provider, and system outcomes (i.e. effects on MKC levels 4a and 4b)? In the following, we unfold a detailed answer to this question by applying the multi-stage multi-causality model.

To achieve progress, IPE and IPC outcome evaluations need to be complemented with process evaluations

MKC classifies outcomes but is agnostic about how these outcomes come into existence. For an evaluator using MKC, the effects of IPE-related interventions unfold inside a black box. The input into the black box is the intervention; the output consists of 6 different classes of outcomes, i.e. the 6 levels of MKC described above. Naturally, such solely outcome-focused evaluations cannot explain functional interdependencies between the elements of the system. As we have seen, the benefits of IPE and IPC do not unfold as trivially as initially thought. Therefore, after two decades of overall rather inconclusive results of applying MKC to the outcomes of interprofessional interventions, the “why” should have moved to the center of the IPE evaluation efforts. This question is posed variously under well-known labels: authors aware of this stagnation call for “formative evaluation” [ 52 ], “process evaluation” [ 14 ], or “realist evaluation” [ 42 ] in order to understand why interventions work as intended or not. In the following, we use the term “process evaluation” because we focus on understanding the underlying mechanisms.

Process evaluations require a causal model

Process evaluations require a causal model for the system under study to be able to select relevant indicators from a potentially much larger number of conceivable indicators. Appropriately selected indicators, which reflect the inner mechanisms of the system, then replace the black box, reveal bottlenecks, and allow explanations as to why interventions did or did not have the expected or intended outcomes. To explicitly demand the use of a causal model in an evaluation is a core principle, for example, of the “realistic evaluation” approach [ 53 ]. By directly criticizing the (original) Kirkpatrick model, Holton similarly suggests that a “researchable evaluation model” is needed which should “account for the effects of intervening variables that affect outcomes, and indicate causal relationships” [ 54 ]. Specifically for the domain of IPE and IPC, Reeves and colleagues [ 20 ] recommend “the use of models which adopt a comprehensive approach to evaluation” and the IOM authors conclude, “Having a comprehensive conceptual model provides a taxonomy and framework for discussion of the evidence linking IPE with learning, health, and system outcomes. Without such a model, evaluating the impact of IPE on the health of patients and populations and on health system structure and function is difficult and perhaps impossible [emphasis added]” [ 8 ].

MKC is not a causal model

Alliger and Janak note that when Donald Kirkpatrick first proposed his model, he did not assert that each level is caused by the previous level [ 55 ]. Similarly, the developers of MKC acknowledge that “Kirkpatrick did not see outcomes in these four areas as hierarchical.” Rather, most likely in an attempt to avoid indicating causality in MKC themselves, they talk about “categories” not “levels” throughout the majority of their abovementioned paper [ 11 ]. They even knew from the outset that besides IPE the domain which we now call IPO influences outcomes on MKC levels 4a and 4b (but did not include IPO in MKC): “(…) impact of one professional’s changes in behavior depend[s] on [a] number of organisational constraints such as individual’s freedom of action (…) and support for innovation within the organisation” [ 13 ]. This means that by design neither the original Kirkpatrick model nor MKC is intended to be or to include a causal model. MKC simply does not ask whether additional causes besides an IPE intervention might be required for creating the outcomes it classifies, especially those of levels 3, 4a and 4b. Where such additional causes exist, MKC neither detects nor reflects them. Yardley and Dornan conclude that Kirkpatrick’s levels “are unsuitable for (…) education interventions (…) in which process evaluation is as important as (perhaps even more important than) outcome evaluation” [ 14 ].

Nevertheless, MKC continues to be misunderstood as implying a causal model

The numbered levels in the original Kirkpatrick model have drawn criticism for implying causality [ 14 , 54 , 55 ]. Originally, Kirkpatrick had used the term “steps”, not “levels” [ 15 , 42 , 55 ], whereas all current versions of the Kirkpatrick model, including MKC, now use the term “levels”. Bates [ 52 ] cites evidence that Kirkpatrick himself, in his later publications, started to imply causal relationships between the levels of his model. Bates bluntly declares: “Kirkpatrick’s model assumes that the levels of criteria represent a causal chain such that positive reactions lead to greater learning, which produces greater transfer and subsequently more positive organizational results” [ 52 ]. Alliger and Janak [ 55 ] provide other examples from the secondary literature which explicitly assume direct causal links between the levels, and they go on to show that this assumption is highly problematic. Most strikingly, the current (2023) version of the Kirkpatrick model [ 51 ], created by Donald Kirkpatrick’s successors, explicitly contains a causal model which uses the exact same causal logic Alliger and Janak had proposed as underlying it almost three decades earlier [ 55 ].

As a derivative of the Kirkpatrick model, MKC has inherited just that unfortunate property of implying causality between levels. While starting their above-mentioned publication with the carefully chosen term “categories”, the authors of MKC, in the same publication, later fall back on using “levels” [ 11 ]. In earlier publications, they even had explicitly assigned explanatory causal power to MKC: “Level 4b: Benefits to patients/clients. This final level covers any improvements in the health and well being of patients/clients as a direct result [emphasis added] of an education programme” [ 22 ]. Taken together, the authors of MKC themselves, while acknowledging that the original Kirkpatrick model didn’t imply a causal hierarchy, at times contradictorily fuel the notion that MKC provides a viable causal model for the mechanisms of IPE and IPC. As Roland observes, it became common in the literature in general to see the levels of MKC as building on each other, implying a linear causal chain from interprofessional learning to collaborative behavior to patient outcomes [ 15 ].

Why has the wrong attribution of being a causal model to MKC remained stable for so long?

Why has this misunderstanding of MKC as a causal model not drawn more criticism and why has it been so stable? We speculate that a formal parallelism between the transitive relations in the WHO causal model (Fig.  1 ) and the numbered levels of MKC, if wrongly understood as a linear chain of subsequent causes and effects, strengthens the erroneous attribution of a causal model to MKC (Fig.  5 ). Our reasoning: The continued use of the mono-causal WHO model, as opposed to switching to a model incorporating multiple causes for patient, healthcare provider, and system outcomes, stabilizes the misunderstanding of the monothematic (IPE-constricted) MKC as a causal model. (In defense of this mistake, one could say that, if the transitivity assumption associated with the WHO causal model were true, i.e. if the causal chain actually were mono-linear, then MKC would be a valid causal model because intermediate outcomes would be the sole causes of subsequent outcomes, covering the entire, linear chain of causes. As a result, there would be no difference between outcome evaluation and process evaluation, and MKC would be an appropriate tool for process evaluations.) Conversely, we suspect that the wrong but established use of MKC as a conceptual framework in IPE and IPC outcome evaluations stabilizes the continued use of the mono-causal linear WHO model, reinforcing the wrong impression that IPE is the only cause of interprofessional outcomes. The “transitivity” of the WHO model strongly resonates with the observation that the (original) Kirkpatrick model implies the assumption that “all correlations among levels are positive” [ 55 ]. If the most upstream event (an IPE activity) is positively correlated with the most downstream elements (patient, healthcare provider, and system outcomes) anyway, why should one bother evaluating intermediate steps? The same fallacy holds true for MKC.
When its authors state that “Level 4b (…) covers any improvements in the health and well being of patients/clients as a direct result of an education programme” [ 22 ], they not only assign causal explanatory power to MKC but also neglect the “multi-causality on multiple stages” of outcomes. They assume the same causal transitivity for MKC as is present in the WHO model and thereby expect an “automatic” tertiary effect of an IPE intervention on patient outcomes without considering whether the quality of factual IPC – as a necessary intermediate link in the causal chain – has changed as a result of the intervention.

figure 5

“Unhealthful alliance” between the WHO causal model and MKC. MKC, as an outcome classification, does not contain a causal model, but it uses the term “level” and attaches a number to each, suggesting a causal hierarchy nonetheless. The “levels” of MKC resonate with the causal chain of the WHO model. We speculate that this formal similarity stabilizes the false assignment of a causal structure to MKC (red arrows in the lower row) and, at the same time, since MKC is widely used, perpetuates the use of the WHO model

If misused as a causal model, MKC does not function and can hinder progress in IPE and IPC evaluations

So far we have established that a) Pure outcome evaluations do not answer the question why it is so hard to detect patient, healthcare provider, and system outcomes of IPE and IPC interventions; b) Process evaluations are required to address this “why” question and to achieve progress in IPE and IPC evaluations; c) A theoretical causal model is required for such process evaluations; d) MKC is not such a causal model; e) Nevertheless, MKC falsely keeps being used as such a causal model; and f) The misuse of MKC has remained rather stable, possibly due to a formal parallelism between the WHO causal model and MKC.

The multi-staged multi-causality model of patient, healthcare provider, and system outcomes now makes it clear why evaluations which implicitly or explicitly treat MKC as a causal model are bound to fail in their process evaluation part: MKC, when used as a causal model, is crucially incomplete: In terms of the causes of factual IPC (cf. Figure  4 , orange and blue boxes), MKC sees IPE but is blind to IPO; and in terms of the direct causes of patient, healthcare provider, and system outcomes (cf. Figure  4 , green and grey boxes), MKC sees factual IPC but is blind to the complementary non-interprofessional causes because none of its levels covers them. MKC is a classification limited to detecting outcomes of IPE, and neither IPO nor non-interprofessional factors are such outcomes. When speaking about the original model (but with his statement being transferable to MKC), Bates notes that “Kirkpatrick’s model implicitly assumes that examination of (…) [contextual] factors is not essential for effective evaluation” [ 52 ]. Citing Goldstein and Ford [ 56 ], he continues, “when measurement is restricted to (…) the four (…) levels no formative data about why training was or was not effective is generated” [ 52 ]. Specifically targeting the MKC version, Thistlethwaite and colleagues imply that MKC lacks IPO: “When thinking of applying of Kirkpatrick’s framework to IPE, we must remember the importance of the clinical environment (…) and consider how conducive it is to, and facilitative of, any potential change in behaviour arising from interprofessional learning activities” [ 42 ].

Bordage calls conceptual frameworks “lenses” through which scientists see the subjects of their studies [ 57 ]. Following this metaphor, we conclude that the resolution of the “conceptual lens” of MKC, if misused as a causal model, is too low for process evaluations. In our perspective, this, in turn, is the most likely reason why outcome evaluations of the past have failed to reliably demonstrate terminal benefits of IPE and IPC.

It is important to note that MKC, by design, solely and agnostically measures outcomes of interprofessional education in different dimensions, and does so successfully. Its failure to detect bottlenecks in IPE and IPC is therefore not a fault of the classification itself, but of evaluators who continue to use it as a causal model while failing to acknowledge the multi-staged multi-causality of patient, healthcare provider, and system outcomes.

We next take a closer look at how exactly MKC fails. In the mono-linear, low-resolution view of MKC, if a study that evaluates the effects of an intervention fails to detect final outcomes, the only logically possible conclusion is to question the effectiveness of previous levels. If there are changes in interprofessional behavior (level 3) but no benefit to patients (level 4b), the conclusion is that changes in interprofessional behavior are not beneficial to patients; if learners acquire interprofessional competencies (level 2) but there is no subsequent change in interprofessional behavior (level 3), then interprofessional competencies do not translate into behavior. Using MKC as the conceptual lens, the logical answer to “why” is that “the training program was not designed in ways that fostered effective transfer or (…) other input factors blocked skill application” [ 52 ], and a straightforward overall conclusion with regard to the knowledge gap about the benefits of IPE and IPC would be that IPE is not very effective in terms of patient, healthcare provider, and system outcomes. While this disappointing result has actually been considered as a possibility [ 6 , 9 , 10 ], alternative explanations are more often sought in an attempt to rescue IPE efforts and to avoid the conclusion that IPE is ineffective while sticking with MKC as the causal model.

One of these “escape routes” is to claim that it is methodologically too difficult to measure outcomes on MKC levels 3, 4a and 4b by using different variants of a temporal argument. Paraphrasing Belfield et al. [ 58 ], Roland [ 15 ] states that “patient outcomes may only become apparent over a protracted period of time due to the time needed for the learner to acquire and implement new skills [emphasis added by us, also in the following quotations]” whereas Hammick and colleagues state, “It is unsurprising that all but one of the studies (…) evaluated IPE for undergraduate students. The time gap between their interprofessional learning and qualification clearly presents a challenges [sic] associated with evaluating levels 3, 4a and 4b outcomes” [ 11 ]. Yardley and Dornan add, “early workplace experience (…) might take months or even years to have any demonstrable effect on learners, let alone patients” [ 14 ]. The IOM comments that “Efforts to generate this evidence are further hindered by the relatively long lag time between education interventions and patient, population, and system outcomes” [ 8 ] while Reeves and colleagues note that “ the time gap between undergraduates receiving their IPE and them qualifying as practitioners presents challenges with reporting outcomes at Levels 3, 4a, and 4b” [ 2 ]. The core argument here is always that undergraduate IPE happens in educational institutions whereas IPC happens at the workplace at healthcare institutions much later . By this logic, the causal chain assumed by MKC might be fully intact but the time lag between an IPE intervention and effects on levels 3, 4a and 4b constitutes an insurmountable methodological difficulty and renders comprehensive evaluations of IPE outcomes impossible.

Another “escape route” is to invoke “complexity” of IPE as the reason why its final outcomes are hard to detect. Thistlethwaite and colleagues [ 42 ] agree with Yardley and Dornan [ 14 ] that the MKC is not suited to evaluate “the complexity of health profession education and practice.” The authors from the IOM state that “The lack of a well-defined relationship between IPE and patient and population health and health care delivery system outcomes is due in part to the complexity of the learning and practice environments” [ 8 ]. The term “complexity” usually refers to systems which are cognitively difficult to understand because they have many elements or because science has not yet figured out how to model their interactions [ 59 ]. In our opinion, the term “complexity” in the context of IPE is ill-defined and a placeholder for saying that the set of causes of patient, healthcare provider, and system outcomes is not well understood and that a more precise causal model is required to figure out what is going on.

Compare and contrast: “multi-stage multi-causality” as causal model

If we use “multi-stage multi-causality” as the conceptual lens instead of MKC, we increase the available resolution and can see more elements of the system. If evaluations fail to show beneficial outcomes of IPE or IPC, we can now do much better at asking the right sub-questions to answer the “why”. Viewed through the high-resolution lens of the multi-stage multi-causality model, the list of possible failure points along this trajectory expands significantly. The resulting high-resolution picture provides an exquisite set of novel testable hypotheses (Table  1 ). Collecting data on multiple levels, including the level of factual IPC, should make it possible to determine which of these scenarios accounts for an IPE intervention showing no multi-level effect.

Taken together, we argue that the answers to “why” allowed by the low-resolution lens of MKC, when misused as a causal model, might sometimes be wrong and should be replaced with more detailed explanations.

It is premature to conclude that IPE has no effects on patient, healthcare provider, and system outcomes unless the presence or absence of all co-causes has been considered

The deeper cause of the temporal argument might be the mistaken use of MKC as a causal model, because MKC masks any problems with IPO or other co-causes. Given the higher resolution of the multi-stage multi-causality model, it is now possible to conceptually distinguish between the known challenge arising from the passage of time (which creates various confounders) and the case in which a lack of IPO blocks the effects of IPE. It should be possible, in principle, to assess at any later point in time, for example by means of a survey, how much and which types of IPE the members of an interprofessional team experienced earlier in their careers and how much they remember; or even to assess their current competencies for IPC in a practical exam. Such measurements might reveal that individual competencies for IPC are present, no matter how much time has passed since their acquisition, and that IPO is the actual bottleneck.

Likewise, alleged methodological perplexity due to IPE “complexity” is de-emphasized if we swap the low-resolution lens of MKC for the high-resolution lens of the multi-staged multi-causality model. The high-resolution picture (Fig.  4 ; Table  1 ) replaces the fuzzy placeholder of “complexity” by adding missing elements of the system to the model.

In sum, we have demonstrated that when MKC is misused as a causal model, it neglects co-causing factors with essential influence on IPE outcomes, is therefore an insufficient tool for detecting bottlenecks, and crowds out any better-suited, viable causal model. This miscasting hampers meaningful process evaluations, the subsequent improvement of indicators and interventions, and thereby, ultimately, progress in proving beneficial patient, healthcare provider, and system outcomes of IPE and IPC.

Limitations

One limitation of our theoretical critical discussion is that we did not illuminate how hard it is to quantify patient, healthcare provider, and system outcomes from a methodological point of view (e.g. document-based patient data analysis). Neither did we address the extent to which this limits the meaningfulness of IPE/IPC outcome evaluations. However, we claim that the conceptual weakness of missing co-causalities is the deeper root of the evaluation problem, not particular methods, and that methodological issues are solvable as soon as relevant co-causalities are appropriately considered.

Another limitation is that a model is always a simplification. For example, the multi-stage multi-causality model does not include personality traits of team members, intra-personal abilities like self-regulation, or the harmony of personality types within a team, which also play a role in factual IPC. These traits would be difficult to incorporate into the model and gathering such information for evaluations might even be unethical. Similarly, the model does not reflect the influence which the behavior and health literacy of patients (and their families, caregivers, and communities) might have on factual IPC.

A third limitation is that we did not discuss one particular setting in which the use of MKC as a mono-linear causal model could work: namely, if IPE champions themselves become IPO managers and subsequently establish factual IPC in their institutions through an appropriate combination of IPE and IPO. In this scenario, the roles of health professionals (as carriers of IPE-induced competence for factual IPC) and managers (as IPO decision makers) overlap – an arrangement potentially optimal for fostering factual IPC. In a certain sense, in this particular case, IPE would lead to IPO and to factual IPC, with the potential of “transitively” improving patient, healthcare provider, and system outcomes. However, as we believe that there is no fixed relationship between undergoing IPE and becoming a healthcare manager, we did not pursue this line of argument further, regarding it as an exception.

Conclusions

In our critical discussion we have analyzed previous models of causes and effects in IPC based on the existing literature, proposed a novel “multi-stage multi-causality” model, and demonstrated its explanatory power by establishing that MKC is not suited to foster progress in proving or disproving beneficial final outcomes of IPE and IPC. We conclude with six practical, applicable recommendations for future IPE, IPO, and IPC outcome evaluations.

Recommendation 1: stop (mis-)using MKC as a causal model

We have pointed out that the continued use of MKC as a causal model seems to severely inhibit the scientific exploration of the co-necessity of IPO and non-interprofessional factors and therefore delays answering the important question of whether IPE and IPC actually improve patient, healthcare provider, and system outcomes. As early as 1989, the use of the original Kirkpatrick model as a causal model was questioned [ 55 ]. In 2004, Bates took the position that the continued use of this model is unethical if beneficial results are missed by evaluations due to its narrow focus on outcomes [ 52 ]. Today, we conclude that using MKC as a causal model in IPE, IPO, or IPC outcome evaluations should be discontinued.

Recommendation 2: state the causal model under which evaluations of IPE/IPO/IPC operate

Evaluators should make an explicit statement about the causal model under which they design interventions and interpret results, including any additional assumptions about the chain of causes and effects. Knowledge of these assumptions allows the reader to detect inconsistencies – an important element of causal clarification – and should keep the field of IPE, IPO, and IPC outcome evaluations from remaining mired for decades to come.

Recommendation 3: always include some process evaluation

Even if the primary goal of a study is summative outcome evaluation, evaluators should always include some process evaluation to test the causal model they assume and under which they designed their evaluation, at least until the topic of causality in IPE, IPO, and IPC is fully settled. For example, if an IPE intervention aims at improving factual IPC, evaluators who assume multi-causality would co-evaluate IPO to make sure that IPO is not a bottleneck in the evaluated setting.

Recommendation 4: strive for specificity in IPE, IPO, or IPC interventions

If the only goal of an intervention is to improve a certain outcome metric, such as patient safety, one might initiate a broad, non-specific intervention using best-practice guidelines and all available resources. However, if a goal of the intervention is also to demonstrate specific benefits of IPE, IPO, or IPC in a scientific way, then the multi-causality of outcomes must be taken into account. Intervention designs that change both interprofessional and non-interprofessional causes of outcomes must be avoided. For example, if uniprofessional training (a cause outside the domain of IPE/IPO/IPC) is also part of an intervention (e.g. the re-design of the entire workflow in an emergency department in order to enhance patient safety), then this mix of causes obscures the contribution of IPE, IPO, or IPC to the desired effect. Reeves and colleagues aptly, if euphemistically, call measuring the particular influence of IPE on patient outcomes in such multifaceted interventions a “challenge” [ 10 ]. This example shows why theoretical clarity about the causal model is required to effectively evaluate beneficial outcomes of IPE, IPO, or IPC. Respecting the multi-stage multi-causality of patient, healthcare provider, and system outcomes means designing interventions that change interprofessional elements only or, if other components inevitably change as well, controlling for those components through comprehensive measurements and/or by adding qualitative methods that allow final outcomes to be causally attributed to IPE, IPO, or IPC.

Recommendation 5: always quantify factual IPC

Recommendations 5 and 6 are our most important recommendations. It is self-evident that without the emergence of factual IPC there cannot be any final, globally desirable outcomes of upstream IPE or IPO activities; not until IPE or IPO activities improve factual IPC does the attempt to evaluate their effects on patient, healthcare provider, and system outcomes start to make any sense. Further, if a positive correlation exists between the quality of factual IPC and patient, healthcare provider, and system outcomes, then correlating factual IPC with final outcomes is the most conclusive way to show it. While the notion that factual IPC is the minimum necessary condition for final outcomes of interprofessional efforts is not new [ 8 , 19 ], the realization that the attached transitivity assumption (that IPE automatically creates the necessary IPC) is wrong certainly is. As shown above, dismissing transitivity is a cogent consequence of embracing the multi-stage multi-causality of final outcomes. In future evaluations, the quantification of IPE therefore should no longer serve as a surrogate for the quantification of factual IPC. Rather, factual IPC, as an intermediate necessary step towards final outcomes and their most direct cause within the realm of IPE/IPO/IPC, always needs to be evaluated in its own right. The same holds true for future evaluations of IPO. IPO interventions do not automatically lead to factual IPC, but first must be shown to improve factual IPC before they can be expected to cause any changes in patient, healthcare provider, and system outcomes. Taken together, a comprehensive measurement of the quality of factual IPC needs to be the centerpiece of any meaningful evaluation of final outcomes achieved by IPE interventions, IPO interventions, combined IPE + IPO interventions, or by factual IPC itself.

From the large number of dimensions of factual IPC (see “Methods”) arises the necessity to evaluate it in detail. Such completeness in the evaluation of factual IPC is important for several reasons:

Obtaining a meaningful sum score: Evaluating factual IPC in a given setting against a hypothetical optimum requires integration of all of its subdimensions into one sum score.

Not missing correlations: If an IPC score does not cover all dimensions of factual IPC, correlations between factual IPC and its effects (or causes) might be missed even if these relationships truly exist. Example: An evaluation that only includes the dimensions of “mutual respect” and “conflict management” might miss an actually existing correlation between factual IPC and cost effectiveness driven mainly, say, by the dimension of “shared creation of the treatment plan and coordination of its execution”. The result of such an evaluation could cast substantial doubt on the existence of positive effects of factual IPC despite their presence. Similarly, only a complete set of IPC indicators is suited to reveal potentially diverging effects of different subdimensions of factual IPC on different final outcomes. For example, optimal interprofessional team behavior that maximizes patient safety might, at the same time, turn out to be less cost effective than multiprofessional team behavior that compromises on patient safety.

Optimizing process evaluation: A complete IPC coverage further provides valuable information for process evaluations aimed at identifying weaknesses in factual IPC. Significant correlations between outcomes and specific subdimensions of IPC can suggest causal relationships and uncover crucial components for successful IPC in a given setting. Focusing on strengthening these subdimensions could help optimize patient, healthcare provider, and system outcomes.

Enabling setting independence and comparisons: Factual IPC is setting-specific [ 11 , 19 , 35 , 60 , 61 ], i.e. the needs of patients for specific medical services differ across different contexts of patient care (e.g. emergency care; acute care; rehabilitation; chronic care; multimorbid patients; palliative care). As a consequence, different subdimensions of factual IPC contribute to the outcomes of interest to a variable degree depending on the specific healthcare setting. Even within a specific setting, requirements and behaviors necessary for effective IPC can vary due to the specifics of the case, e.g. the particular rareness or severity of the patient’s condition. Assumptions made prior to an evaluation about which subdimensions of factual IPC are most important in a specific setting therefore should not preclude the exploratory evaluation of the other subdimensions. If an evaluation grid misses IPC subdimensions, it may work well in one setting but fail in others. Hence, the completeness of indicators for factual IPC in an evaluation instrument creates setting independence, eliminates the burden of adjusting the included IPC subdimensions every time a new healthcare setting is evaluated, and allows unchanged evaluation instruments to be re-used in subsequent studies (called for by e.g. [ 16 ]) as well as multi-center studies (called for by e.g. [ 2 ]). A starting point for the operationalization of factual IPC including all of its subdimensions is provided in our definition of factual IPC (see “Methods”; a validated evaluation toolbox based on this operationalization will be published elsewhere; a published tool which also covers all subdomains of factual IPC, with a focus on adaptive leadership, is the AITCS [ 39 ]).

Recommendation 6: use a step-by-step approach for proving benefits of IPE and IPO

The multi-stage multi-causality of patient, healthcare provider, and system outcomes naturally implies that the process of proving that IPE or IPO benefits final outcomes can be broken down into discrete steps. The key idea is to evaluate the impact of interprofessional activities on each subsequent level in the causal chain while controlling for non-interprofessional factors. Showing the effects of IPE on IPC competencies, the effects of IPC competencies on factual IPC, and the effects of factual IPC on patient, healthcare provider, and system outcomes then yields three different research agendas that can be pursued independently. If it can be shown in the first of these research agendas that IPE leads to learning (by controlling for non-interprofessional learning-related factors), in the second, independent research agenda that learning leads to improved factual IPC (controlling for IPO), and in the third research agenda that factual IPC leads to desired final outcomes (controlling for co-conditions of final outcomes such as uniprofessional competencies), then the benefit of IPE for patient, healthcare provider, and system outcomes is ultimately proven. If this approach fails, it will at least reveal exactly where the chain of effects breaks down. The same holds true for IPO: Show that IPO interventions lead to work processes and/or favorable institutional conditions which support factual IPC, separately show that these work processes and conditions lead to improved factual IPC (if co-conditions for factual IPC, such as IPE, are present), and show that better factual IPC leads to an improvement of final outcomes; then the positive impact of IPO is verified.

By covering the entire process, this “step-by-step” approach could build a compelling case for how interprofessional interventions lead to desired final outcomes. It further could markedly simplify the agenda of interprofessional research because it takes the burden of showing the effect of one particular IPE or IPO intervention on one particular final outcome off the shoulders of evaluators. After breaking down the evaluation task into separate steps that prove the impact from link to link, researchers are free to work on one step at a time only.

The presented critical discussion advances the theoretical foundations of evaluations in the field of IPE, IPO and IPC. To improve patient-centered care by means of IPC, one needs to think bigger than just training of healthcare professionals in the competencies and mindsets required for effective IPC; work processes also have to be established and optimized in a setting-dependent manner to allow for factual IPC to happen. Besides IPC, factors like discipline-specific knowledge of health professionals or administrative aspects of patient management have to be optimized, too, to achieve optimal patient, healthcare provider, and system outcomes.

By sharing the multi-stage multi-causality model and its pertinent theoretical clarification we hope to contribute to a deeper understanding of causes and effects in interprofessional collaboration, to answer the repeated call in the research community for improved theory in this field, to explain difficulties faced by past evaluations, and to provide helpful guidance for future research studies. Our key recommendations for future evaluations of interprofessional outcomes are to focus on a comprehensive evaluation of factual IPC as the most fundamental metric and to deploy a step-by-step research agenda with the overarching goal of proving beneficial patient, healthcare provider, and system outcomes related to IPE, IPO, and IPC. With these contributions, we hope to help healthcare institutions improve their evaluations of IPE, IPO, and IPC, ultimately benefiting health, healthcare provider, and system outcomes.

Availability of data and materials

No datasets were generated or analysed during the current study.

Abbreviations

  • IPC: Interprofessional collaboration
  • IPE: Interprofessional education
  • IPO: Interprofessional organization
  • MKC: Modified Kirkpatrick classification
  • WHO: World Health Organization

Körner M, Bütof S, Müller C, Zimmermann L, Becker S, Bengel J. Interprofessional teamwork and team interventions in chronic care: A systematic review. J Interprof Care. 2016;30(1):15–28.

Reeves S, Fletcher S, Barr H, Birch I, Boet S, Davies N, et al. A BEME systematic review of the effects of interprofessional education: BEME Guide No. 39. Med Teach. 2016;38(7):656–68.

Reeves S, Pelone F, Harrison R, Goldman J, Zwarenstein M. Interprofessional collaboration to improve professional practice and healthcare outcomes. Cochrane Database Syst Rev. 2017. https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD000072.pub3/full . Accessed 23 Feb 2024.

Sibbald B, Bojke C, Gravelle H. National survey of job satisfaction and retirement intentions among general practitioners in England. BMJ. 2003;326(7379):22.

Cowin LS, Johnson M, Craven RG, Marsh HW. Causal modeling of self-concept, job satisfaction, and retention of nurses. Int J Nurs Stud. 2008;45(10):1449–59.

Paradis E, Whitehead CR. Beyond the lamppost: a proposal for a fourth wave of education for collaboration. Acad Med. 2018;93(10):1457.

Holly C, Salmond S, Saimbert M. Comprehensive Systematic Review for Advanced Practice Nursing. 2nd ed. New York: Springer Publishing Company; 2016.

IOM (Institute of Medicine). Measuring the impact of interprofessional education on collaborative practice and patient outcomes. Washington, DC: The National Academies Press; 2015.

Brandt B, Lutfiyya MN, King JA, Chioreso C. A scoping review of interprofessional collaborative practice and education using the lens of the Triple Aim. J Interprof Care. 2014;28(5):393–9.

Reeves S, Perrier L, Goldman J, Freeth D, Zwarenstein M. Interprofessional education: effects on professional practice and healthcare outcomes (Review) (Update). Cochrane Database Syst Rev. 2013. https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD002213.pub3/full . Accessed 23 Feb 2024.

Hammick M, Freeth D, Koppel I, Reeves S, Barr H. A best evidence systematic review of interprofessional education: BEME Guide no. 9. Med Teach. 2007;29(8):735–51.

Zwarenstein M, Reeves S, Perrier L. Effectiveness of pre-licensure interprofessional education and post-licensure collaborative interventions. J Interprof Care. 2005;19(Suppl 1):148–65.

Barr H, Hammick M, Koppel I, Reeves S. Evaluating interprofessional education: two systematic reviews for health and social care. Br Educ Res J. 1999;25(4):533–44.

Yardley S, Dornan T. Kirkpatrick’s levels and education ‘evidence’. Med Educ. 2012;46(1):97–106.

Roland D. Proposal of a linear rather than hierarchical evaluation of educational initiatives: the 7Is framework. J Educ Eval Health Prof. 2015;12:35.

Thannhauser J, Russell-Mayhew S, Scott C. Measures of interprofessional education and collaboration. J Interprof Care. 2010;24(4):336–49.

Reeves S, Goldman J, Gilbert J, Tepper J, Silver I, Suter E, et al. A scoping review to improve conceptual clarity of interprofessional interventions. J Interprof Care. 2011;25(3):167–74.

Suter E, Goldman J, Martimianakis T, Chatalalsingh C, DeMatteo DJ, Reeves S. The use of systems and organizational theories in the interprofessional field: Findings from a scoping review. J Interprof Care. 2013;27(1):57–64.

Havyer RD, Wingo MT, Comfere NI, Nelson DR, Halvorsen AJ, McDonald FS, et al. Teamwork assessment in internal medicine: a systematic review of validity evidence and outcomes. J Gen Intern Med. 2014;29(6):894–910.

Reeves S, Boet S, Zierler B, Kitto S. Interprofessional education and practice guide No. 3: Evaluating interprofessional education. J Interprof Care. 2015;29(4):305–12.

McNaughton SM, Flood B, Morgan CJ, Saravanakumar P. Existing models of interprofessional collaborative practice in primary healthcare: a scoping review. J Interprof Care. 2021;35(6):940–52.

Barr H, Freeth D, Hammick M, Koppel I, Reeves S. Evaluations of Interprofessional Education: A United Kingdom Review for Health and Social Care. Centre for the Advancement of Interprofessional Education and The British Educational Research Association. 2000. https://www.caipe.org/resources/publications/barr-h-freethd-hammick-m-koppel-i-reeves-s-2000-evaluations-of-interprofessional-education . Accessed 23 Feb 2024.

Barr H, Koppel I, Reeves S, Hammick M, Freeth D. Effective Interprofessional Education: Argument, assumption, and evidence. Wiley-Blackwell; 2005.

Baethge C, Goldbeck-Wood S, Mertens S. SANRA—a scale for the quality assessment of narrative review articles. Res Integr Peer Rev. 2019;4(1):1–7.

CAIPE (Centre for the Advancement of Interprofessional Education). About CAIPE, s. v. “Defining Interprofessional Education”. https://www.caipe.org/about . Accessed 23 Feb 2024.

D’amour D, Oandasan I. Interprofessionality as the field of interprofessional practice and interprofessional education: An emerging concept. J Interprof Care. 2005;19(Suppl 1):8–20.

Körner M, Wirtz MA. Development and psychometric properties of a scale for measuring internal participation from a patient and health care professional perspective. BMC Health Serv Res. 2013;13(1):374.

Körner M, Wirtz MA, Bengel J, Göritz AS. Relationship of organizational culture, teamwork and job satisfaction in interprofessional teams. BMC Health Serv Res. 2015;15(1):243.

World Health Organization. Framework for Action on Interprofessional Education & Collaborative Practice. 2010. https://www.who.int/publications/i/item/framework-for-action-on-interprofessional-education-collaborative-practice . Accessed 23 Feb 2024.

Wei H, Horns P, Sears SF, Huang K, Smith CM, Wei TL. A systematic meta-review of systematic reviews about interprofessional collaboration: facilitators, barriers, and outcomes. J Interprof Care. 2022;36(5):735–49.

Morey JC, Simon R, Jay GD, Wears RL, Salisbury M, Dukes KA, et al. Error reduction and performance improvement in the emergency department through formal teamwork training: evaluation results of the MedTeams project. Health Serv Res. 2002;37(6):1553–81.

D’Amour D, Ferrada-Videla M, San Martin Rodriguez L, Beaulieu M-D. The conceptual basis for interprofessional collaboration: core concepts and theoretical frameworks. J interprofess care. 2005;19(1):116–31.

San Martín-Rodríguez L, Beaulieu M-D, D’Amour D, Ferrada-Videla M. The determinants of successful collaboration: a review of theoretical and empirical studies. J Interprof Care. 2005;19(Suppl 1):132–47.

Guise J-M, Deering SH, Kanki BG, Osterweil P, Li H, Mori M, et al. Validation of a tool to measure and promote clinical teamwork. Simulation in Healthcare. 2008;3(4):217–23.

Orchard C, Bainbridge L, Bassendowski S.. A national interprofessional competency framework. Vancouver: Canadian Interprofessional Health Collaborative; University of British Columbia; 2010; Available online: http://ipcontherun.ca/wp-content/uploads/2014/06/National-Framework.pdf . Accessed 29 May 2024.

Oishi A, Murtagh FE. The challenges of uncertainty and interprofessional collaboration in palliative care for non-cancer patients in the community: a systematic review of views from patients, carers and health-care professionals. Palliat Med. 2014;28(9):1081–98.

Valentine MA, Nembhard IM, Edmondson AC. Measuring teamwork in health care settings: a review of survey instruments. Med Care. 2015;53(4):e16–30.

Lie DA, Richter-Lagha R, Forest CP, Walsh A, Lohenry K. When less is more: validating a brief scale to rate interprofessional team competencies. Med Educ Online. 2017;22(1):1314751.

Orchard C, Pederson LL, Read E, Mahler C, Laschinger H. Assessment of interprofessional team collaboration scale (AITCS): further testing and instrument revision. J Contin Educ Heal Prof. 2018;38(1):11–8.

Begun JW, White KR, Mosser G. Interprofessional care teams: the role of the healthcare administrator. J Interprof Care. 2011;25(2):119–23.

Weaver SJ, Salas E, King HB. Twelve best practices for team training evaluation in health care. The Joint Commission Journal on Quality and Patient Safety. 2011;37(8):341–9.

Thistlethwaite J, Kumar K, Moran M, Saunders R, Carr S. An exploratory review of pre-qualification interprofessional education evaluations. J Interprof Care. 2015;29(4):292–7.

Gilbert JH, Yan J, Hoffman SJ. A WHO report: framework for action on interprofessional education and collaborative practice. J Allied Health. 2010;39(3):196–7.

Taylor & Francis Online. http://www.tandfonline.com (2024). Accessed 23 Feb 2024.

Sockalingam S, Tan A, Hawa R, Pollex H, Abbey S, Hodges BD. Interprofessional education for delirium care: a systematic review. J Interprof Care. 2014;28(4):345–51.

Folland S, Goodman AC, Stano M. The Economics of Health and Health Care. 7th ed. New York, NY: Routledge; 2016.

Book   Google Scholar  

Hayes B, Bonner A, Pryor J. Factors contributing to nurse job satisfaction in the acute hospital setting: a review of recent literature. J Nurs Manag. 2010;18(7):804–14.

Curran V, Reid A, Reis P, Doucet S, Price S, Alcock L, et al. The use of information and communications technologies in the delivery of interprofessional education: A review of evaluation outcome levels. J Interprof Care. 2015;29(6):541–50.

Danielson J, Willgerodt M. Building a theoretically grounded curricular framework for successful interprofessional education. A J Pharmaceut Educ. 2018;82(10):1133–9.

Kirkpatrick D, Kirkpatrick J. Evaluating Training Programs: The Four Levels. 3rd ed. San Francisco, CA: Berrett-Koehler Publishers; 2006.

Kirkpatrick J, Kirkpatrick W. An Introduction to The New World Kirkpatrick Model. Kirkpatrick Partners. 2021. http://www.kirkpatrickpartners.com/wp-content/uploads/2021/11/Introduction-to-the-Kirkpatrick-New-World-Model.pdf . Accessed 30 Nov 2023.

Bates R. A critical analysis of evaluation practice: the Kirkpatrick model and the principle of beneficence. Eval Program Plann. 2004;27(3):341–7.

Pawson R, Tilley N. Realistic evaluation. 1st ed. London: Sage Publications Ltd; 1997.

Holton EF III. The flawed four-level evaluation model. Hum Resour Dev Q. 1996;7(1):5–21.

Alliger GM, Janak EA. Kirkpatrick’s levels of training criteria: Thirty years later. Pers Psychol. 1989;42(2):331–42.

Goldstein IL, Ford JK. Training in organisations: Needs assessment, development, and evaluation. 4th ed. Belmont, CA: Wadsworth; 2002.

Bordage G. Conceptual frameworks...: What lenses can they provide to medical education? Investigación en educación médica. 2012;1(4):167–9.

Belfield C, Thomas H, Bullock A, Eynon R, Wall D. Measuring effectiveness for best evidence medical education: a discussion. Med Teach. 2001;23(2):164–70.

Johnson N. Simply complexity: A clear guide to complexity theory. London: Oneworld Publications; 2009.

Retchin SM. A conceptual framework for interprofessional and co-managed care. Acad Med. 2008;83(10):929–33.

Schmitz C, Atzeni G, Berchtold P. Challenges in interprofessionalism in Swiss health care: the practice of successful interprofessional collaboration as experienced by professionals. Swiss Med Wkly. 2017;147: w14525.

Download references

Acknowledgements

Not applicable.

The Swiss Federal Office of Public Health partly funded this study with the contractual mandate “Bildung und Berufsausübung: Evaluationsinstrumente”.

Author information

Authors and Affiliations

Institute for Medical Education, Department for Assessment and Evaluation, University of Bern, Bern, Switzerland

Florian B. Neubauer, Felicitas L. Wagner, Andrea Lörwald & Sören Huwendiek


Contributions

FBN, FLW, AL and SH made substantial contributions to the conception of the work and to the literature searches. FBN wrote the first full draft of the manuscript. All authors improved the draft. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Florian B. Neubauer .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Neubauer, F.B., Wagner, F.L., Lörwald, A. et al. Sharpening the lens to evaluate interprofessional education and interprofessional collaboration by improving the conceptual framework: a critical discussion. BMC Med Educ 24, 615 (2024). https://doi.org/10.1186/s12909-024-05590-0


Received : 29 November 2023

Accepted : 22 May 2024

Published : 04 June 2024

DOI : https://doi.org/10.1186/s12909-024-05590-0


Keywords

  • Outcome evaluation
  • Process evaluation
  • Causal model
  • Conceptual framework
  • Terminology

BMC Medical Education

ISSN: 1472-6920

  • Review Article
  • Open access
  • Published: 03 June 2024

The effectiveness of digital twins in promoting precision health across the entire population: a systematic review

  • Mei-di Shen 1 ,
  • Si-bing Chen 2 &
  • Xiang-dong Ding   ORCID: orcid.org/0009-0001-1925-0654 2  

npj Digital Medicine volume 7, Article number: 145 (2024)


  • Public health
  • Risk factors
  • Signs and symptoms

Digital twins represent a promising technology within the domain of precision healthcare, offering significant prospects for individualized medical interventions. Existing systematic reviews, however, mainly focus on the technological dimensions of digital twins, with a limited exploration of their impact on health-related outcomes. Therefore, this systematic review aims to explore the efficacy of digital twins in improving precision healthcare at the population level. The literature search for this study encompassed PubMed, Embase, Web of Science, Cochrane Library, CINAHL, SinoMed, CNKI, and Wanfang Database to retrieve potentially relevant records. Patient health-related outcomes were synthesized employing quantitative content analysis, whereas the Joanna Briggs Institute (JBI) scales were used to evaluate the quality and potential bias inherent in each selected study. Following established inclusion and exclusion criteria, 12 studies were screened from an initial 1321 records for further analysis. These studies included patients with various conditions, including cancers, type 2 diabetes, multiple sclerosis, heart failure, qi deficiency, post-hepatectomy liver failure, and dental issues. The review coded three types of interventions: personalized health management, precision individual therapy effects, and predicting individual risk, leading to a total of 45 outcomes being measured. The collective effectiveness of these outcomes at the population level was calculated at 80% (36 out of 45). No studies exhibited unacceptable differences in quality. Overall, employing digital twins in precision health demonstrates practical advantages, warranting its expanded use to facilitate the transition from the development phase to broad application.

PROSPERO registry: CRD42024507256.


Introduction

Precision health represents a paradigm shift from the conventional “one size fits all” medical approach, focusing on specific diagnosis, treatment, and health management by incorporating individualized factors such as omics data, clinical information, and health outcomes 1 , 2 . This approach significantly impacts various diseases, potentially improving overall health while reducing healthcare costs 3 , 4 . Within this context, digital twins emerged as a promising technology 5 , creating digital replicas of the human body through two key steps: building mappings and enabling dynamic evolution 6 . Unlike traditional data mining methods, digital twins consider individual variability, providing continuous, dynamic recommendations for clinical practice 7 . This approach has gained significant attention among researchers, highlighting its potential applications in advancing precision health.

Several systematic reviews have explored the advancement of digital twins within the healthcare sector. One rapid review 8 identified four core functionalities of digital twins in healthcare management: safety management, information management, health management/well-being promotion, and operational control. Another systematic review 9 , through an analysis of 22 selected publications, summarized the diverse application scenarios of digital twins in healthcare, confirming their potential in continuous monitoring, personalized therapy, and hospital management. Furthermore, a quantitative review 10 assessed 94 high-quality articles published from 2018 to 2022, revealing a primary focus on technological advancements (such as artificial intelligence and the Internet of Things) and application scenarios (including personalized, precise, and real-time healthcare solutions), thus highlighting the pivotal role of digital twins technology in the field of precision health. Another systematic review 11 , incorporating 18 framework papers or reviews, underscored the need for ongoing research into digital twins’ healthcare applications, especially during the COVID-19 pandemic. Moreover, a systematic review 12 on the application of digital twins in cardiovascular diseases presented proof-of-concept and data-driven approaches, offering valuable insights for implementing digital twins in this specific medical area.

While the existing literature offers valuable insights into the technological aspects of digital twins in healthcare, these systematic reviews failed to thoroughly examine the actual impacts on population health. Despite the increasing interest and expanding body of research on digital twins in healthcare, the direct effects on patient health-related outcomes remain unclear. This knowledge gap highlights the need to investigate how digital twins promote and restore patient health, which is vital for advancing precision health technologies. Therefore, the objective of our systematic review is to assess the effectiveness of digital twins in improving health-related outcomes at the population level, providing a clearer understanding of their practical benefits in the context of precision health.

Search results

The selection process for the systematic review is outlined in the PRISMA flow chart (Fig. 1 ). Initially, 1321 records were identified. Of these, 446 duplicates (446/1321, 33.76%) were removed, leaving 875 records (875/1321, 66.24%) for title and abstract screening. Applying the pre-defined inclusion and exclusion criteria led to the exclusion of 858 records (858/875, 98.06%), leaving 17 records (17/875, 1.94%) for full-text review. Further scrutiny resulted in the exclusion of one study (1/17, 5.88%) lacking health-related outcomes and four studies (4/17, 23.53%) with overlapping data. Ultimately, 12 (12/17, 70.59%) original studies 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 were included in the systematic review. Supplementary Table 1 provides a summary of the reasons for exclusion at the full-text reading phase.
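As a quick arithmetic cross-check (a minimal sketch; the helper name `pct` is ours, not from the paper), the screening percentages above follow directly from the raw counts:

```python
# Cross-check of the PRISMA screening percentages reported in the text.

def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals as reported."""
    return round(100 * part / whole, 2)

identified = 1321
duplicates = 446
screened = identified - duplicates           # 875 records for title/abstract screening
excluded_screening = 858
full_text = screened - excluded_screening    # 17 records for full-text review
included = full_text - 1 - 4                 # minus 1 lacking outcomes, 4 overlapping

assert pct(duplicates, identified) == 33.76
assert pct(screened, identified) == 66.24
assert pct(excluded_screening, screened) == 98.06
assert pct(full_text, screened) == 1.94
assert pct(included, full_text) == 70.59
assert included == 12
```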

Figure 1. Flow chart of included studies in the systematic review.

Study characteristics

The studies included in this systematic review were published between 2021 (2/12, 16.67%) 23 , 24 and 2023 (8/12, 66.67%) 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 . Originating from diverse regions, 4/12 studies (33.33%) were from Asia 13 , 14 , 21 , 24 , 5/12 (41.67%) from America 15 , 17 , 19 , 20 , 22 , and 3/12 (25.00%) from Europe 16 , 18 , 23 . The review encompassed various study designs, including randomized controlled trials (1/12, 8.33%) 14 , quasi-experiments (6/12, 50.00%) 13 , 15 , 16 , 18 , 19 , 21 , and cohort studies (5/12, 41.67%) 17 , 20 , 22 , 23 , 24 . The sample sizes ranged from 15 13 to 3500 patients 19 . Five studies assessed the impact of digital twins on virtual patients 15 , 16 , 18 , 19 , 20 , while seven examined their effect on real-world patients 13 , 14 , 17 , 21 , 22 , 23 , 24 . The included patients had various diseases, including cancer (4/12, 33.33%) 15 , 16 , 19 , 22 , type 2 diabetes (2/12, 16.67%) 13 , 14 , multiple sclerosis (2/12, 16.67%) 17 , 18 , qi deficiency (1/12, 8.33%) 21 , heart failure (1/12, 8.33%) 20 , post-hepatectomy liver failure (1/12, 8.33%) 23 , and dental issues (1/12, 8.33%) 24 . This review coded interventions into three types: personalized health management (3/12, 25.00%) 13 , 14 , 21 , precision individual therapy effects (6/12, 50.00%) 15 , 16 , 18 , 19 , 20 , 22 , and predicting individual risk (3/12, 25.00%) 17 , 23 , 24 , with a total of 45 measured outcomes. Characteristics of the included studies are detailed in Table 1 .

Risk of bias assessment

The risk of bias for the studies included in this review is summarized in Fig. 2 . In the single RCT 14 assessed, 10 out of 13 items received positive responses. Limitations were observed due to incomplete reporting of baseline characteristics and issues with blinding. Among the six quasi-experimental studies evaluated, five (83.33%) 13 , 15 , 16 , 18 , 21 achieved at least six positive responses, indicating acceptable quality, while one study (16.67%) 19 fell slightly below this threshold with five positive responses. The primary challenges in these quasi-experimental studies were the lack of control groups, inadequate baseline comparisons, and limited follow-up reporting. Four out of five (80.00%) 17 , 20 , 22 , 23 of the cohort studies met or exceeded the criterion of at least eight positive responses, demonstrating acceptable quality. However, one study (20.00%) 24 had a lower score due to incomplete data regarding loss to follow-up and the specifics of the interventions applied. Table 1 elaborates on the specific reasons for these assessments. Despite these concerns, the overall risk of bias across the included studies is considered generally acceptable.

Figure 2. The summary of bias risk via the Joanna Briggs Institute assessment tools.

The impact of digital twins on health-related outcomes among patients

This review includes 12 studies that collectively assessed 45 outcomes, achieving an overall effectiveness rate of 80% (36 out of 45 outcomes), as depicted in Fig. 3a . The digital twins analyzed were coded into three functional categories: personalized health management, precision individual therapy effects, and predicting individual risks. A comprehensive analysis of the effectiveness of digital twins across these categories is provided, detailing the impact and outcomes associated with each function.

Figure 3. a The overall effectiveness of digital twins; b the effectiveness of personalized health management driven by digital twins; c the effectiveness of precision individualized therapy effects driven by digital twins; d the effectiveness of prediction of individual risk driven by digital twins.

The effectiveness of digital twins in personalized health management

In this review, three studies 13 , 14 , 21 employing digital twins for personalized health management reported an effectiveness of 80% (24 out of 30 outcomes), as shown in Fig. 3b . A self-control study 13 involving 15 elderly patients with diabetes used virtual patient representations based on health information to guide individualized insulin infusion. Over 14 days, this approach improved the time in range (TIR) from 3–75% to 86–97%, decreased hypoglycemia duration from 0–22% to 0–9%, and reduced hyperglycemia time from 0–98% to 0–12%. A 1-year randomized controlled trial 14 with 319 type 2 diabetes patients implemented personalized digital twins interventions based on nutrition, activity, and sleep. This trial demonstrated significant improvements in Hemoglobin A1c (HbA1c), Homeostatic Model Assessment 2 of Insulin Resistance (HOMA2-IR), Nonalcoholic Fatty Liver Disease Liver Fat Score (NAFLD-LFS), Nonalcoholic Fatty Liver Disease Fibrosis Score (NAFLD-NFS), and other primary outcomes (all P  < 0.001; Table 2 ). However, no significant changes were observed in weight, Alanine Aminotransferase (ALT), Fibrosis-4 Score (FIB4), or AST to Platelet Ratio Index (APRI) (all P  > 0.05). A non-randomized controlled trial 21 introduced a digital twin-based Traditional Chinese Medicine (TCM) health management platform for patients with qi deficiency. It significantly improved blood pressure, main and secondary TCM symptoms, total TCM symptom scores, and quality of life (all P  < 0.05). Nonetheless, no significant improvements were observed in heart rate or BMI (all P  > 0.05; Table 2 ).

The effectiveness of digital twins in precision individual therapy effects

Six studies 15 , 16 , 18 , 19 , 20 , 22 focused on the precision of individual therapy effects using digital twins, demonstrating a 70% effectiveness rate (7 out of 10 outcomes), as detailed in Fig. 3c . In a self-control study 15 , a data-driven approach was employed to create digital twins, generating 100 virtual patients to predict the potential tumor biology outcomes of radiotherapy regimens with varying contents and doses. This study showed that personalized radiotherapy plans derived from digital twins could extend the median tumor progression time by approximately six days and reduce radiation doses by 16.7%. Bahrami et al. 16 created 3000 virtual patients experiencing cancer pain to administer precision dosing of fentanyl transdermal patch therapy. The intervention led to a 16% decrease in average pain intensity and extended the median pain-free duration by 23 hours beyond the original 72 hours in cancer patients. Another quasi-experimental study 18 created 3000 virtual patients with multiple sclerosis to assess the impact of Ocrelizumab. Findings indicated that Ocrelizumab resulted in a reduction in relapses (0.191 [0.143, 0.239]) and in lymphopenic adverse events (83.73% vs . 19.9%) compared to a placebo. American researchers 19 developed a quantitative systems pharmacology model using digital twins to identify the optimal dosing for aggressive non-Hodgkin lymphoma patients. This approach resulted in at least a 50% tumor size reduction by day 42 among 3500 virtual patients. A cohort study 20 assessed the 5-year composite cardiovascular outcomes in 2173 virtual patients who were treated with spironolactone or left untreated, and indicated no statistically significant inter-group differences (0.85 [0.69–1.04]). Tardini et al. 22 employed digital twins to optimize multi-step treatment for oropharyngeal squamous cell carcinoma in 134 patients. The treatment selection optimized through digital twins predicted survival rates increased by 3.73 (−0.75, 8.96) and dysphagia rates by 0.75 (−4.48, 6.72) compared to clinician decisions, though neither difference reached statistical significance.

The effectiveness of digital twins in predicting individual risk

Three studies 17 , 23 , 24 employing digital twins to predict individual patient risks demonstrated a 100% effectiveness rate (5 out of 5 outcomes), as shown in Fig. 3d . A cohort study 17 used digital twins to forecast the onset age for disease-specific brain atrophy in patients with multiple sclerosis. Findings indicated that the onset of progressive brain tissue loss, on average, preceded clinical symptoms by 5-6 years among the 519 patients ( P  < 0.01). Another study 23 focused on predicting postoperative liver failure in 47 patients undergoing major hepatectomy through mathematical models of blood circulation. The study highlighted that elevated Postoperative Portal Vein pressure (PPV) and Portocaval Gradient (PCG) values above 17.5 mmHg and 13.5 mmHg, respectively, correlated with the measured values (all, P  < 0.0001; Table 2 ). These indicators were effective in predicting post-hepatectomy liver failure, accurately identifying three out of four patients who experienced this complication. Cho et al. 24 created digital twins for 50 adult female patients using facial scans and cone-beam computed tomography images to evaluate the anteroposterior position of the maxillary central incisors and forehead inclination. The analysis demonstrated significant differences in the position of the maxillary central incisors ( P  = 0.04) and forehead inclination ( P  = 0.02) between the two groups.
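Pulling the three subsections together, the category-level and overall effectiveness rates reported in Fig. 3 can be reproduced from the outcome counts (a minimal sketch; the dictionary layout is ours, the counts are from the Results above):

```python
# Effective vs. measured outcomes per digital-twin function, from the Results.
outcomes = {
    "personalized health management": (24, 30),
    "precision individual therapy effects": (7, 10),
    "predicting individual risk": (5, 5),
}

effective = sum(e for e, _ in outcomes.values())
measured = sum(m for _, m in outcomes.values())
rates = {name: round(100 * e / m) for name, (e, m) in outcomes.items()}

assert (effective, measured) == (36, 45)
assert round(100 * effective / measured) == 80   # overall effectiveness, Fig. 3a
assert rates["personalized health management"] == 80    # Fig. 3b
assert rates["precision individual therapy effects"] == 70  # Fig. 3c
assert rates["predicting individual risk"] == 100       # Fig. 3d
```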

Discussion

This systematic review outlines the effectiveness of digital twins in improving health-related outcomes at the population level across various diseases, including cancers, type 2 diabetes, multiple sclerosis, qi deficiency, heart failure, post-hepatectomy liver failure, and dental issues. Distinct from prior reviews that focused on the technological dimensions of digital twins, our analysis shows the practical applications of digital twins in healthcare. The applications have been categorized into three main areas: personalized health management, precision individual therapy effects, and predicting individual risks, encompassing a total of 45 outcomes. An overall effectiveness of 80% was observed across these outcomes. This review offers valuable insights into the application of digital twins in precision health and supports the transition of digital twins from construction to population-wide implementation.

Digital twins play a crucial role in achieving precision health 25 . They serve as virtual models of human organs, tissues, cells, or microenvironments, dynamically updating based on real-time data to offer feedback for interventions on their real counterparts 26 , 27 . Digital twins can solve complex problems in personalized health management 28 , 29 and enable comprehensive, proactive, and precise healthcare 30 . In the studies reviewed, researchers implemented digital twins by creating virtual patients based on personal health data and using simulations to generate personalized recommendations and predictions. It is worth noting that although certain indicators did not improve significantly under personalized health management for patients with type 2 diabetes and qi deficiency, this does not undermine the effectiveness of digital twins. Firstly, these studies demonstrated significant improvements in primary outcome measures. Secondly, improving health-related outcomes in chronic diseases is an ongoing, complex process heavily influenced by changes in health behaviors 31 , 32 . While digital twins can provide personalized health guidance based on individual health data, their impact on actual behaviors warrants further investigation.

The dual nature of medications, providing benefits yet potentially leading to severe clinical outcomes such as morbidity or mortality, must be carefully considered. The impact of therapy is subject to various factors, including drug attributes and specific disease characteristics 33 . Achieving accurate medication administration remains a significant challenge for healthcare providers 34 , underscoring the need for innovative methodologies such as computationally precise drug delivery 35 , 36 , an example highlighted in our review of digital twins. Regarding the prediction of individual therapy effects for conditions such as cancer, multiple sclerosis, and heart failure, six studies within this review reported partly significant improvements in patient health-related outcomes. These advancements facilitate the tailored selection and dosing of therapy, underscoring the ability of digital twins to optimize patient-specific treatment plans effectively.

Furthermore, digital twins can enhance clinical understanding and personalize disease risk prediction 37 . They enable a quantitative understanding and prediction of individuals by continuously predicting and evaluating patient data in a virtual environment 38 . In patients with multiple sclerosis, digital twins have facilitated predictions regarding the onset of disease-specific brain atrophy, allowing for early intervention strategies. Similarly, digital twins assessed the risk of liver failure after liver resection, aiding healthcare professionals in making timely decisions. Moreover, the application of digital twins in the three-dimensional analysis of patients with dental problems has demonstrated highly effective clinical significance, underscoring its potential across various medical specialties. In summary, the adoption of digital twins has significantly contributed to advancing precision health and restoring patient well-being by creating virtual patients based on personal health data and using simulations to generate personalized recommendations and predictions.

Recent studies have introduced various digital twin systems, covering areas such as hospital management 8 , remote monitoring 9 , and the diagnosis and treatment of various conditions 39 , 40 . Nevertheless, these systems were not included in this review because they lack detailed descriptions at the population health level, which constrains the broader application of this emerging technology. Our analysis underscores the reported effectiveness of digital twins, providing unique opportunities for dynamic prevention and precise intervention across different diseases. The multiplicity of research methodologies and outcome measures poses a challenge for quantitative synthesis across publications. This systematic review employed a comprehensive retrieval strategy across various databases to screen articles on the effectiveness of digital twins and to reduce the omission of negative results. In addition, four repeated publications were excluded based on authors, affiliation, population, and other criteria to mitigate the bias of overestimating the effect of digital twins due to duplicate publication.

However, there are still limitations. Firstly, the limited published research on the application of digital twins at the population level hinders the ability to perform a quantitative meta-analysis, possibly limiting the interpretability of our findings. We encourage additional high-quality randomized controlled trials on the applicability of digital twins to facilitate quantitative analysis of their effectiveness in precision health at the population level. Secondly, this review assessed the effectiveness of digital twins primarily through statistical significance ( P -value or 95% confidence interval); however, four quasi-experimental studies did not report statistical significance. A limitation of this review is that, for these four studies, effectiveness was identified from significant changes self-reported by the authors. In clinical practice, author-reported clinical significance can nonetheless indicate the effectiveness of digital twins. Thirdly, by focusing solely on studies published in Chinese and English, this review may have omitted relevant research available in other languages, potentially limiting the scope of the analyzed literature. Lastly, our review primarily emphasized reporting statistical differences between groups. Future work should incorporate more application feedback from real patients to expose digital twins to the nuances of actual patient populations.

The application of digital twins is currently limited and primarily focused on precision health for individual patients. Expanding digital twins' application from individual to group precision health is recommended to achieve more extensive integration in healthcare settings. This expansion involves sharing real-time data and integrating medical information across diverse medical institutions within a region, marking the development of group precision health. Investigating both personalized medical care and collective health management has significant implications for improving medical diagnosis and treatment approaches, predicting disease risks, optimizing health management strategies, and reducing societal healthcare costs 41 .

Digital twin interventions encompass various aspects such as health management, decision-making, and prediction, among others 9 . They represent a technological and conceptual innovation in traditional population health intervention. However, the current content design of digital twin interventions is insufficient and should be improved by incorporating more effective content strategies tailored to the characteristics of the target population. Findings from this study indicate that the interventions that did not differ significantly were those driven by personalized health management; compared with the other two function-driven types of digital twins, personalized health management therefore needs more attention to enhance its effect at the population level. For example, within the sphere of chronic disease management, integrating effective behavioral change strategies into digital twins is advisable to positively influence health-related indicators, such as weight and BMI. The effectiveness of such digital behavior change strategies has been reported in previous studies 42 , 43 . The consensus among researchers on the importance of combining effective content strategies with digital intervention technologies underscores the potential of this approach to significantly improve patient health-related outcomes.

The applications of digital twins in precision health are mainly focused on model establishment and prediction description, with limited implementation in multi-center settings. A more robust and detailed data foundation is recommended to improve clinical decision-making and reduce the likelihood of imprecise treatments. This requires continuous updating and capturing of dynamic information by digital twins in the future, as well as the improvement of the data platform that facilitates mapping, interaction, and iterative optimization. Integrating digital twins effectively into clinical workflows can support clinical interventions, assist physicians in making informed decisions, and increase the standard of patient care 6 .

The accessibility of health data is a significant challenge for the clinical implementation of digital twins. Although the internet and information technology have significantly enhanced health data availability, health data, including information systems and electronic health records, remain heterogeneous and difficult to share 44 . Health data often contain confidential patient information, as well as unreliable information, posing challenges for implementing digital twins in healthcare settings. The primary technology utilized in digital twins, artificial intelligence algorithms, demands high-performance hardware devices and software platforms for data analysis 45 , necessitating that healthcare organizations allocate increased investment and budget for computing infrastructure supporting digital twins' application. Therefore, future research should focus on the technical aspects of digital twins to resolve these challenges. The automated processing of health data using large language models and the rapid conversion of complex natural language texts into comprehensive knowledge texts are encouraged. The development of high-performance computing technology is essential for cost-effective computing, which can facilitate the application of digital twins in clinical practice 46 .

Overall, this systematic review offers a comprehensive overview of digital twins in precision health, examining their impact at the population level. The findings indicate a significant overall effectiveness rate of 80% for the measured outcomes, highlighting digital twins' pivotal role in advancing precision health. Future research should broaden the application of digital twins across various populations, integrate proven content strategies, and implement these approaches in various healthcare settings. Such efforts will maximize the benefits of digital technologies in healthcare, promoting more precise and efficacious strategies, thereby elevating patient outcomes and improving overall healthcare experiences. While digital twins offer great promise for precision health, their broad adoption and practical implementation are still in the early stages. Continued development and application are essential to unlock the full potential of digital twins in revolutionizing healthcare delivery.

This systematic review was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines 47 . The protocol for this systematic review was prospectively registered on PROSPERO and can be accessed via the following link: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42024507256 . The registered protocol underwent an update, which included polishing the title of the article, modifying the limitation of the control group and language in the inclusion/exclusion criteria, and refining the process of data synthesis and analysis to enhance the clarity and readability of this systematic review. These modifications were documented in the revision notes section of PROSPERO.

Literature search strategy

Literature searches were conducted in PubMed, Embase, Web of Science, Cochrane Library, CINAHL, SinoMed, CNKI, and Wanfang Database, covering publications up to December 24, 2023. A comprehensive search strategy was developed using a combination of Medical Subject Headings terms and free-text terms, as detailed in Supplementary Table 2 . Furthermore, reference lists of articles and reviews meeting the inclusion criteria were reviewed for additional relevant studies.

Inclusion and exclusion criteria

The inclusion criteria for this systematic review included: 1) Population: Patients diagnosed with any diseases or symptoms; 2) Intervention: Any interventions involving digital twins; 3) Controls: Non-digital twin groups, such as standard care or conventional therapy, as well as no control group; 4) Outcomes: Health-related outcomes as the primary outcomes of interest; 5) Study design: All study designs that measured patient health-related outcomes after digital twins were included, including intervention studies and predictive cohort studies.

Initially, duplicates were removed. Exclusion criteria included: 1) Papers lacking original data, such as reviews, protocols, and conference abstracts; 2) Studies not in English or Chinese; 3) Surveys focusing on implementation and qualitative studies related to requirements. In cases of data duplication, the most comprehensive data report was included.

Study selection and data extraction

Following the automatic removal of duplicates, two independent reviewers (MD.SHEN and SB.CHEN) conducted initial screenings of titles and abstracts against the predefined inclusion and exclusion criteria to identify potentially relevant studies. Afterward, the same reviewers examined the full texts of these shortlisted articles to confirm their suitability for inclusion. This process also involved checking the reference lists of these articles for any additional studies that might meet the criteria. Data from the included studies were systematically extracted using a pre-designed extraction form. Recorded information included the first author’s name, publication year, country of origin, type of study, sample size, study population, intervention, controls, measurements, and an appraisal of each study. Disagreements between the reviewers were resolved by consultation with a third senior reviewer (XD.DING), ensuring consensus.

Quality appraisal

The Joanna Briggs Institute (JBI) scales 48 were used to assess the quality and potential bias of each study included in the review, employing specific tools tailored to the type of study under evaluation. These tools feature response options of “yes,” “no,” “unclear,” or “not applicable” for each assessment item. For randomized controlled trials (RCTs), the JBI scale includes 13 items, with answering “yes” to at least six items indicating a high-quality study. Quasi-experimental studies were evaluated using a nine-item checklist, where five or more positive responses qualify the research as high quality. Cohort studies underwent evaluation through an 11-item checklist, with six or more affirmative responses indicating high quality. The assessment was independently carried out by two reviewers (MD.SHEN and SB.CHEN), and any disagreements were resolved through consultation with a third senior reviewer (XD.DING), ensuring the integrity and accuracy of the quality assessment.
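The classification rule described above can be sketched as follows. This is an illustration only: the thresholds mirror the text (RCTs: 13 items, at least 6 "yes"; quasi-experimental: 9 items, at least 5; cohort: 11 items, at least 6), while the function and dictionary names are our own and not part of the JBI tools.

```python
# Illustrative sketch of the JBI high-quality thresholds described above.
# (total checklist items, "yes" answers required), per study design.
JBI_THRESHOLDS = {
    "rct": (13, 6),
    "quasi_experimental": (9, 5),
    "cohort": (11, 6),
}

def is_high_quality(design, answers):
    """Return True if a study meets the JBI high-quality threshold.

    `answers` holds one response per checklist item:
    "yes", "no", "unclear", or "not applicable".
    """
    n_items, yes_needed = JBI_THRESHOLDS[design]
    if len(answers) != n_items:
        raise ValueError(f"A {design} checklist has {n_items} items")
    return answers.count("yes") >= yes_needed
```

For example, a quasi-experimental study with five "yes" responses out of nine would be classified as high quality, whereas an RCT with only five would not.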

Data synthesis and analysis

Given the heterogeneity in study types and outcome measures, a meta-analysis was deemed unfeasible. Instead, a quantitative content analysis was employed to analyze all the selected studies 49 , 50 . Key information was extracted using a pre-designed standardized form, including the first author's name, patient characteristics, intervention functional characteristics, measurements, results, effectiveness, and adverse events. Based on the functional characteristics of the digital twin technology, two reviewers (MD.SHEN and SB.CHEN) independently coded it into three categories for descriptive analysis: personalized health management, precision individual therapy effects, and predicting individual risk. The Kappa statistic was applied to evaluate inter-rater reliability during the coding process, yielding a value of 0.871, which signifies good agreement between the researchers 51 , 52 . The assessment of digital twins' effectiveness was based on statistical significance ( P -value or 95% confidence interval). Outcomes with statistical significance were labeled as "resultful," whereas those lacking statistical significance were deemed "resultless." For quasi-experimental studies that did not report statistical significance, significant changes in the authors' self-reports were used to determine effectiveness. The proportion of effectiveness was calculated as the number of "resultful" indicators divided by the total number of outcomes within each category.
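The two computations described above, Cohen's kappa for inter-rater agreement on the three functional categories and the proportion of effectiveness, can be sketched as below. The function names and the example labels are illustrative, not the review's actual coding data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters beyond chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(ca[c] * cb.get(c, 0) for c in ca) / (n * n)
    return (observed - expected) / (1 - expected)

def effectiveness_rate(outcomes):
    """Proportion of outcomes labeled "resultful" (statistically significant)."""
    return sum(o == "resultful" for o in outcomes) / len(outcomes)
```

With illustrative labels, four "resultful" indicators out of five outcomes would give an effectiveness rate of 0.8, matching the form of the 80% figure reported in this review.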

Data availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Code availability

Code sharing is not applicable to this article as no codes were generated or analyzed during the current study.

Fu, M. R. et al. Precision health: A nursing perspective. Int. J. Nurs. Sci. 7 , 5–12 (2020).

Naithani, N., Sinha, S., Misra, P., Vasudevan, B. & Sahu, R. Precision medicine: Concept and tools. Med. J., Armed Forces India 77 , 249–257 (2021).

Payne, K. & Gavan, S. P. Economics and precision medicine. Handb. Exp. Pharmacol. 280 , 263–281 (2023).

Ielapi, N. et al. Precision medicine and precision nursing: the era of biomarkers and precision health. Int. J. Gen. Med. 13 , 1705–1711 (2020).

Corral-Acero, J. et al. The ‘Digital Twin’ to enable the vision of precision cardiology. Eur. Heart J. 41 , 4556–4564 (2020).

Ferdousi, R., Laamarti, F., Hossain, M. A., Yang, C. S. & Saddik, A. E. Digital twins for well-being: an overview. Digital Twin 1 , 2022 (2022).

Vallée, A. Digital twin for healthcare systems. Front. Digital health 5 , 1253050 (2023).

Elkefi, S. & Asan, O. Digital twins for managing health care systems: rapid literature review. J. Med. Internet Res. 24 , e37641 (2022).

Sun, T., He, X. & Li, Z. Digital twin in healthcare: Recent updates and challenges. Digital Health 9 , 20552076221149651 (2023).

Sheng, B. et al. Detecting latent topics and trends of digital twins in healthcare: A structural topic model-based systematic review. Digital Health 9 , 20552076231203672 (2023).

Khan, A. et al. A scoping review of digital twins in the context of the Covid-19 pandemic. Biomed. Eng. Comput. Biol. 13 , 11795972221102115 (2022).

Coorey, G. et al. The health digital twin to tackle cardiovascular disease-a review of an emerging interdisciplinary field. NPJ Digital Med. 5 , 126 (2022).

Thamotharan, P. et al. Human Digital Twin for Personalized Elderly Type 2 Diabetes Management. J. Clin. Med. 12 , https://doi.org/10.3390/jcm12062094 (2023).

Joshi, S. et al. Digital twin-enabled personalized nutrition improves metabolic dysfunction-associated fatty liver disease in type 2 diabetes: results of a 1-year randomized controlled study. Endocr. Pract. : Off. J. Am. Coll. Endocrinol. Am. Assoc. Clin. Endocrinologists 29 , 960–970 (2023).

Chaudhuri, A. et al. Predictive digital twin for optimizing patient-specific radiotherapy regimens under uncertainty in high-grade gliomas. Front. Artif. Intell. 6 , 1222612–1222612 (2023).

Bahrami, F., Rossi, R. M., De Nys, K. & Defraeye, T. An individualized digital twin of a patient for transdermal fentanyl therapy for chronic pain management. Drug Deliv. Transl. Res. 13 , 2272–2285 (2023).

Cen, S., Gebregziabher, M., Moazami, S., Azevedo, C. J. & Pelletier, D. Toward precision medicine using a “digital twin” approach: modeling the onset of disease-specific brain atrophy in individuals with multiple sclerosis. Sci. Rep. 13 , 16279 (2023).

Maleki, A. et al. Moving forward through the in silico modeling of multiple sclerosis: Treatment layer implementation and validation. Comput. Struct. Biotechnol. J. 21 , 3081–3090 (2023).

Susilo, M. E. et al. Systems-based digital twins to help characterize clinical dose–response and propose predictive biomarkers in a Phase I study of bispecific antibody, mosunetuzumab, in NHL. Clin. Transl. Sci. 16 , 1134–1148 (2023).

Thangaraj, P. M., Vasisht Shankar, S., Oikonomou, E. K. & Khera, R. RCT-Twin-GAN Generates Digital Twins of Randomized Control Trials Adapted to Real-world Patients to Enhance their Inference and Application. medRxiv : the preprint server for health sciences , https://doi.org/10.1101/2023.12.06.23299464 (2023).

Jiang, J., Li, Q. & Yang, F. TCM Physical Health Management Training and Nursing Effect Evaluation Based on Digital Twin. Sci. Progr. 2022 , https://doi.org/10.1155/2022/3907481 (2022).

Tardini, E. et al. Optimal treatment selection in sequential systemic and locoregional therapy of oropharyngeal squamous carcinomas: deep Q-learning with a patient-physician digital twin dyad. J. Med. Int. Res. 24 , e29455 (2022).

Golse, N. et al. Predicting the risk of post-hepatectomy portal hypertension using a digital twin: A clinical proof of concept. J. Hepatol. 74 , 661–669 (2021).

Cho, S.-W. et al. Sagittal relationship between the maxillary central incisors and the forehead in digital twins of korean adult females. J. Personal. Med. 11 , https://doi.org/10.3390/jpm11030203 (2021).

Imoto, S., Hasegawa, T. & Yamaguchi, R. Data science and precision health care. Nutr. Rev. 78 , 53–57 (2020).

Drummond, D. & Coulet, A. Technical, ethical, legal, and societal challenges with digital twin systems for the management of chronic diseases in children and young people. J. Med. Internet Res. 24 , e39698 (2022).

Bertezene, S. The digital twin in health: Organizational contributions and epistemological limits in a context of health crisis. Med. Sci. M/S 38 , 663–668 (2022).

Johnson, K. B. et al. Precision Medicine, AI, and the Future of Personalized Health Care. Clin. Transl. Sci. 14 , 86–93 (2021).

Powell, J. & Li, X. Integrated, data-driven health management: A step closer to personalized and predictive healthcare. Cell Syst. 13 , 201–203 (2022).

Delpierre, C. & Lefèvre, T. Precision and personalized medicine: What their current definition says and silences about the model of health they promote. Implication for the development of personalized health. Front. Sociol. 8 , 1112159 (2023).

Raiff, B. R., Burrows, C. & Dwyer, M. Behavior-analytic approaches to the management of diabetes mellitus: current status and future directions. Behav. Anal. Pract. 14 , 240–252 (2021).

Ahern, D. K. et al. Behavior-based diabetes management: impact on care, hospitalizations, and costs. Am. J. Managed care 27 , 96–102 (2021).

Tyson, R. J. et al. Precision dosing priority criteria: drug, disease, and patient population variables. Front. Pharmacol. 11 , 420 (2020).

Walton, R., Dovey, S., Harvey, E. & Freemantle, N. Computer support for determining drug dose: systematic review and meta-analysis. BMJ (Clin. Res.) 318 , 984–990 (1999).

Friedrichs, M. & Shoshi, A. History and future of KALIS: Towards computer-assisted decision making in prescriptive medicine. J. Integr. Bioinform. 16 , https://doi.org/10.1515/jib-2019-0011 (2019).

Zhao, H. et al. Identifying the serious clinical outcomes of adverse reactions to drugs by a multi-task deep learning framework. Commun. Biol. 6 , 870 (2023).

Thiong’o, G. M. & Rutka, J. T. Digital twin technology: the future of predicting neurological complications of pediatric cancers and their treatment. Front. Oncol. 11 , 781499 (2021).

Sun, T., He, X., Song, X., Shu, L. & Li, Z. The digital twin in medicine: a key to the future of healthcare? Front. Med. 9 , 907066 (2022).

Sarp, S., Kuzlu, M., Zhao, Y. & Gueler, O. Digital twin in healthcare: a study for chronic wound management. IEEE J. Biomed. health Inform. 27 , 5634–5643 (2023).

Chu, Y., Li, S., Tang, J. & Wu, H. The potential of the Medical Digital Twin in diabetes management: a review. Front. Med. 10 , 1178912 (2023).

Barricelli, B. R., Casiraghi, E. & Fogli, D. A survey on digital twin: definitions, characteristics, applications, and design implications. IEEE Access 7 , 167653–167671 (2019).

Keller, R. et al. Digital behavior change interventions for the prevention and management of type 2 diabetes: systematic market analysis. J. Med. Internet Res. 24 , e33348 (2022).

Priesterroth, L., Grammes, J., Holtz, K., Reinwarth, A. & Kubiak, T. Gamification and behavior change techniques in diabetes self-management apps. J. diabetes Sci. Technol. 13 , 954–958 (2019).

Venkatesh, K. P., Raza, M. M. & Kvedar, J. C. Health digital twins as tools for precision medicine: Considerations for computation, implementation, and regulation. NPJ digital Med. 5 , 150 (2022).

Venkatesh, K. P., Brito, G. & Kamel Boulos, M. N. Health digital twins in life science and health care innovation. Annu. Rev. Pharmacol. Toxicol. 64 , 159–170 (2024).

Katsoulakis, E. et al. Digital twins for health: a scoping review. NPJ Digital Med. 7 , 77 (2024).

Page, M. J. et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ (Clin. Res. ed.) 372 , n71 (2021).

Barker, T. H. et al. Revising the JBI quantitative critical appraisal tools to improve their applicability: an overview of methods and the development process. JBI Evid. Synth. 21 , 478–493 (2023).

Manganello, J. & Blake, N. A study of quantitative content analysis of health messages in U.S. media from 1985 to 2005. Health Commun. 25 , 387–396 (2010).

Giannantonio, C. M. Content Analysis: An Introduction to Its Methodology, 2nd edition. Organ. Res. Methods 13 , 392–394 (2010).

Rigby, A. S. Statistical methods in epidemiology. v. Towards an understanding of the kappa coefficient. Disabil. Rehabilitation 22 , 339–344 (2000).

Lantz, C. A. & Nebenzahl, E. Behavior and interpretation of the kappa statistic: resolution of the two paradoxes. J. Clin. Epidemiol. 49 , 431–434 (1996).

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors.

Author information

Authors and affiliations

School of Nursing, Peking University, Beijing, China

Mei-di Shen

Department of Plastic and Reconstructive Microsurgery, China-Japan Union Hospital, Jilin University, Changchun, Jilin, China

Si-bing Chen & Xiang-dong Ding

Contributions

MD.SHEN contributed to the data collection, analysis and the manuscript writing. SB.CHEN contributed to the data collection and analysis. XD.DING contributed to the critical revision of the manuscript as well as the initial study conception. All authors read and approved the final manuscript, and jointly take responsibility for the decision to submit this work for publication.

Corresponding author

Correspondence to Xiang-dong Ding .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Shen, Md., Chen, Sb. & Ding, Xd. The effectiveness of digital twins in promoting precision health across the entire population: a systematic review. npj Digit. Med. 7 , 145 (2024). https://doi.org/10.1038/s41746-024-01146-0

Received : 29 January 2024

Accepted : 22 May 2024

Published : 03 June 2024

DOI : https://doi.org/10.1038/s41746-024-01146-0


  19. The benefits and challenges of using systematic reviews in

    It is concluded that although using systematic review principles can help researchers improve the rigour and breadth of literature reviews, conducting a full systematic review is a resource-intensive process which involves a number of practical challenges.

  20. Types of Reviews and Their Differences

    Purpose: The reason or objective of the review. One review may be to see how much has been published on a topic (a scoping review), while another may be to draw new conclusions by combining data from multiple yet similar studies (a meta-analysis). A student may do a review for an assignment, while a researcher could include a literature review as support in their grant proposal.

  21. A practical guide to data analysis in general literature reviews

    A general literature review starts with formulating a research question, defining the population, and conducting a systematic search in scientific databases, steps that are well-described elsewhere. 1, 2, 3 Once students feel confident that they have thoroughly combed through relevant databases and found the most relevant research on the topic, however, what is arguably the hardest part of the ...