Adult learning online education:
About the example: Boolean searches were conducted on November 4, 2019; result numbers may vary at a later date. No additional database limiters were set to further narrow search returns.
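For illustration only (these are generic examples of the three standard Boolean operators, not the exact searches run in 2019):

- adult learning AND online education (returns only records containing both concepts; narrows the search)
- adult learning OR online education (returns records containing either concept; broadens the search)
- adult learning NOT online education (excludes records mentioning online education; narrows the search)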
Database strategies for targeted search results.
Most databases include limiters, or additional parameters, that you may use to strategically focus search results. EBSCO databases, such as Education Research Complete and Academic Search Complete, provide options to limit results by criteria such as publication date, source type (e.g., scholarly journals vs. newspapers), and full-text availability.
Keep in mind that these tools are called limiters for a reason; adding them to a search will limit the number of results returned. This can be a double-edged sword. How?
Use limiters with care. When starting a search, consider opting out of limiters until the initial literature screening is complete. The second or third pass through your research may be the ideal time to focus on specific time periods or material types (scholarly vs. newspaper sources).
Expanding your search term at the root.
Truncation is often referred to as 'wildcard' searching. Databases may have their own specific wildcard characters; however, the most commonly used are the asterisk (*) and the question mark (?). When used within your search, they will expand the returned results.
Using the asterisk wildcard will return variant forms of the truncated word. In the following example, the search term education was truncated after the letter "t."
Original Search | Truncated Search
adult education | adult educat*

Results included: educate, education, educator, educators'/educators, educating, and educational.
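Conceptually, truncation behaves like a prefix match. The short Python sketch below models the idea; it is a simplified illustration, not EBSCO's actual matching engine, and the sample text is invented:

    import re

    # Simplified model of truncation: "educat*" behaves roughly like the
    # regular expression "educat" followed by any run of word characters.
    pattern = re.compile(r"\beducat\w*", re.IGNORECASE)

    text = ("Educators and educational researchers educate adults; "
            "educating online is one approach to adult education.")

    matches = sorted(set(m.group().lower() for m in pattern.finditer(text)))
    print(matches)
    # ['educate', 'educating', 'education', 'educational', 'educators']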
Explore these database help pages for additional information on crafting search terms.
Tips for saving research directly to Google Drive.
It is possible to save articles (PDF and HTML) and abstracts in EBSCOhost databases directly to Google Drive. Select the Google Drive icon, authenticate using a Google account, and an EBSCO folder will be created in your account. This is a great option for managing your research. If documenting your research in a Google Doc, consider linking the information to the actual articles saved in Drive.
EBSCOhost Databases & Google Drive: Managing Your Research
This video features an overview of how to use Google Drive with EBSCO databases to help manage your research. It presents information for connecting an active Google account to EBSCO and steps needed to provide permission for EBSCO to manage a folder in Drive.
About the Video: Closed captioning is available; select CC from the video menu. If you need to review a specific area of the video, view it on YouTube and expand the video description for access to topic time stamps. A video transcript is provided below.
What is a literature review?
A definition from the Online Dictionary for Library and Information Science (ODLIS):
A literature review is "a comprehensive survey of the works published in a particular field of study or line of research, usually over a specific period of time, in the form of an in-depth, critical bibliographic essay or annotated list in which attention is drawn to the most significant works" (Reitz, 2014).
A systematic review is "a literature review focused on a specific research question, which uses explicit methods to minimize bias in the identification, appraisal, selection, and synthesis of all the high-quality evidence pertinent to the question" (Reitz, 2014).
EBSCO Connect [Discovery and Search]. (2022). Searching with Boolean operators. Retrieved May 3, 2022, from https://connect.ebsco.com/s/?language=en_US
EBSCO Connect [Discovery and Search]. (2022). Searching with wildcards and truncation symbols. Retrieved May 3, 2022, from https://connect.ebsco.com/s/?language=en_US
Machi, L. A., & McEvoy, B. T. (2009). The literature review. Thousand Oaks, CA: Corwin Press.
Reitz, J. M. (2014). Online dictionary for library and information science. ABC-CLIO, Libraries Unlimited. Retrieved from https://www.abc-clio.com/ODLIS/odlis_A.aspx
Ridley, D. (2008). The literature review: A step-by-step guide for students. Thousand Oaks, CA: Sage Publications, Inc.
Schedule an appointment.
Contact a librarian directly (email), or submit a request form. If you have worked with someone before, you can request them on the form.
The Ohio State University
What exactly is a literature review?
1. Choose a clear research question.
2. Use online databases and other resources to find articles and books relevant to your question.
7. Interpret the results, using your experience and the literature's quality and content. For a more detailed analysis, a meta-analysis can be conducted using statistical methods to combine study results.
8. Produce a descriptive review or perform a meta-analysis.
References:
Bryman, A. (2007). Effective leadership in higher education: A literature review. Studies in Higher Education, 32(6), 693–710.
Fink, A. (2019). Conducting research literature reviews: From the internet to paper. Sage Publications.
Yu, Z. (2023). A meta-analysis of the effect of virtual reality technology use in education. Interactive Learning Environments, 31(8), 4956–4976.
Reproduced from Grant, M. J. and Booth, A. (2009), A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26: 91–108. doi:10.1111/j.1471-1842.2009.00848.x
Label | Description | Search | Appraisal | Synthesis | Analysis
Critical review | Aims to demonstrate writer has extensively researched literature and critically evaluated its quality. Goes beyond mere description to include degree of analysis and conceptual innovation. Typically results in hypothesis or model | Seeks to identify most significant items in the field | No formal quality assessment. Attempts to evaluate according to contribution | Typically narrative, perhaps conceptual or chronological | Significant component: seeks to identify conceptual contribution to embody existing or derive new theory
Literature review | Generic term: published materials that provide examination of recent or current literature. Can cover wide range of subjects at various levels of completeness and comprehensiveness. May include research findings | May or may not include comprehensive searching | May or may not include quality assessment | Typically narrative | Analysis may be chronological, conceptual, thematic, etc.
Mapping review/systematic map | Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature | Completeness of searching determined by time/scope constraints | No formal quality assessment | May be graphical and tabular | Characterizes quantity and quality of literature, perhaps by study design and other key features. May identify need for primary or secondary research
Meta-analysis | Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results | Aims for exhaustive, comprehensive searching. May use funnel plot to assess completeness | Quality assessment may determine inclusion/exclusion and/or sensitivity analyses | Graphical and tabular with narrative commentary | Numerical analysis of measures of effect assuming absence of heterogeneity
Mixed studies review/mixed methods review | Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches, for example combining quantitative with qualitative research or outcome with process studies | Requires either very sensitive search to retrieve all studies or separately conceived quantitative and qualitative strategies | Requires either a generic appraisal instrument or separate appraisal processes with corresponding checklists | Typically both components will be presented as narrative and in tables. May also employ graphical means of integrating quantitative and qualitative studies | Analysis may characterise both literatures and look for correlations between characteristics or use gap analysis to identify aspects absent in one literature but missing in the other
Overview | Generic term: summary of the [medical] literature that attempts to survey the literature and describe its characteristics | May or may not include comprehensive searching (depends whether systematic overview or not) | May or may not include quality assessment (depends whether systematic overview or not) | Synthesis depends on whether systematic or not. Typically narrative but may include tabular features | Analysis may be chronological, conceptual, thematic, etc.
Qualitative systematic review/qualitative evidence synthesis | Method for integrating or comparing the findings from qualitative studies. It looks for 'themes' or 'constructs' that lie in or across individual qualitative studies | May employ selective or purposive sampling | Quality assessment typically used to mediate messages not for inclusion/exclusion | Qualitative, narrative synthesis | Thematic analysis, may include conceptual models
Rapid review | Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research | Completeness of searching determined by time constraints | Time-limited formal quality assessment | Typically narrative and tabular | Quantities of literature and overall quality/direction of effect of literature
Scoping review | Preliminary assessment of potential size and scope of available research literature. Aims to identify nature and extent of research evidence (usually including ongoing research) | Completeness of searching determined by time/scope constraints. May include research in progress | No formal quality assessment | Typically tabular with some narrative commentary | Characterizes quantity and quality of literature, perhaps by study design and other key features. Attempts to specify a viable review
State-of-the-art review | Tend to address more current matters in contrast to other combined retrospective and current approaches. May offer new perspectives | Aims for comprehensive searching of current literature | No formal quality assessment | Typically narrative, may have tabular accompaniment | Current state of knowledge and priorities for future investigation and research
Systematic review | Seeks to systematically search for, appraise and synthesise research evidence, often adhering to guidelines on the conduct of a review | Aims for exhaustive, comprehensive searching | Quality assessment may determine inclusion/exclusion | Typically narrative with tabular accompaniment | What is known; recommendations for practice. What remains unknown; uncertainty around findings, recommendations for future research
Systematic search and review | Combines strengths of critical review with a comprehensive search process. Typically addresses broad questions to produce 'best evidence synthesis' | Aims for exhaustive, comprehensive searching | May or may not include quality assessment | Minimal narrative, tabular summary of studies | What is known; recommendations for practice. Limitations
Systematized review | Attempt to include elements of systematic review process while stopping short of systematic review. Typically conducted as postgraduate student assignment | May or may not include comprehensive searching | May or may not include quality assessment | Typically narrative with tabular accompaniment | What is known; uncertainty around findings; limitations of methodology
Umbrella review | Specifically refers to review compiling evidence from multiple reviews into one accessible and usable document. Focuses on broad condition or problem for which there are competing interventions and highlights reviews that address these interventions and their results | Identification of component reviews, but no search for primary studies | Quality assessment of studies within component reviews and/or of reviews themselves | Graphical and tabular with narrative commentary | What is known; recommendations for practice. What remains unknown; recommendations for future research
Published on June 15, 2022, by Shaun Turney. Revised on November 20, 2023.
A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer.
For example, in a systematic review of probiotics, Boyle and colleagues answered the question "What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?"
In this context, a probiotic is a health product that contains live microorganisms and is taken by mouth. Eczema is a common skin condition that causes red, itchy skin.
Contents:
- What is a systematic review?
- Systematic review vs. meta-analysis
- Systematic review vs. literature review
- Systematic review vs. scoping review
- When to conduct a systematic review
- Pros and cons of systematic reviews
- Step-by-step example of a systematic review
- Other interesting articles
- Frequently asked questions about systematic reviews
A review is an overview of the research that’s already been completed on a topic.
What makes a systematic review different from other types of reviews is that the research methods are designed to reduce bias . The methods are repeatable, and the approach is formal and systematic:
Although multiple sets of guidelines exist, the Cochrane Handbook for Systematic Reviews is among the most widely used. It provides detailed guidelines on how to complete each step of the systematic review process.
Systematic reviews are most commonly used in medical and public health research, but they can also be found in other disciplines.
Systematic reviews typically answer their research question by synthesizing all available evidence and evaluating the quality of the evidence. Synthesizing means bringing together different information to tell a single, cohesive story. The synthesis can be narrative ( qualitative ), quantitative , or both.
Systematic reviews often quantitatively synthesize the evidence using a meta-analysis . A meta-analysis is a statistical analysis, not a type of review.
A meta-analysis is a technique to synthesize results from multiple studies. It’s a statistical analysis that combines the results of two or more studies, usually to estimate an effect size .
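As a minimal sketch of the idea (using invented effect sizes and variances, not data from any study cited here), an inverse-variance meta-analysis weights each study's effect size by the precision of its estimate:

    # Fixed-effect (inverse-variance) pooling of study effect sizes.
    # The effect sizes (d) and variances (v) below are invented for illustration.
    studies = [
        {"d": 0.30, "v": 0.04},   # small effect, precise estimate
        {"d": 0.55, "v": 0.09},   # moderate effect, less precise
        {"d": 0.20, "v": 0.02},
    ]

    weights = [1 / s["v"] for s in studies]                  # w_i = 1 / v_i
    pooled = sum(w * s["d"] for w, s in zip(weights, studies)) / sum(weights)
    se = (1 / sum(weights)) ** 0.5                           # standard error of pooled estimate

    print(f"pooled effect size: {pooled:.2f} (SE {se:.2f})")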
A literature review is a type of review that uses a less systematic and formal approach than a systematic review. Typically, an expert in a topic will qualitatively summarize and evaluate previous work, without using a formal, explicit method.
Although literature reviews are often less time-consuming and can be insightful or helpful, they have a higher risk of bias and are less transparent than systematic reviews.
Similar to a systematic review, a scoping review is a type of review that tries to minimize bias by using transparent and repeatable methods.
However, a scoping review isn’t a type of systematic review. The most important difference is the goal: rather than answering a specific question, a scoping review explores a topic. The researcher tries to identify the main concepts, theories, and evidence, as well as gaps in the current research.
Sometimes scoping reviews are an exploratory preparation step for a systematic review, and sometimes they are a standalone project.
A systematic review is a good choice of review if you want to answer a question about the effectiveness of an intervention , such as a medical treatment.
To conduct a systematic review, you’ll need the following:
A systematic review has many pros.
Systematic reviews also have a few cons.
The 7 steps for conducting a systematic review are explained with an example.
Formulating the research question is probably the most important step of a systematic review. A clear research question will:
A good research question for a systematic review has four components, which you can remember with the acronym PICO: Population, Intervention, Comparison, and Outcome.
You can rearrange these four components to write your research question:
Sometimes, you may want to include a fifth component, the type of study design . In this case, the acronym is PICOT .
Their research question was: "What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?" In PICO terms, the population is patients with eczema, the intervention is probiotics, and the outcomes are eczema symptoms and quality of life.
A protocol is a document that contains your research plan for the systematic review. This is an important step because having a plan allows you to work more efficiently and reduces bias.
Your protocol should include the following components:
If you’re a professional seeking to publish your review, it’s a good idea to bring together an advisory committee . This is a group of about six people who have experience in the topic you’re researching. They can help you make decisions about your protocol.
It’s highly recommended to register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov .
Searching for relevant studies is the most time-consuming step of a systematic review.
To reduce bias, it’s important to search for relevant studies very thoroughly. Your strategy will depend on your field and your research question, but sources generally fall into these four categories:
At this stage of your review, you won’t read the articles yet. Simply save any potentially relevant citations using bibliographic software, such as Scribbr’s APA or MLA Generator .
Applying the selection criteria is a three-person job. Two of you will independently read the studies and decide which to include in your review based on the selection criteria you established in your protocol . The third person’s job is to break any ties.
To increase inter-rater reliability , ensure that everyone thoroughly understands the selection criteria before you begin.
If you’re writing a systematic review as a student for an assignment, you might not have a team. In this case, you’ll have to apply the selection criteria on your own; you can mention this as a limitation in your paper’s discussion.
You should apply the selection criteria in two phases:
It’s very important to keep a meticulous record of why you included or excluded each article. When the selection process is complete, you can summarize what you did using a PRISMA flow diagram .
Next, Boyle and colleagues found the full texts for each of the remaining studies. Boyle and Tang read through the articles to decide if any more studies needed to be excluded based on the selection criteria.
When Boyle and Tang disagreed about whether a study should be excluded, they discussed it with Varigos until the three researchers came to an agreement.
Extracting the data means collecting information from the selected studies in a systematic way. There are two types of information you need to collect from each study:
You should collect this information using forms. You can find sample forms in The Registry of Methods and Tools for Evidence-Informed Decision Making and the Grading of Recommendations, Assessment, Development and Evaluations Working Group .
Extracting the data is also a three-person job. Two people should do this step independently, and the third person will resolve any disagreements.
They also collected data about possible sources of bias, such as how the study participants were randomized into the control and treatment groups.
Synthesizing the data means bringing together the information you collected into a single, cohesive story. There are two main approaches to synthesizing the data:
Generally, you should use both approaches together whenever possible. If you don’t have enough data, or the data from different studies aren’t comparable, then you can take just a narrative approach. However, you should justify why a quantitative approach wasn’t possible.
Boyle and colleagues also divided the studies into subgroups, such as studies about babies, children, and adults, and analyzed the effect sizes within each group.
The purpose of writing a systematic review article is to share the answer to your research question and explain how you arrived at this answer.
Your article should include the following sections:
To verify that your report includes everything it needs, you can use the PRISMA checklist .
Once your report is written, you can publish it in a systematic review database, such as the Cochrane Database of Systematic Reviews , and/or in a peer-reviewed journal.
In their report, Boyle and colleagues concluded that probiotics cannot be recommended for reducing eczema symptoms or improving quality of life in patients with eczema.

Note: Generative AI tools like ChatGPT can be useful at various stages of the writing and research process and can help you to write your systematic review. However, we strongly advise against trying to pass AI-generated text off as your own work.
If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.
A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question .
It is often written as part of a thesis, dissertation , or research paper , in order to situate your work in relation to existing knowledge.
A literature review is a survey of credible sources on a topic, often used in dissertations , theses, and research papers . Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other academic texts , with an introduction , a main body, and a conclusion .
An annotated bibliography is a list of source references that has a short description (called an annotation ) for each of the sources. It is often assigned as part of the research process for a paper .
A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.
Turney, S. (2023, November 20). Systematic Review | Definition, Example & Guide. Scribbr. Retrieved July 5, 2024, from https://www.scribbr.com/methodology/systematic-review/
Researchers using qualitative methods tend to:
Image from https://www.editage.com/insights/qualitative-quantitative-or-mixed-methods-a-quick-guide-to-choose-the-right-design-for-your-research?refer-type=infographics
Qualitative Research: an operational description
Purpose : explain; gain insight and understanding of phenomena through intensive collection and study of narrative data
Approach: inductive; value-laden/subjective; holistic, process-oriented
Hypotheses: tentative, evolving; based on the particular study
Lit. Review: limited; may not be exhaustive
Setting: naturalistic, when and as much as possible
Sampling : for the purpose; not necessarily representative; for in-depth understanding
Measurement: narrative; ongoing
Design and Method: flexible, specified only generally; based on non-intervention, minimal disturbance, such as historical, ethnographic, or case studies
Data Collection: document collection, participant observation, informal interviews, field notes
Data Analysis: raw data is words; ongoing; involves synthesis
Data Interpretation: tentative, reviewed on an ongoing basis, speculative
Researchers using quantitative methods tend to:
Quantitative research: an operational description
Purpose: explain, predict, or control phenomena through focused collection and analysis of numerical data
Approach: deductive; tries to be value-free/objective; outcome-oriented
Hypotheses : Specific, testable, and stated prior to study
Lit. Review: extensive; may significantly influence a particular study
Setting: controlled to the degree possible
Sampling: uses largest manageable random/randomized sample, to allow generalization of results to larger populations
Measurement: standardized, numerical; "at the end"
Design and Method: Strongly structured, specified in detail in advance; involves intervention, manipulation and control groups; descriptive, correlational, experimental
Data Collection: via instruments, surveys, experiments, semi-structured formal interviews, tests or questionnaires
Data Analysis: raw data is numbers; at end of study, usually statistical
Data Interpretation: formulated at end of study; stated as a degree of certainty
This page on qualitative and quantitative research has been adapted and expanded from a handout by Suzy Westenkirchner. Used with permission.
Images from https://www.editage.com/insights/qualitative-quantitative-or-mixed-methods-a-quick-guide-to-choose-the-right-design-for-your-research?refer-type=infographics.
For guidance related to choosing a review type, see:
Literature reviews are comprehensive summaries and syntheses of the previous research on a given topic. While narrative reviews are common across all academic disciplines, reviews that focus on appraising and synthesizing research evidence are increasingly important in the health and social sciences.
Most evidence synthesis methods use formal and explicit methods to identify, select and combine results from multiple studies, making evidence synthesis a form of meta-research.
The review purpose, methods used and the results produced vary among different kinds of literature reviews; some of the common types of literature review are detailed below.
Narrative (literature) review.
1. Adapted from:
Eldermire, E. (2021, November 15). A guide to evidence synthesis: Types of evidence synthesis. Cornell University LibGuides. https://guides.library.cornell.edu/evidence-synthesis/types
Nolfi, D. (2021, October 6). Integrative Review: Systematic vs. Scoping vs. Integrative. Duquesne University LibGuides. https://guides.library.duq.edu/c.php?g=1055475&p=7725920
Delaney, L. (2021, November 24). Systematic reviews: Other review types. UniSA LibGuides. https://guides.library.unisa.edu.au/SystematicReviews/OtherReviewTypes
"The integrative review method is an approach that allows for the inclusion of diverse methodologies (i.e. experimental and non-experimental research)." (Whittemore & Knafl, 2005, p. 547).
Scoping reviews are evidence syntheses that are conducted systematically, but begin with a broader scope of question than traditional systematic reviews, allowing the researcher to 'map' the relevant literature on a given topic.
Rapid reviews are systematic reviews that are undertaken under a tighter timeframe than traditional systematic reviews.
A contemporary systematic literature review of equestrian tourism: emerging advancements and future insights
Journal of Hospitality and Tourism Insights
ISSN : 2514-9792
Article publication date: 2 July 2024
Horse-based tourism stands at the intersection of cultural heritage, leisure activities, and eco-friendly travel, captivating enthusiasts and researchers alike with its diverse facets and impacts. This study examines the horse-based tourism literature to provide an overview of horse-based tourism publications.
Using a systematic literature review (SLR) method, pertinent journal articles published over the past 3 decades were retrieved and analyzed. Based on the review process, 44 papers were identified and analyzed by publication year, journal distribution, research method, and lead author. Using Leximancer software, a thematic analysis was undertaken to determine the major themes of horse-based tourism.
The findings revealed a rising trend of horse-based tourism articles and the appearance of an increasing number of studies in tourism-oriented journals. In addition, the review found that the majority of available studies are qualitative, whereas quantitative research remains scarce.
Our research establishes a foundational resource for future studies and scholarly discourse on the multifaceted contributions of horse-based tourism.
This study can assist decision-makers in understanding the potential of horse-based tourism in the sustainable development of destinations. Moreover, it provides clear direction on implementing appropriate strategies to manage horse-based tourism.
This study distinguishes itself as the inaugural comprehensive literature review encompassing the breadth of horse-based tourism publications and research domains. By pioneering this endeavor, we not only contribute a unique perspective to the existing body of knowledge in the field but also emphasize the vital role of horse-based tourism in fostering economic and social sustainability for the countries involved.
Rezapouraghdam, H. , Saydam, M.B. , Altun, O. , Roudi, S. and Nosrati, S. (2024), "A contemporary systematic literature review of equestrian tourism: emerging advancements and future insights", Journal of Hospitality and Tourism Insights , Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/JHTI-01-2024-0046
Copyright © 2024, Emerald Publishing Limited
Galdas P, Darwin Z, Fell J, et al. A systematic review and metaethnography to identify how effective, cost-effective, accessible and acceptable self-management support interventions are for men with long-term conditions (SELF-MAN). Southampton (UK): NIHR Journals Library; 2015 Aug. (Health Services and Delivery Research, No. 3.34.)
Chapter 2: Quantitative review methods
A systematic review and meta-analysis was conducted based upon a protocol published on the PROSPERO database (registration number CRD42013005394, URL: www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42013005394 ).
Deviations from the original protocol are presented in Box 1 .
Deviations from original PROSPERO protocol: The target population is male adults (aged 18 years or over) living with one or more long-term conditions (LTCs).
We searched the following databases using a search strategy developed in conjunction with an information specialist from the Centre for Reviews and Dissemination, University of York (see Appendix 1 ): Cochrane Database of Systematic Reviews (CDSR); Database of Abstracts of Reviews of Effects (DARE) (up to July 2013); PROSPERO (International Prospective Register of Systematic Reviews) (up to July 2013); and Medical Literature Analysis and Retrieval System Online (MEDLINE) (January 2012 to July 2013). The breadth of the literature identified meant we took a pragmatic approach and limited our search to CDSR; see Box 1 .
Randomised controlled trials (RCTs) investigating self-management support interventions in men with LTCs (identified via Cochrane systematic reviews of self-management support interventions) were included. Studies which analysed the effects of self-management support interventions in sex groups within an RCT were also identified and synthesised separately.
The following population, intervention, comparison and outcome criteria were used:
An intervention primarily designed to develop the abilities of patients to undertake management of health conditions through education, training and support to develop patient knowledge, skills or psychological and social resources.
Criteria for defining a self-management support intervention: The intervention should, through some means of education, training or support, help people with an LTC by:
We piloted the screening criteria on a sample of papers before undertaking the main screening, in order to identify and resolve any inconsistencies. Screening was conducted in two phases:
For phase 1, an initial screen by title and abstract was conducted by one researcher. Two researchers then screened each article independently according to the screening criteria to identify relevant systematic reviews. Disagreements were resolved by a third researcher (principal investigator) as required.
For phase 2, each Cochrane review was screened independently for eligible RCTs by two researchers. The eligibility of each RCT was checked using the study information presented within Cochrane reviews before full papers were sourced. Full texts of each RCT were independently screened by two researchers and disagreements were resolved by a third researcher (principal investigator) as required.
For this review we focused on identifying male-only RCTs and trials which analysed the effects of interventions by sex groups. Agreement on Cochrane review eligibility was 89% and agreement on male-only RCT inclusion/exclusion and identification of RCTs containing sex group analyses was > 90%.
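The agreement figures above are simple percent agreement between the two independent screeners. A common complementary statistic is Cohen's kappa, which corrects for chance agreement. The sketch below uses invented screening counts, not the SELF-MAN data:

    # Percent agreement vs. Cohen's kappa for two independent screeners.
    # The 2x2 counts below are hypothetical, purely for illustration.
    both_include = 40      # both reviewers said "include"
    both_exclude = 140     # both reviewers said "exclude"
    only_a = 10            # reviewer A include, reviewer B exclude
    only_b = 10            # reviewer B include, reviewer A exclude
    n = both_include + both_exclude + only_a + only_b

    observed = (both_include + both_exclude) / n

    # Chance agreement from each reviewer's marginal "include" rate.
    p_a = (both_include + only_a) / n
    p_b = (both_include + only_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    kappa = (observed - expected) / (1 - expected)
    print(f"observed agreement: {observed:.0%}, kappa: {kappa:.2f}")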
We designed a data extraction sheet and piloted this on a sample of papers prior to the main data extraction. Relevant data from each included article were extracted by a member of the review team and checked for completeness and accuracy by a second member of the team. Disagreements were discussed and resolved by a third person (principal investigator) as required. In instances where key information for meta-analysis was missing, efforts were made to contact authors. We extracted data on study and population characteristics, intervention details (setting, duration, frequency, individual/group, delivered by), outcome measures of health status, clinical measures, health behaviour, health-care use, self-efficacy, knowledge and understanding, communication with HCPs and items for quality assessment (Cochrane risk of bias tool 35 ). Items for economic evaluations [hospital admission, service use, health-related quality of life (HRQoL), incremental cost-effectiveness ratios] were also extracted.
Where studies were reported in multiple publications, each publication was included and relevant data were extracted.
We extracted data on the methodological quality of all included male-only RCTs and appraised this using the Cochrane risk of bias tool. Quality appraisal was undertaken by two researchers independently and disagreements were resolved through discussion. Sequence generation, allocation concealment, blinding, incomplete outcome data, selective outcome reporting and other sources of bias were assessed, assigning low, high or unclear risk of bias, as appropriate. The purpose of the quality appraisal was to describe the quality of the evidence base, not to give an inclusion/exclusion criterion.
Randomised controlled trials containing sex group analyses were assessed for quality using assessment criteria adapted from Pincus et al. 36 and Sun et al. 37 ‘Yes’, ‘No’ and ‘Unclear’ were recorded as responses to the following quality appraisal questions:
Meta-analysis was conducted using Review Manager version 5.2 (The Nordic Cochrane Centre, The Cochrane Collaboration, Copenhagen, Denmark).
Data were extracted, analysed and presented as standardised mean difference (SMD) to account for the different instruments used, unless otherwise stated. As a guide to the magnitude of effect, we categorised an effect size of 0.2 as representing a ‘small’ effect, 0.5 a ‘moderate’ effect and 0.8 a ‘large’ effect. 38
A random-effects model was used to combine study data. Statistical heterogeneity was assessed with the I 2 value, with ‘low’ heterogeneity set at ≤ 25%, ‘moderate’ 50% and ‘high’ 75%.
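To make these quantities concrete, the sketch below computes Cochran's Q, I² (the standard Higgins and Thompson formula) and a DerSimonian-Laird estimate of the between-study variance tau², which is what a random-effects model adds to each study's variance. The effect sizes and variances are invented for illustration; these are the standard formulas behind the statistics reported by tools such as Review Manager:

    # Illustrative computation of Cochran's Q, I-squared, and DL tau-squared.
    # Effect sizes (d) and variances (v) are invented for illustration.
    studies = [(0.30, 0.04), (0.55, 0.09), (0.20, 0.02), (0.80, 0.12)]

    w = [1 / v for _, v in studies]                        # fixed-effect weights
    d_fixed = sum(wi * d for wi, (d, _) in zip(w, studies)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (d - d_fixed) ** 2 for wi, (d, _) in zip(w, studies))
    df = len(studies) - 1

    # I-squared = (Q - df) / Q, floored at 0; ~25% low, ~50% moderate, ~75% high.
    i_squared = max(0.0, (q - df) / q) * 100

    # DerSimonian-Laird between-study variance tau-squared.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights use (v_i + tau2) instead of v_i.
    w_re = [1 / (v + tau2) for _, v in studies]
    d_random = sum(wi * d for wi, (d, _) in zip(w_re, studies)) / sum(w_re)

    print(f"Q={q:.2f}, I2={i_squared:.0f}%, tau2={tau2:.3f}, "
          f"pooled (random effects)={d_random:.2f}")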
In instances where studies contained multiple intervention groups, each group was extracted and analysed independently, dividing the control group sample size to avoid double counting in the analysis.
The following outcome measures were used in the analysis where possible: HRQoL, depression, anxiety, fatigue, stress, distress, pain and self-efficacy. Where a study contained more than one measure of a particular outcome (e.g. depression measured by the Centre for Epidemiologic Studies Depression Scale 39 and Beck Depression Inventory 40 ), the tool most established in the wider literature was chosen for meta-analysis. If the tool had multiple subscales, a judgement was made about the most relevant subscale. Where studies reported at multiple time periods, outcome measures reported at or closest to 6 months were used, as measures around this time were by far the most frequently reported.
Unless otherwise specified in the results section, positive effect sizes indicate beneficial outcomes for HRQoL and self-efficacy outcomes, while negative effect sizes indicate beneficial outcomes for depression, anxiety, fatigue, stress, distress and pain outcomes.
We conducted four types of analysis, described below.
Analysis 1 sought to determine whether studies in males show larger, similar or smaller effects than studies in females and mixed-sex groups within interventions included within the ‘parent’ Cochrane review. We screened all included Cochrane reviews of self-management support interventions to identify those that contained analysis on outcomes of interest and at least two relevant male-only RCTs. Where an eligible review was identified that met these criteria, the studies were categorised as male only, mixed sex and female only ( Figure 1 ).
Analysis 1: ‘within-Cochrane review analysis’.
Such comparisons across trials do not have the protection of randomisation, and there may be differences between the studies included in each sex group which account for differences in effects between groups. We presented data on the comparability of these trials within these three categories, including the age of the included patient populations, and on the quality of the studies (using allocation concealment as an indicator of quality).
We report the effect size [together with significance and 95% confidence interval (CI)] of self-management support in each sex group (male only, mixed sex, female only). We conducted analyses to test whether or not interventions showed significantly different effects in sex groups. It should be noted that the power to detect significant differences in such analyses can be limited.
Analysis 2 sought to determine whether studies in males show larger, similar or smaller effects than studies in females and mixed-sex groups within types of self-management support pooled across reviews.
In analysis 2, data were pooled according to broad intervention type across reviews, rather than within individual reviews as in analysis 1 ( Figure 2 ). This allowed us to determine whether broad types/components of self-management support interventions show larger, similar or smaller effects in males than in females and mixed populations. Limitations in the data meant that we were able to conduct analyses on only physical activity, education, peer support, and HCP monitoring and feedback interventions.
Analysis 2: ‘across-Cochrane review analysis’.
We report the effect size (together with significance and 95% CI) of self-management support in each sex group (male only, mixed sex, female only). We conducted analyses to test whether or not interventions showed significantly different effects in sex groups. It should be noted that the power to detect significant differences in such analyses can be limited.
We conducted a meta-analysis on trials including males only, according to broad intervention type – physical activity, education, peer support, and HCP monitoring and feedback – and compared effects between intervention types ( Figure 3 ). This allowed us to determine whether or not certain broad categories of self-management support intervention were effective in men.
Analysis 3: ‘male-only intervention type analyses’.
We identified RCTs which analysed the effects of self-management support interventions in sex groups. We sought to extract relevant data on the direction and size of moderating effects in secondary analysis (i.e. whether males show larger, similar or smaller effects than females), and assess these effects in the context of relevant design data, such as sample size, and the quality of the secondary analysis ( Figure 4 ).
Analysis 4: ‘within-trial sex group analysis’.
Sex group analyses within trials do in theory provide greater comparability in terms of patient and intervention characteristics than analyses 1–3.
A mixture of LTCs was included within each analysis, constituting the main analysis. Although this was not in the original protocol, we attempted to conduct an analysis by each disease area. We found there were sufficient data to conduct a sex-comparative analysis in only cancer studies; the results are presented in Appendix 2 .
The plan to use the behavioural change techniques (BCT) taxonomy was dropped (see Box 1 on protocol deviations). Post hoc, we took a pragmatic approach to coding interventions. Development of the intervention categories was informed by the published literature identified in this project and previous work conducted by the PRISMS and RECURSIVE project teams. 7 , 33 Table 1 provides a list of the categories and their associated description. Categories were designed to be broadly representative of the interventions identified and facilitate comparison of intervention types in the analysis. Two members of the review team independently assessed the ‘type’ of self-management support intervention in each study in order to categorise it, and disagreements were identified and resolved by discussion with a team member.
Self-management support intervention categories and description
The review of cost-effectiveness studies was initially planned as a two-stage review. First, we would review economic evaluations of self-management interventions on males only. Subsequently, we would review all economic evaluations with group analyses in which the costs and effects for males and females could be separated.
Study quality was assessed using a modified version of the Drummond checklist where appropriate. 45
We identified a total of 40 RCTs on self-management support interventions conducted in male-only samples (some trials have more than one reference) ( Figure 5 ). The majority of the studies were conducted in the USA ( n = 23), 46 – 70 with the remainder conducted in the UK ( n = 6), 71 – 78 Canada ( n = 5), 79 – 83 Spain ( n = 3), 84 – 88 Sweden ( n = 1), 89 Poland ( n = 1) 90 and Greece ( n = 1). 91 Males with prostate cancer were the most frequently studied male-only population ( n = 15) included in this review. 48 , 49 , 52 , 58 , 59 , 61 , 64 – 66 , 68 , 69 , 72 , 78 , 80 , 89 Other disease areas included hypertension ( n = 6), 47 , 71 , 79 , 82 , 83 , 85 , 86 COPD ( n = 6), 54 , 55 , 73 – 76 , 81 , 84 , 87 , 88 heart failure ( n = 4), 62 , 67 , 90 , 91 type 2 diabetes ( n = 3), 46 , 50 , 51 , 70 diabetes of unspecified type ( n = 1), 56 arthritis ( n = 1) 63 and testicular cancer ( n = 1). 77 One multimorbidity study recruited obese men with type 2 diabetes and chronic kidney disease. 57 The age of participants ranged from 25 to 89 years and, where reported, ethnicity was predominantly white. Only one study reported socioeconomic status using a validated tool; 63 the majority of other publications included a description of education or annual income.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram for the quantitative review.
A total of 51 distinct self-management support interventions were reported across the 40 included male-only studies. Physical activity ( n = 16), 49 , 57 , 62 , 72 – 76 , 78 , 80 , 81 , 84 , 87 – 91 education ( n = 36), 46 – 55 , 58 – 61 , 63 – 67 , 70 – 72 , 77 , 79 – 81 , 83 – 88 peer support ( n = 17) 47 , 49 , 53 , 56 , 68 – 72 , 80 and HCP monitoring and feedback ( n = 25) 46 , 47 , 50 – 52 , 56 , 57 , 60 , 61 , 66 – 68 , 70 , 71 , 75 , 76 , 78 – 80 , 82 – 89 were the most frequently reported components of these interventions. Three interventions with a psychological component, 64 , 77 two interventions containing a financial incentive component 82 , 83 and one study containing an action plan component 19 were also identified.
Twenty-three of the interventions were aimed at individuals, 46 , 48 , 50 – 52 , 54 , 55 , 60 , 61 , 64 , 65 , 67 – 69 , 75 – 78 , 82 – 86 20 were aimed at groups 47 , 53 , 58 , 59 , 62 , 66 , 70 , 71 , 79 , 89 – 91 and the remainder used a mixed individual and group approach ( n = 6). 49 , 56 , 72 – 74 , 80 , 81 , 87 , 88 It was unclear what approach was used in two studies. 57 , 63 Over half of the interventions lasted 0–5 months ( n = 28), 47 , 53 , 58 – 64 , 67 – 69 , 71 – 80 , 85 , 86 12 interventions ranged between 6 and 11 months, 46 , 52 , 54 – 57 , 66 , 70 , 84 , 90 , 91 six interventions were 12 months or longer 49 , 65 , 81 , 82 , 84 , 87 , 88 and in five cases the total programme duration was unclear. 48 , 83 , 89
The mode of administration of the interventions varied. They included telephone-based support ( n = 6), 60 , 61 , 65 , 67 face-to-face delivery ( n = 21), 47 , 53 – 55 , 58 , 59 , 62 – 64 , 66 , 68 – 70 , 77 , 83 , 89 – 91 remote unsupervised activities ( n = 2), 75 , 76 , 78 a combination of face-to-face delivery and remote unsupervised activities ( n = 20), 46 – 51 , 57 , 71 – 74 , 79 – 82 , 84 – 89 and a combination of face-to-face delivery and telephone support ( n = 2). 52 , 56
In terms of setting, interventions were reported to be home-based ( n = 11), 46 , 52 , 60 , 61 , 65 , 67 , 75 , 76 , 78 at a non-home location such as a dedicated gym, pharmacy, hospital clinic, work, university laboratory, coffee shop or other community-based venue ( n = 12), 53 – 55 , 62 – 64 , 68 – 70 , 77 , 85 , 86 , 90 a combination of home and non-home-based venue ( n = 14) 48 – 51 , 56 , 57 , 72 – 74 , 79 – 84 , 87 , 88 or not clearly reported in the publication ( n = 14). 47 , 58 , 59 , 66 , 71 , 89 , 91
Half of the studies 79 – 82 , 46 , 48 – 51 , 53 , 56 , 58 , 59 , 66 , 70 , 72 , 78 , 84 , 87 , 88 reported on some aspect of compliance with the self-management intervention and most participants were followed up for 6 months or less ( n = 24) following participation in the intervention.
Table 2 provides an overview of study details and Table 3 includes detailed descriptions of the self-management support intervention.
Male-only study characteristics
Male-only studies: self-management support intervention characteristics
Study quality was assessed using the Cochrane risk of bias tool, 92 which covers six key domains: sequence generation, allocation concealment, blinding performance, incomplete outcome data, selective outcome reporting and other sources of bias.
Studies were often poorly reported, making judgements of quality difficult. With the exception of selective outcome reporting, the most frequent rating for all domains was an unclear risk of bias. For the selective outcome-reporting domain, a low risk of bias was the most frequent assignment. Table 4 describes the risk of bias allocation for each study by each domain. Figure 6 presents a summary of the male-only study quality assessment findings.
Male-only study Cochrane risk of bias findings
Summary of male-only study Cochrane risk of bias findings.
Included under terms of UK Non-commercial Government License.
BMC Medical Education, volume 24, Article number: 727 (2024)
Virtual patients (VPs) are widely used in health professions education. When they are well integrated into curricula, they are considered to be more effective than loosely coupled add-ons. However, it is unclear what constitutes their successful integration. The aim of this study was to identify and synthesise the themes found in the literature that stakeholders perceive as important for successful implementation of VPs in curricula.
We searched five databases from 2000 to September 25, 2023. We included qualitative, quantitative, mixed-methods and descriptive case studies that defined, identified, explored, or evaluated a set of factors that, in the perception of students, teachers, course directors and researchers, were crucial for VP implementation. We excluded effectiveness studies that did not consider implementation characteristics, and studies that focused on VP design factors. We included English-language full-text reports and excluded conference abstracts, short opinion papers and editorials. Synthesis of results was performed using the framework synthesis method with Kern’s six-step model as the initial framework. We appraised the quality of the studies using the QuADS tool.
Our search yielded a total of 4808 items, from which 21 studies met the inclusion criteria. We identified 14 themes that formed an integration framework. The themes were: goal in the curriculum; phase of the curriculum in which to implement VPs; effective use of resources; VP alignment with curricular learning objectives; prioritisation of use; relation to other learning modalities; learning activities around VPs; time allocation; group setting; presence mode; VP orientation for students and faculty; technical infrastructure; quality assurance, maintenance, and sustainability; and assessment of VP learning outcomes and learning analytics. We investigated the occurrence of themes across studies to demonstrate the relevance of the framework. The quality of the studies did not influence the coverage of the themes.
The resulting framework can be used to structure plans and discussions around implementation of VPs in curricula. It has already been used to organise the curriculum implementation guidelines of a European project. We expect it will direct further research to deepen our knowledge on individual integration themes.
Virtual patients (VPs) are defined as interactive computer simulations of real-life clinical scenarios for the purpose of health professions training, education, or assessment [ 1 ]. Several systematic reviews have demonstrated that learning using VPs is associated with educational gains when compared to no intervention and is non-inferior to traditional, non-computer-aided, educational methods [ 2 , 3 , 4 ]. This conclusion holds true across several health professions, including medicine [ 3 , 5 ], nursing [ 6 ] and pharmacy [ 7 ]. The strength of VPs in health professions education lies in fostering clinical reasoning [ 4 , 6 , 8 ] and related communication skills [ 5 , 7 , 9 ]. At the same time, the research syntheses report high heterogeneity of obtained results [ 2 , 4 ]. Despite suggestions in the literature that VPs that are well integrated into curricula are more effective than loosely coupled add-ons [ 5 , 10 , 11 ], there is no clarity on what constitutes successful integration. Consequently, the next important step in the research agenda around VPs is to investigate strategies for effectively implementing VPs into curricula [ 9 , 12 , 13 ].
In the context of healthcare innovation, implementation is the process of uptaking a new finding, policy or technology in the routine practice of health services [ 14 , 15 , 16 ]. In many organisations, innovations are rolled out intuitively, which at times ends in failure even though the new tool has previously shown good results in laboratory settings [ 17 ]. A large review of over 500 implementation studies showed that better-implemented health promotion programs yield 2–3 times larger mean effect sizes than poorly implemented ones [ 18 ]. Underestimation of the importance and difficulty of implementation processes is costly and may lead to unjustified attribution of failure to the new product, while the actual problem is inadequate methods for integration of the innovation into practice [ 15 ].
The need for research into different ways of integrating computer technology into medical schools was recognised by Friedman as early as 1994 [ 19 ]. However, studies of the factors and processes of technology implementation in medical curricula have long been scarce [ 12 ]. While the terminology varies across studies, we will use the terms introduction, integration, incorporation , and implementation of VPs into curricula interchangeably. Technology adoption is the decision to use a new technology in a curriculum, and we view it as the first phase of implementation. In an early guide to the integration of VPs into curricula, Huwendiek et al. recommended, based on their experience, the consideration of four aspects relevant to successful implementation: blending face-to-face learning with on-line VP sessions; designing collaborative learning around VPs; allowing students flexibility in deciding when/where/how to learn with VPs; and constructively aligning learning objectives with suitable VPs and matched assessment [ 20 ]. In a narrative review of VPs in medical curricula, Cendan and Lok identified a few practices which are recommended for the use of VPs in curricula: filling gaps in clinical experience with standardised and safe practice, replacing paper cases with interactive models showing variations in clinical presentations, and providing individualised feedback based on objective observation of student activities. These authors also highlighted cost as a significant barrier to the implementation process [ 21 ]. Ellaway and Davies proposed a theoretical construct based on Activity Theory to relate VPs to their use and to link to other educational interventions in curricula [ 22 ]. However, a systematic synthesis of the literature on the identified integration factors and steps relevant to VP implementation is lacking.
The context of this study was a European project called iCoViP (International Collection of Virtual Patients; https://icovip.eu ) , which involved project partners from France, Germany, Poland, Portugal, and Spain and succeeded in creating a collection of 200 open-access VPs available in 6 languages to support clinical reasoning education [ 23 ]. Such a collection would benefit from being accompanied by integration guidelines to inform potential users on how to implement the collection into their curricula. However, guidelines require frameworks to structure the recommendations. Existing integration frameworks are limited in scope for a specific group of health professions, were created mostly for evaluation rather than guidance, or are theoretical or opinion-based, without an empirical foundation [ 24 , 25 , 26 ].
Inspired by the methodological development of qualitative literature synthesis [ 27 ], we decided to build a mosaic of the available studies in order to identify and describe what stakeholders believe is important when planning the integration of VPs into health professions curricula. The curriculum stakeholders in our review included students, teachers, curriculum planners, and researchers in health professions education. We aimed to develop a framework that would configure existing research on curriculum implementations, structure future practice guidelines, and inform research agendas in order to strengthen the evidence behind the recommendations.
Therefore, the research aim of this study was to identify and synthesise themes across the literature that, in stakeholders’ opinions, are important for the successful implementation of VPs in health professions curricula.
This systematic review is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework [ 28 ].
We selected studies whose main objective was to define, identify, explore, or evaluate a set of factors that, in the view of the authors or study participants, contribute to the successful implementation of VPs in curricula. Table 1 summarises the inclusion and exclusion criteria.
The curricula in which VPs were included targeted undergraduate health professions students, such as human medicine, dentistry, nursing, or pharmacy programs. We were interested in the perspectives of all possible stakeholders engaged in planning or directly affected by undergraduate health professions curricula, such as students, teachers, curriculum planners, course directors, and health professions education researchers. We excluded postgraduate and continuing medical education curricula, faculty development courses not specifically designed to prepare a faculty to teach an undergraduate curriculum with VPs, courses for patients, as well as education at secondary school level and below. Also excluded were alternative and complementary medicine programs and programs in which students do not interact with human patients, such as veterinary medicine.
Similar to the previous systematic review [ 4 ], we excluded from the review VP simulations that required non-standard computer equipment (like virtual reality headsets) and those in which the VP was merely a static case vignette without interaction or the VP was simulated by a human (e.g., a teacher answering emails from students as a virtual patient). We included studies in which VPs were presented in the context of health professions curricula; we excluded studies in which VPs were used as extracurricular activities (e.g., one-time learning opportunities, such as conference workshops) or merely as part of laboratory experimentation.
We included all studies that presented original research, and we excluded editorials and opinion papers. Systematic reviews were included in the first stage so we could manually search their references in order to detect relevant studies that had potentially been omitted. We included studies that aimed to comprehensively identify or evaluate external contextual factors relevant to the integration of VPs into curricula or that examined activities around VPs and the organisational, curricular and accreditation context (the constructed and framed layers of activities in Ellaway & Davies’ model [ 22 ]). As the goal was to investigate integration strategies, we excluded VP design studies that looked into techniques for authoring VPs or researched technical or pedagogical mechanisms encoded in VPs that could not be easily altered (i.e., the encoded layer of VP activities [ 22 ]). As we looked into studies that comprehensively investigated the set of integration factors important in the implementation process, we excluded studies that focused on program effectiveness (i.e., whether or not a VP integration worked) but did not describe in detail how the VPs were integrated into curricula or investigate which integration factors contributed to the implementation process. We also excluded studies that focused on a single integration factor, as our goal was to explore the broad perspective of stakeholders’ opinions on which factors matter in the integration of VPs into curricula.
We only included studies published in English as we aimed to qualitatively analyse the stakeholders’ opinions in depth and did not want to rely on translations. We chose the year 2000 as the starting point for inclusion. We recognise that VPs were used before this date but also acknowledge the significant shift in infrastructure from offline technologies to the current web-based platforms, user-friendly graphical web browsers, and broadband internet, all of which appeared around the turn of the millennium. Additionally, VP literature before 2000 was mainly focused on demonstrating technology rather than integrating these tools into curricula [ 12 , 19 ].
We systematically searched the following five bibliographic databases: MEDLINE (via PubMed), EMBASE (via Elsevier), Educational Resource Information Center (ERIC) (via EBSCO), CINAHL Complete (via EBSCO), Web of Science (via Clarivate). The search strategies are presented in Supplementary Material S1 . We launched the first query on March 8, 2022, and the last update was carried out on September 25, 2023. The search results were imported into the Rayyan on-line software [ 29 ]. Duplicate items were removed. Each abstract was screened by at least two reviewers working independently. In the case of disagreement between reviewers, we included the abstract for full text analysis. Next, we downloaded the full text of the included abstracts, and pairs of reviewers analysed the content in order to determine whether they met the inclusion criteria. In the case of disagreement, a third reviewer was consulted to arbitrate the decision.
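In practice, deduplication and screening were handled in Rayyan [ 29 ]. Purely as an illustration of the deduplication step, the following minimal sketch assumes the database exports were merged into a single CSV file with 'title' and 'doi' columns; the file names, column names, and matching rule are our assumptions, not a description of the study's procedure.

```python
# Illustrative only: the review used Rayyan for deduplication and screening.
# A minimal sketch of removing duplicate records across database exports,
# assuming a merged CSV with (at least) 'title' and 'doi' columns.
import csv
import re

def normalise(text: str) -> str:
    """Lower-case and strip punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def deduplicate(path_in: str, path_out: str) -> None:
    seen: set[str] = set()
    with open(path_in, newline="", encoding="utf-8") as f_in, \
         open(path_out, "w", newline="", encoding="utf-8") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Prefer the DOI as a key; fall back to the normalised title.
            key = row.get("doi", "").strip().lower() or normalise(row.get("title", ""))
            if key and key in seen:
                continue  # duplicate record returned by another database
            seen.add(key)
            writer.writerow(row)

deduplicate("search_results.csv", "screening_queue.csv")
```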
Reviewers working independently extracted relevant characteristics of the included studies to an online spreadsheet. We extracted such features as the country in which the study was conducted, the study approach, the data collection method, the year of implementation in the curriculum, the medical topic of the VPs, the type and number of participants, the number of included VPs, the type of VP software, and the provenance of the cases (e.g., self-developed, part of a commercial database or open access repository).
The qualitative synthesis followed the five steps of the framework synthesis method [ 27 , pp. 188–190]. In the familiarisation phase (step 1), the authors who were involved previously in the screening and data extraction process read the full text versions of the included studies to identify text segments containing opinions on how VPs should be implemented into curricula.
Next, after a working group discussion, we selected David Kern’s six-step curriculum development model [ 30 ] as the pragmatic initial frame (step 2). Even though it is not a VP integration framework in itself, we regarded it as a “best fit” to configure a broad range of integration factors spanning the whole process of curriculum development. Kern’s model is often used for curriculum design and reform and has also been applied in the design of e-learning curricula [ 31 ]. Through a series of asynchronous rounds of comments, on-line meetings and one face-to-face workshop that involved a group of stakeholders from the iCoViP project, we iteratively clustered the recommendations into the themes that emerged. Each theme was subsumed under one of Kern’s six steps in the initial framework. Next, we formulated definitions of the themes.
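For orientation, the six steps of Kern's model are listed below as a minimal plain-data sketch; the step names follow common summaries of the cited textbook [ 30 ], and the constant name is ours.

```python
# Kern's six-step approach to curriculum development, as commonly summarised [30].
KERN_SIX_STEPS = (
    "1. Problem identification and general needs assessment",
    "2. Targeted needs assessment",
    "3. Goals and objectives",
    "4. Educational strategies",
    "5. Implementation",
    "6. Evaluation and feedback",
)
```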
In the indexing phase (step 3), two authors (JF and AK) systematically coded the results and discussion sections of all the included empirical studies, line-by-line, using the developed themes as a coding frame. Text segments grouped into individual themes were comparatively analysed for consistency and to identify individual topics within themes. Coding was performed using MaxQDA software for qualitative analysis (MaxQDA, version 22.5 [ 32 ]). Disagreements were discussed and resolved by consensus, leading to iterative refinements of the coding frame, clarifications of definitions, and re-coding until a final framework was established.
Subsequently, the studies were charted (step 4) into tables in order to compare their characteristics. Similar papers were clustered based on study design to facilitate closer comparisons. A quality appraisal of the included studies was then performed using a standardised tool. Finally, a visual representation of the framework was designed and discussed among the research team, allowing for critical reflection on the consistency of the themes.
In the concluding step (step 5), in order to ensure the completeness and representativeness of the framework for the analysed body of literature, we mapped the themes from the developed framework to the studies in which they were found, and we analysed how individual themes corresponded to the conceptual and implementation evaluation models identified during the review. We looked for patterns and attempted to interpret them. We also looked for inconsistencies and tensions in the studies to identify potential areas for future research.
To appraise the quality of the included studies, we selected the QuADS (Quality Assessment with Diverse Studies) tool [ 33 ], which is suitable for assessing the quality of studies with diverse designs, including mixed- or multi-method studies. This tool consists of 13 items rated on a four-point scale (0: not reported; 1: reported but inadequate; 2: reported and partially adequate; 3: sufficiently reported). QuADS has previously been used successfully in the synthesis of studies in the field of health professions education [ 34 ] and technology-enhanced learning environments [ 35 ]. The included qualitative studies, quantitative surveys, and mixed-methods interview studies were independently assessed by two reviewers (JF, AK). The results were then compared; if differences arose, the justifications were discussed and a final judgement was reached by consensus. Following the approach taken by Goagoses et al. [ 35 ], we divided the studies into three groups depending on the summary quality score: weak (≤ 49% of QuADS points), medium (50–69%), and high (≥ 70%) study quality.
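To make the banding arithmetic concrete, the sketch below computes a summary percentage and quality band from 13 item ratings. It is only an illustration of the thresholds described above, not code used in the study; the function name and the example ratings are ours.

```python
# A minimal sketch of QuADS summary scoring: 13 items rated 0-3 (max 39 points),
# banded with the thresholds of Goagoses et al. [35] described in the text.

def quads_band(item_scores: list[int]) -> str:
    if len(item_scores) != 13 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("QuADS expects 13 items rated on a 0-3 scale")
    percentage = 100 * sum(item_scores) / 39  # 39 = 13 items x 3 points
    if percentage < 50:
        return "weak"
    if percentage < 70:
        return "medium"
    return "high"

# Example: a study rated mostly 2s and 3s lands in the 'high' band.
print(quads_band([3, 2, 3, 2, 3, 3, 2, 2, 3, 3, 2, 3, 2]))  # sum=33 -> 84.6% -> high
```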
The selection process for the included studies is presented in Fig. 1 .
PRISMA flowchart of the study selection process
Our search returned a total of 4808 items. We excluded duplicate records ( n = 2201) and abstracts not meeting the inclusion criteria ( n = 2526), and after full-text analysis we excluded further reports ( n = 59). In the end, 21 studies met our inclusion criteria.
Of the 21 included studies, 18 were classified as empirical studies, while three presented theoretical or evaluation models.
The purpose of the 18 empirical studies was to survey or directly observe the reaction of stakeholders to curriculum integration strategies in order to identify or describe the relevant factors (Table 2 ). Study types included qualitative ( n = 4) [ 11 , 36 , 37 , 38 ], mixed-methods ( n = 4) [ 39 , 40 , 41 , 42 ], quantitative survey ( n = 4) [ 10 , 43 , 44 , 45 ], and descriptive case studies ( n = 6) [ 46 , 47 , 48 , 49 , 50 , 51 ]. Data collection methods included questionnaires ( n = 9) [ 10 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 48 ], focus groups and small group interviews ( n = 8) [ 11 , 36 , 37 , 38 , 39 , 41 , 42 , 48 ], system log analyses ( n = 3) [ 44 , 47 , 48 ], direct observations ( n = 1) [ 44 ], and narrative descriptions of experiences with integration ( n = 5) [ 46 , 47 , 49 , 50 , 51 ]. The vast majority of studies reported experiences from the integration of VPs into medical curricula ( n = 15). Two studies reported the integration of VPs into nursing programs [ 40 , 51 ], one into a dentistry program [ 40 ], and one into a pharmacy program [ 41 ]. One study did not specify the health professions program [ 46 ].
The remaining three of the included studies represented a more theoretical approach: one aimed to create a conceptual model [ 25 ]; the other two [ 24 , 26 ] presented evaluation models of the integration process (Table 3 ). We analysed them separately, considering their different structures, and we mapped the components of these models to our framework in the last stage of the framework synthesis.
The developed framework (Table 4 ), which we named the iCoViP Virtual Patient Curriculum Integration Framework (iCoViP Framework), contains 14 themes and 51 topic codes. The final version of the codebook used in the study can be found in Supplementary Material S2 . Below, we describe the individual themes.
In the Goal theme, we coded perceptions regarding appropriate general uses of VPs in curricula. This covers the competencies to be trained using VPs, but also unique strengths and limitations of VPs as a learning method that should influence decisions regarding their adoption in curricula.
A common opinion was that VPs should target clinical reasoning skills and subskills such as the acquisition/organisation of clinical information, the development of illness scripts (signs, symptoms, risk factors, knowledge of disease progress over time), and patient-centred care (including personal preferences and cultural competencies in patient interaction) [ 11 , 36 , 37 , 38 , 39 , 40 , 42 , 43 , 44 , 45 , 46 , 49 , 50 , 51 ]. According to these opinions, a strength of VPs is their potential for self-directed learning in an authentic, practice-relevant, safe environment that gives opportunities for reflection and “productive struggle” [ 37 , 39 , 49 ]. VPs also make it possible for students to practise decision-making in undifferentiated patient cases and observe the development of disease longitudinally [ 45 ]. For instance, some students valued the potential of VPs as a tool that integrates basic knowledge with clinical application in a memorable experience:
We associate a disease more to a patient than to the textbook. If I saw the patient, saw the photo and questioned the patient in the program, I will remember more easily, I’ll have my flashback of that pathology more than if I only studied my class notes or a book. {Medical student, 4th year, Colombia} [ 36 ].
Another perceived function of VPs is to help fill gaps in curricula and clinical experiences [ 36 , 37 , 38 , 42 , 45 , 50 ]. This supporting factor for the implementation of VPs in curricula is particularly strong when combined with the need to meet regulatory requirements [ 42 ].
Varying opinions were expressed regarding the aim of VPs to represent rare diseases (or common conditions with unusual symptoms) [ 43 , 48 ] versus common clinical pictures [ 37 , 40 ]. Another tension arose when considering whether VPs should be used to introduce new factual/conceptual knowledge or to serve as a knowledge application and revision tool:
The students, however, differed from leaders and teachers in assuming that VPS should offer a reasonable load of factual knowledge with each patient. More as a surprise came the participants’ preference for usual presentations of common diseases. [ 40 ].
Limitations of VPs were voiced when the educational goal was related to physical contact and hands-on training because, in some aspects of communication skills, physical examination, or application of medical equipment, VPs clearly have inferior properties to real patients, human actors or physical mannequins [ 36 , 51 ].
The Phase theme described the point in curricula at which the introduction of VPs was regarded as appropriate. According to some opinions, VPs should be introduced early in curricula to compensate for otherwise limited exposure to real patients [ 39 , 43 ]:
Students of the pre-clinical years show a high preference in the adoption of VPs as learning activities. That could be explained from the lack of any clinical contact with real patients in their two first years of study and their willingness to have early, even virtual, clinical encounters. [ 43 ].
The tendency to introduce VPs early in curricula was confronted with the problem of students’ limited core knowledge as they were required to use VPs before they had learnt about the features of the medical conditions they were supposed to recognise [ 41 , 48 ]. At the other end of the time axis, we did not encounter opinions that specified when it would be too late to use VPs in curricula. Even final-year students stated that they preferred to augment their clinical experience with VPs [ 43 ].
In the Resources theme, we gathered opinions regarding the cost and assets required for the integration of VPs into curricula. Cost can be a barrier that, if not addressed properly, can slow down or even stop an implementation; it should therefore be considered early in the implementation process. This includes monetary funds [ 42 ] as well as the availability of adequately qualified personnel [ 38 ] and their time [ 47 ].
For instance, it was found that if faculty members are primarily focused on clinical work, their commitment to introducing VP innovations will be limited, and they will tend to revert to previous practices unless additional resources are provided to support the change [ 38 ].
The Resources theme also included strategies to follow when only limited resources are available to implement VPs in a curriculum. Suggested solutions included sharing VPs with other institutions [ 50 ], exchanging know-how on the implementation of VPs with more experienced institutions and networks of excellence [ 38 , 42 ], and increasing faculty awareness of the benefits of using VPs, including the reduced workload after VPs have been introduced into curricula [ 38 ]. Finally, another aspect of this theme was the (lack of) awareness of the cost of implementing VPs in curricula across stakeholder groups [ 40 ].
The Alignment theme grouped utterances highlighting the importance of selecting the correct VP content for curricula and of matching VPs with several elements of curricula, such as learning objectives and the consistency of VP content across different learning forms, as well as the need to adapt VPs to local circumstances. The selection criteria discussed included the number of VPs [ 36 ], the fine-grained learning objectives that could be achieved using VPs [ 42 , 50 ], and an appropriate difficulty level, which preferably should gradually increase [ 11 , 49 ].
It was noted that VPs can be used to systematically cover a topic. For example, they can align with the implementation of clinical reasoning themes in curricula [ 38 ] or map a range of diseases that are characteristic of a particular region of interest, thereby filling gaps in important clinical exposure and realistically representing the patient population [ 36 ].
Several approaches were mentioned regarding the alignment of VPs with curricula that include the selection of learning methods adjusted to the type of learning objectives [ 45 ], introduction of VPs in small portions in relevant places in curricula to avoid large-scale changes [ 38 ], alignment of VP content with assessment [ 39 ], and the visibility of this alignment by explicitly presenting the specific learning objectives addressed by VPs [ 49 ]. It is crucial to retain cohesion of educational content across a range of learning modalities:
I worked through a VP, and then I went to the oncology ward where I saw a patient with a similar disease. After that we discussed the disease. It was great that it was all so well coordinated and it added depth and some [sic!] much needed repetition to the case. {Medical student, 5th year, Germany} [ 11 ].
We also noted unresolved dilemmas, such as whether to present VPs in English as the modern lingua franca to support the internationalisation of studies, versus the need to adapt VPs to the local native language of learners in order to improve accessibility and perceived relevance [ 50 ].
Several studies presented ideas for achieving higher Prioritisation of VPs in student agendas. A common but “heavy-handed” approach to increasing motivation was to make completion of VPs a mandatory requirement for obtaining course credits [ 36 , 48 , 51 ]. However, this approach was often criticised for promoting superficial learning and failing to endorse self-directed learning [ 47 ]. Motivation was reported to increase when content was exam-relevant [ 11 ].
According to another strategy, motivation comes with the greater engagement of teachers who intensively reference VPs in their classes and often give meaningful feedback regarding their use [ 40 ] or construct group activities around them [ 46 ]. It was suggested that VPs ought to have dedicated time for their use, which should not compete with activities of obviously higher priority, such as meeting real patients [ 37 ].
Another idea for increasing motivation was the adjustment of VPs to local needs, language and culture. It was indicated that it would be helpful to promote VPs’ authenticity by stressing the similarity of the presented scenarios to problems clinicians encounter in clinical practice (e.g., using teacher testimonials [ 48 ]). Some students saw VPs as more relevant when they were comprehensively described in course guides and syllabi [ 39 ]. The opinions about VPs that circulate among more-experienced students are also important:
Definitely if the year above kind of approves of something you definitely think you need it. {Medical student, 3rd year, UK} [ 39 ].
Peer opinion was also important for teachers, who were reported to be more likely to adopt VPs in their teaching if they have heard positive opinions from colleagues using them, know the authors of VP cases, or respect organisations that endorse the use of VP software [ 38 , 42 ]:
I was amazed because it was a project that seemed to have incredible scope, it was huge. I was impressed that there was the organization to really roll out and develop all these cases and have this national organization involved. {Clerkship director, USA} [ 42 ].
The Relation theme contained opinions about the connections between VPs and other types of learning activities. This theme was divided into preferences regarding which types of activities should be replaced or extended by VPs, and the relative order in which they should appear in curricula. We noticed general warnings that VPs should not be added on top of existing activities as this is likely to cause work overload for students [ 10 , 45 ]. The related forms of education that came up in the discussions were expository methods like lectures and reading assignments (e.g., textbooks, websites), small group discussions in seminars (e.g., problem-based learning [PBL] sessions, follow-up seminars), alternative forms of simulations (e.g., simulated patients, human patient simulators), clinical teaching (i.e., meeting with real patients and bedside learning opportunities), and preparation for assessments.
Lectures were seen as a form of providing core knowledge that could later be applied in VPs:
Working through the VP before attending the lecture was not as useful to me as attending the lecture before doing the VP. I feel I was able to get more out of the VP when I first attended the lecture in which the substance and procedures were explained. {Medical student, 5th year, Germany} [ 11 ].
Textbooks were helpful as a source of reference knowledge while solving VPs, enabling students to reflect while applying this knowledge in a clinical context. Such a learning scenario was regarded as impossible in front of real patients:
But here it’s very positive right now when we really don’t know everything about rheumatic diseases, that we can sit with our books at the same time as we have a patient in front of us. {Medical student, 3rd year, Sweden} [ 37 ].
Seminars (small group discussions) were perceived as learning events that motivate students to work intensively with VPs and as an opportunity to ask questions about them [ 11 , 46 , 47 ], with the warning that teachers should not simply repeat the content of VPs as this would be boring [ 44 ]. The reported combination of VPs with simulated patients made it possible to increase the fidelity of the latter by means of realistic representation of clinical signs (e.g., cranial nerve palsies) [ 48 ]. It was noted that VPs can connect different forms of simulation, “turn[ing] part-task training into whole-task training” [ 46 ], or allow more thorough and nuanced preparation for other forms of simulation (e.g., mannequin-based simulation) [ 46 ]. A common thread in the discussion was the relation between VPs and clinical teaching [ 10 , 11 , 37 , 39 , 45 , 46 ]. The opinions included warnings against spending too much time with VPs at the expense of bedside teaching [ 37 , 51 ]. The positive role of VPs was highlighted in preparing for clinical experience or as a follow-up to meeting real patients because working with VPs is not limited by time and is not influenced by emotions [ 37 ].
Huwendiek et al. [ 11 ] suggested a complete sequence of activities, which has been corroborated by other studies [ 48 ]: lectures, VPs, seminars and, finally, real patients. However, we also identified alternative solutions, such as VPs discussed between lectures as springboards to introduce new concepts [ 49 ]. In addition, some studies concluded that students should have the right to decide which form of learning they prefer in order to achieve their learning objectives [ 38 , 48 ], but this conflicts with limited resources, a problem students seem not to consider when expressing their preferences.
In the Activities theme, we grouped statements about tasks constructed by teachers around VPs. This includes teachers asking questions to probe whether students have understood the content of VPs, and guiding students in their work with VPs [ 11 , 49 ]. Students were also expected to ask their teachers questions to clarify content [ 43 ]. Some educators felt that students trained using VPs ask too many questions instead of relying more on their clinical reasoning skills and asking fewer, but more pertinent questions [ 38 ].
Students were asked to compare two or more VPs with similar symptoms to recognise key diagnostic features [ 11 ] and to reflect on cases, discuss their decisions, and summarise VPs to their peers or document them in a standardised form [ 11 , 46 , 49 , 51 ]. Another type of activity was working with textbooks while solving VP cases [ 37 ] or following a standard/institutional checklist [ 51 ]. Finally, some students expected more activities around VPs and felt left alone to struggle with learning with VPs [ 37 ].
Another theme grouped stakeholders’ opinions regarding Time. A prominent topic was the time required for VP activities. Some statements provided the exact amount of time allocated to VP activities (e.g., one hour a week [ 51 ]), sometimes suggesting that it should be increased. There were several comments from students complaining about insufficient time allocated for VP activities:
There was also SO much information last week and with studying for discretionary IRATs constantly, I felt that I barely had enough time to synthesize the information and felt burdened by having a deadline for using the simulation. {Medical student, 2nd year, USA} [ 48 ].
Interestingly, the perceived lack of time was sometimes interpreted by researchers as a matter of students not assigning high enough priority to VP tasks because they do not consider them relevant [ 39 ].
Some students expected their teachers to help them with time management. Mechanisms for this included explicitly allocated time slots for work with VPs, statements of the expected time to be spent working with VPs, and deadlines for task completion:
Without a time limit we can say: I’ll check the cases later, and then nothing happens; but if there’s a time limit, well, this week I see cardiac failure patients etc. It’s more practical for us and also for the teachers, I think. {Medical student, 4th year, Colombia} [ 36 ].
This expectation conflicts with the views that students should learn to self-regulate their activities, that setting a minimum amount of time that students should spend working with VPs will discourage them from doing more, and that deadlines cause an acute burst of activity shortly before them, but no activity otherwise [ 47 , 48 ].
Finally, it was interesting to note that some educators and students perceived VPs as a more time-efficient way of gaining clinical experience than meeting real patients [ 37 , 38 ].
The Group theme included preferences for working alone or in a group. The identified comments revealed tensions between the benefits of working in groups, such as gaining new perspectives, higher motivation thanks to teamwork, and peer support:
You get so much more from the situation when you discuss things with someone else, than if you would be working alone. {Medical student, 3rd year, Sweden} [ 37 ].
and the flexibility of working alone [ 43 , 44 , 46 , 49 ]. Some studies reported on their authors’ experiences in selection of group size [ 11 , 48 ]. It was also noted that smaller groups motivated more intensive work [ 41 , 44 ].
In the Presence theme, we coded preferences regarding whether students should work on VPs in a computer lab, a shared space, seminar rooms, or at home. Some respondents valued the flexibility of selecting the place of work (provided a good internet connection is available) [ 11 , 36 ]. Students reported working from home in order to prepare well for work in a clinical setting:
... if you can work through a VP at home, you can check your knowledge about a certain topic by working through the relevant VP to see how you would do in a more realistic situation. {Medical student, 5th year, Germany} [ 11 ].
Some course elements related to simulated patient encounters (e.g., physical examination) had to be completed during obligatory face-to-face training in a simulation lab that accompanied work with VPs [ 51 ]. Finally, it was observed that VPs offer sufficient flexibility to support different forms of blended learning scenarios [ 46 ]. Synchronous collaborative learning can be combined with asynchronous individual learning, which is particularly effective when there is a need for collaboration between geographically dispersed groups [ 46 ], for instance if a school has more than one campus.
In the Orientation theme, we included all comments relating to the need for teacher training, the content of teacher training courses, and the form of preparation of faculty members and students for using VPs. Knowledge and skills mentioned as useful for faculty were awareness of how VPs fit into curricula [ 42 ], small-group facilitation skills, clinical experience [ 11 ], and experience with online learning [ 38 ]. Teachers expected to be informed about the advantages/disadvantages and the evidence of effectiveness of VPs [ 38 ]. For students, the following prerequisites were identified: the ability to operate VP tools and experience with online learning in general, high proficiency in the language in which the VPs are presented and, for some scenarios (e.g., learning by design), also familiarity with VP methodology [ 38 , 47 , 48 , 50 , 51 ]. It was observed that the introduction of VPs is more successful when both teachers and students are familiar with the basics of clinical reasoning theory and explicit teaching methods [ 38 ].
Identified forms of student orientation in the use of VPs included demonstrations and introductions at the start of learning units [ 42 ], handouts and email reminders, and the publication of online schedules of assigned VPs together with the expected time to complete them [ 11 , 48 ].
The Infrastructure theme grouped stakeholders’ requirements regarding the technical environment in which VPs work. This included the following aspects: stable internet connection, secure login, usability of the user interface, robust software (well tested for errors and able to handle many simultaneous users), interoperability (e.g., support for the standardised exchange of VPs between universities) and access to an IT helpdesk [ 11 , 40 , 42 , 47 , 50 ]. It was noted that technical glitches can have a profound influence on the perceived success of VP integration:
Our entire team had some technical difficulties, whether during the log-in process or during the patient interviews themselves and felt that our learning was somewhat compromised by this. {Medical student, 2nd year, USA} [ 48 ].
In the Sustainability & Quality theme, we indexed statements regarding the need to validate and update VP content and to keep it aligned with curricular goals and current assessment in response to changes in local conditions and regulatory requirements [ 45 ].
The need to add new cases to VP collections that are currently in use was mentioned [ 40 ]. This theme also included the requirement to evaluate students’ opinions on VPs using questionnaires, feedback sessions and observations [ 47 , 48 , 49 ]. Some of the stakeholders required evidence regarding the quality of VPs before they decided to adopt them [ 38 , 42 , 50 ]. Interestingly, it was suggested that awareness of the need for quality control of VPs varied between stakeholder groups, with low estimation of the importance of this factor among educational leaders:
Leaders also gave very low scores to both case validation and case exchange with other higher education institutions (the latter finding puts into perspective the current development of VPS interoperability standards). The leaders’ lack of interest in case validation may reflect a de facto conviction, that it is the ‘shell’ that validates the content. [ 40 ].
The Assessment theme encompasses a broad selection of topics related to the various forms of using VPs in the assessment of educational outcomes. This includes general comments on VPs as an assessment form, the use of VPs in formative and summative assessment, as well as the use of learning analytics methods around VPs.
General topics identified in this theme included which learning objectives should be assessed with VPs, such as the ability to conduct medical diagnostic processes effectively [ 36 ], the authenticity of VPs as a form of examination [ 36 ], the use of VPs for self-directed assessment [ 11 , 39 , 43 , 46 ], and the emotions associated with assessment using VPs, e.g., reduced stress and a feeling of competitiveness [ 36 , 48 ].
Other topics discussed in the context of assessment included the pedagogical value of using VPs for assessments [ 36 ], such as the improved retention of information through reflection on diagnostic errors made with VPs [ 48 ], and VPs’ ability to illustrate the consequences of students’ errors [ 46 ]. Methods of providing feedback during learning with VPs were also described [ 11 ]. It was highlighted that data from assessments using VPs can aid teachers in planning future training [ 49 , 51 ]. Furthermore, it was observed that feedback from formative assessments with VPs motivates students to engage more deeply in their future learning [ 10 , 41 , 47 ]:
It definitely helped what we did wrong and what we should have caught, because there was a lot that I missed and I didn’t realize it until I got the feedback and in the feedback it also said where you would find it most of the time and why you would have looked there in the first place. {Pharmacy student, 4th year, Canada} [ 41 ].
In several papers [ 42 , 47 , 48 , 51 ], suggestions were made regarding the types of metrics that can be used to gauge students’ performance (e.g., time to complete tasks related to VPs, the accuracy of answers given in the context of VPs, recall and precision in selecting key features in the diagnostic process, the order of selecting diagnostic methods, and the quality of medical documentation prepared by students from VPs). The use of specific metrics and the risks associated with them were discussed. For instance, time spent on a task was sometimes seen as a metric of decision efficiency (a speed-based decision score) that should be minimised [ 48 ], or as an indicator of diligence in VP analysis that should be maximised [ 47 ]. Time measurements in on-line environments can be influenced by external factors like parallel learning using different methods (e.g., consulting a textbook) or interruptions unrelated to learning [ 47 ].
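As an aside for readers less familiar with these metrics, recall and precision in key-feature selection can be written using the standard information-retrieval definitions; the notation below is ours, not taken from the cited studies. With $K$ the set of key features defined by the case author and $S$ the set of features selected by the student:

$$\mathrm{precision} = \frac{|S \cap K|}{|S|}, \qquad \mathrm{recall} = \frac{|S \cap K|}{|K|}$$

High precision indicates that the student selects few irrelevant features; high recall indicates that few author-defined key features are missed.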
Finally, the analysed studies discussed summative aspects of assessment, including arguments regarding the validity of using VPs in assessments [ 51 ], the need to ensure alignment between VPs and examination content [ 49 ], and the importance of VP assessment in relation to other forms of assessment (e.g., whether it should be part of high-stakes examinations) [ 40 , 51 ]. The studies also explored forms of assessment that should be used to test students’ assimilation of content delivered through VPs [ 47 ], the challenges related to assessing clinical reasoning [ 38 ], and the risk of academic dishonesty in grading based on VP performance [ 48 ].
We mapped the occurrence of the iCoViP Framework themes across the included empirical studies, as presented in Fig. 2 .
Code matrix of the occurrence of themes in the included empirical studies
Table 5 displays the pooled number of studies in which each theme occurred. The three most frequently covered themes were Prioritisation, Goal, and Alignment. These themes were present in approximately 90% of the analysed papers. Each theme from the framework appeared in at least four studies. The least common themes, present in fewer than one-third of the studies, were Phase, Presence, and Resources.
We mapped the iCoViP Framework to the three identified existing theoretical and evaluation models (Fig. 3 ).
Mapping of the existing integration models to the iCoViP Framework
None of the compared models contained a category that could not be mapped to the themes from the iCoViP Framework. The model by Georg & Zary [ 25 ] covered the fewest themes from our framework, including only the common categories of Goal, Alignment, Activities and Assessment. The remaining two models by Huwendiek et al. [ 24 ] and Kleinheksel & Ritzhaupt [ 26 ] underpinned integration quality evaluation tools and covered the majority of themes (9 out of 14 each). There were three themes not covered by any of the models: Phase, Resources, and Presence.
The details of the quality appraisal of the empirical studies using the QuADS tool are presented in Supplementary Material S3 . The rated papers had medium (50–69%; [ 39 , 40 , 43 ]) to high quality (≥ 70%; [ 10 , 11 , 36 , 37 , 38 , 41 , 42 , 44 , 45 ]). Owing to the difficulty in identifying the study design elements in the included descriptive case studies [ 46 , 47 , 48 , 49 , 50 , 51 ], we decided against assessing their methodological quality with the QuADS tool. This difficulty can also be interpreted as indicative of the low quality of the studies in this group.
The QuADS quality criterion that was most problematic in the reported studies was the inadequate involvement of stakeholders in study design. Most studies reported the involvement of students or teachers only in questionnaire pilots, but not in the conceptualisation of the research. Another issue was the lack of explicit reference to the theoretical frameworks upon which the studies were based. Finally, in many of the studies, participants were selected using convenience sampling, or the authors did not report purposeful selection of the study group.
We found high-quality studies in qualitative, quantitative, and mixed-methods research. There was no statistical correlation between study quality and the number of topics covered. For sensitivity analysis, we excluded all medium-quality and descriptive studies from the analysis; this did not reduce the number of iCoViP Framework topics covered by the remaining high-quality studies.
In our study, we synthesised the literature that describes stakeholders’ perceptions of the implementation of VPs in health professions curricula. We systematically analysed research reports from a mix of study designs that provided a broad perspective on the relevant factors. The main outcome of this study is the iCoViP Framework, which represents a mosaic of 14 themes encompassing many specific topics encountered by stakeholders when reflecting on VPs in health professions curricula. We examined the prevalence of the identified themes in the included studies to justify the relevance of the framework. Finally, we assessed the quality of the analysed studies.
The significance of the developed framework lies in its ability to provide the health professions education community with a structure that can guide VP implementation efforts and serve as a scaffold for training and research in the field of integration of VPs in curricula. The developed framework was immediately applied in the structuring of the iCoViP Curriculum Implementation Guideline. This dynamic document, available on the website of the iCoViP project (https://icovip.eu/knowledge-base), presents the recommendations taken from the literature review and the project partners’ experiences with how to implement VPs, particularly the collection of 200 VPs developed during the iCoViP project [ 23 ]. To improve the accessibility of this guideline, we have added a glossary with definitions of important terms. We have already been using the framework to structure faculty development courses on the topic of teaching with VPs.
It is clear from our study that the success of integrating VPs into curricula depends on the substantial effort that is required of stakeholders to make changes in the learning environment to enable VPs to work well in the context of local health professions education programs. The wealth of themes discussed in the literature around VPs confirms what is known from implementation science: the quality of the implementation is as important as the quality of the product [ 15 ]. This might be disappointing for those who hope VPs are a turnkey solution that can be easily purchased to save time, under the misconception that implementation will occur effortlessly.
Our review also makes it evident that the implementation of VPs is a team endeavour. Without understanding, acceptance and mutual support at all levels of the institutional hierarchy and across a broad range of professional backgrounds, the different aspects of the integration of VPs into curricula will not come together. Students should not be left to their own devices when using VPs. They need to understand the relevance of the learning method used in a given curriculum by observing teachers’ engagement in the routine use of VPs, and they should properly understand the relationship between VPs and student assessment. Despite the IT-savviness of many students, they should be shown how and when to use VPs, while room is also left for creative, self-directed learning. Finally, students should not get the impression that their use of VPs comes at the expense of something to which they give higher priority, such as direct patient contact or teacher feedback. Teachers facilitating learning with VPs should be convinced of their utility and effectiveness, and they need to know how to use VPs themselves before recommending them to students. It is important that teachers are aware that VPs, like any other teaching resources, require quality control linked with perpetual updates. They should feel supported by more-experienced colleagues and an IT helpdesk if methodological or technical issues arise. Last but not least, curriculum managers should recognise the benefits and limitations of VPs, how they align with institutional goals, and that adopting and sustaining them requires both time and financial resources. All of this entails communication, coordinated efforts, and shared decision-making during the implementation of VPs in curricula.
Per Nilsen has divided implementation theories, models and frameworks into three broad categories: process models, determinant frameworks and evaluation models [ 16 ]. We view the iCoViP Framework primarily as a process model. This perspective originates from the initial framework we adopted in our systematic review, namely Kern’s six-step curriculum development process [ 30 ], which facilitates the grouping of curriculum integration factors into discrete steps and suggests a specific order in which to address implementation tasks. Our intention in using this framework was also to structure how-to guidelines, which are another hallmark of process models. As already noted by Nilsen, and as is evident in Kern’s model, implementation process models are rarely applied linearly in practice and require a pragmatic transition between steps, depending on the situation.
The boundary between the classes of implementation models is blurred [ 16 ] and there is significant overlap. It is therefore not surprising that the iCoViP framework can be interpreted through the lens of a determinant framework which configures many factors (facilitators and barriers) that influence VP implementation in curricula. Nilsen’s category of determinant frameworks includes the CFIR framework [ 52 ], which was also chosen by Kassianos et al. to structure their study included in this review [ 38 ]. A comparison of the themes emerging from their study and our framework indicates a high degree of agreement (as depicted in Fig. 2 ). We interpret this as a positive indication of research convergence. Our framework extends this research by introducing numerous fine-grained topic codes that are characteristic of VP integration into curricula.
The aim of our research was not to develop an evaluation framework. For this purpose, the two evaluation tools available in the literature by Huwendiek et al. [ 24 ] and Kleinheksel & Ritzhaupt [ 26 ] are suitable. However, the factors proposed in our framework can further inform and potentially extend existing or new tools for assessing VP integration.
Despite the plethora of available implementation science theories and models [ 16 ], their application in health professions curricula is limited [ 15 ]. The studies included in the systematic review only occasionally reference implementation science theories directly (exceptions are CFIR and UTAUT [ 38 ], Rogers’ Diffusion of Innovation Theory [ 26 , 42 ] and Surry’s RIPPLES model [ 42 ]). However, it is important to acknowledge that implementation science is itself an emerging field that is gradually gaining recognition. Furthermore, as noted by Dubrowski & Dubrowski [ 17 ], the direct application of general implementation science models does not guarantee success and requires verification and adaptation.
This study is based on stakeholders’ perceptions of the integration of VPs into curricula. The strength of the evidence behind the recommendations expressed in the analysed studies is low from a positivist perspective as it is based on subjective opinions. However, by adopting a more interpretivist stance in this review, our goal is not to offer absolute, ready-to-copy recommendations. Instead, we aim to provide a framework that organises the implementation themes identified in the literature into accessible steps. It is beyond the scope of this review to supply an inventory of experimental evidence for the validity of the recommendations in each topic, as was intended in previous systematic reviews [ 4 ]. We recognise that, for some themes, it will always be challenging to achieve a higher level of evidence due to practical constraints in organising studies that experiment with different types of curricula. The complexity, peculiarities, and context-dependency of implementation likely preclude one-size-fits-all recommendations for VP integration. Nevertheless, even in such a situation, a framework for sorting through past experiences with integration of VPs proves valuable for constructing individual solutions that fit a particular context.
The aim of our study was to cover experiences from different health professions programs in the literature synthesis. However, with a few exceptions, the results show a dominance of medical programs in research on VP implementation in curricula. This, although beyond the authors’ control, limits the applicability of our review findings. The data clearly indicates a need for more research into the integration of VPs into health professions curricula other than medicine.
The decision to exclude single-factor studies from the framework synthesis is justified by our aim to provide a comprehensive overview of the integration process. Nevertheless, recommendations from identified single-factor studies [ 53 , 54 , 55 ] were subsequently incorporated into the individual themes in the iCoViP project implementation guideline. We did not encounter any studies on single factors that failed to align with any of the identified themes within the framework. Due to practical reasons concerning the review’s feasibility, we did not analyse studies in languages other than English and did not explore non-peer-reviewed grey literature databases. However, we recognise the potential of undertaking such activities in preparing future editions of the iCoViP guideline as we envisage this resource as an evolving document.
We acknowledge that our systematic review was shaped by the European iCoViP project [ 23 ]. However, we did not confine our study to just a single VP model, thereby encompassing a broad range of technical implementations. The strength of this framework synthesis lies in the diversity of its contributors affiliated with several European universities in different countries, who were at different stages of their careers, and had experience with various VP systems.
The iCoViP framework, by charting a map of themes around VP integration in health professions curricula, provides a foundation for further, more focused research on individual themes. The less-common themes or conflicts and inconsistencies in recommendations found in the literature synthesis may be a promising starting point.
An example of this is the phase of the curriculum into which a given VP fits. We see that proponents of early and late introduction of VPs use different arguments. The recommendation that VPs should be of increasing difficulty seems to be valid, but what is missing is the detail of what this means in practice. We envisage that this will be researched by exploring models of integration that cater for different levels of student expertise.
There are also varying opinions between those who see VPs as tools for presenting rare, intriguing cases, and those who see the commonality and practice relevance of the clinical problems presented in VPs as the most important factor. However, these opposing stances can be harmonised by developing a methodology to establish a well-balanced case-mix of VPs with different properties depending upon the needs of the learners and the curricular context. Another point of division is the recognition of VPs as a tool for internationalising studies and supporting student mobility, versus the expectation that VPs should be adapted to local circumstances. These disparate beliefs can be reconciled by research into the design of activities around VPs that explicitly addresses the different expectations and confirms or refutes their usefulness.
A significant barrier to the adoption of VPs is cost. While universities are occasionally willing to make a one-off investment in VPs for prestige or research purposes, the field needs more sustainable models. These should be suitable for different regions of the world and demonstrate how VPs can be maintained at a high level of quality in the face of limited time and resources. This is particularly important in low-resource countries and those affected by crises (e.g., war, natural disasters, pandemics), where the need for VPs is even greater than in developed countries due to the shortage of health professionals involved in teaching [ 56 ]. However, most of the studies included in our systematic review are from high-income countries. This shows a clear need for more research into the implementation of VPs in health professions curricula in developing countries.
Finally, an interesting area for future research is the interplay of different types of simulation modalities in curricula. The studies we reviewed do not recommend one type of simulation over another as each method has its unique advantages. In line with previous suggestions [ 46 ], we see a need for further research into practical implementation methods of such integrated simulation scenarios in curricula.
Stakeholders’ perceptions were structured into 14 themes by this framework synthesis of mixed methods studies on the curricular integration of VPs. We envision that teachers, course directors and curriculum designers will benefit from this framework when they decide to introduce VPs in their teaching. We anticipate that our summary will inspire health professions education researchers to conduct new studies that will deepen our understanding of how to effectively and efficiently implement VPs in curricula. Last but not least, we hope that our research will empower students to express their expectations regarding how they would like to learn with VPs in curricula, thus helping them to become better health professionals in the future.
All datasets produced and analysed during the current study are available from the corresponding author upon reasonable request.
iCoViP: International Collection of Virtual Patients
QuADS: Quality Assessment with Diverse Studies
LCME: Liaison Committee on Medical Education (accreditation standard)
CLIPP: Computer-assisted Learning in Paediatrics Program
PBL: Problem-Based Learning
Ellaway R, Poulton T, Fors U, McGee JB, Albright S. Building a virtual patient commons. Med Teach. 2008;30:170–4.
Cook DA, Erwin PJ, Triola MM. Computerized virtual patients in health professions education: a systematic review and meta-analysis. Acad Med. 2010;85:1589–602.
Consorti F, Mancuso R, Nocioni M, Piccolo A. Efficacy of virtual patients in medical education: a meta-analysis of randomized studies. Comput Educ. 2012;59:1001–8.
Kononowicz AA, Woodham LA, Edelbring S, Stathakarou N, Davies D, Saxena N, et al. Virtual patient simulations in health professions education: systematic review and meta-analysis by the Digital Health Education Collaboration. J Med Internet Res. 2019;21:e14676.
Lee J, Kim H, Kim KH, Jung D, Jowsey T, Webster CS. Effective virtual patient simulators for medical communication training: a systematic review. Med Educ. 2020;54:786–95.
Foronda CL, Fernandez-Burgos M, Nadeau C, Kelley CN, Henry MN. Virtual simulation in nursing education: a systematic review spanning 1996 to 2018. Simul Healthc. 2020;15:46–54.
Richardson CL, White S, Chapman S. Virtual patient technology to educate pharmacists and pharmacy students on patient communication: a systematic review. BMJ Simul Technol Enhanc Learn. 2020;6:332–8.
Plackett R, Kassianos AP, Mylan S, Kambouri M, Raine R, Sheringham J. The effectiveness of using virtual patient educational tools to improve medical students’ clinical reasoning skills: a systematic review. BMC Med Educ. 2022;22:365.
Kelly S, Smyth E, Murphy P, Pawlikowska T. A scoping review: virtual patients for communication skills in medical undergraduates. BMC Med Educ. 2022;22:429.
Berman N, Fall LH, Smith S, Levine DA, Maloney CG, Potts M, et al. Integration strategies for using virtual patients in clinical clerkships. Acad Med. 2009;84:942–9.
Huwendiek S, Duncker C, Reichert F, De Leng BA, Dolmans D, Van Der Vleuten CPM, et al. Learner preferences regarding integrating, sequencing and aligning virtual patients with other activities in the undergraduate medical curriculum: a focus group study. Med Teach. 2013;35:920–9.
Cook DA. The research we still are not doing: an agenda for the study of computer-based learning. Acad Med. 2005;80:541–8.
Berman NB, Fall LH, Maloney CG, Levine DA. Computer-assisted instruction in clinical education: a roadmap to increasing CAI implementation. Adv Health Sci Educ. 2008;13:373–83.
Eccles MP, Mittman BS. Welcome to implementation science. Implement Sci. 2006;1:1.
Dubrowski R, Barwick M, Dubrowski A. I wish I knew this before… An implementation science primer and model to guide implementation of simulation programs in medical education. In: Safir O, Sonnadara R, Mironova P, Rambani R, editors. Boot camp approach to surgical training. Cham: Springer International Publishing; 2018. pp. 103–21.
Nilsen P. Making sense of implementation theories, models and frameworks. Implement Sci. 2015;10:53.
Dubrowski R, Dubrowski A. Why should implementation science matter in simulation-based health professions education? Cureus. 2018. https://doi.org/10.7759/cureus.3754.
Durlak JA, DuPre EP. Implementation matters: a review of research on the influence of implementation on program outcomes and the factors affecting implementation. Am J Community Psychol. 2008;41:327–50.
Friedman C. The research we should be doing. Acad Med. 1994;69:455–7.
Huwendiek S, Muntau AC, Maier EM, Tönshoff B, Sostmann K. E-Learning in der medizinischen Ausbildung: Leitfaden zum erfolgreichen Einsatz in der Pädiatrie. Monatsschr Kinderheilkd. 2008;156:458–63.
Cendan J, Lok B. The use of virtual patients in medical school curricula. Adv Physiol Educ. 2012;36:48–53.
Ellaway RH, Davies D. Design for learning: deconstructing virtual patient activities. Med Teach. 2011;33:303–10.
Mayer A, Da Silva Domingues V, Hege I, Kononowicz AA, Larrosa M, Martínez-Jarreta B, et al. Planning a collection of virtual patients to train clinical reasoning: a blueprint representative of the European population. Int J Environ Res Public Health. 2022;19:6175.
Huwendiek S, Haider HR, Tönshoff B, Leng BD. Evaluation of curricular integration of virtual patients: development of a student questionnaire and a reviewer checklist within the electronic virtual patient (eVIP) project. Bio-Algorithms Med-Syst. 2009;5:35–44.
Georg C, Zary N. Web-based virtual patients in nursing education: development and validation of theory-anchored design and activity models. J Med Internet Res. 2014;16:e105.
Kleinheksel AJ, Ritzhaupt AD. Measuring the adoption and integration of virtual patient simulations in nursing education: an exploratory factor analysis. Comput Educ. 2017;108:11–29.
Gough D, Oliver S, Thomas J. An introduction to systematic reviews. SAGE; 2017.
Moher D, Liberati A, Tetzlaff J, Altman DG, for the PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.
Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016;5:210.
Thomas PA, Kern DE, Hughes MT, Chen BY, editors. Curriculum development for medical education: a six-step approach. 3rd ed. Baltimore: Johns Hopkins University Press; 2016.
Chen BY, Kern DE, Kearns RM, Thomas PA, Hughes MT, Tackett S. From modules to MOOCs: application of the Six-Step Approach to Online Curriculum Development for Medical Education. Acad Med. 2019;94:678–85.
VERBI Software. MAXQDA 2022.5. Software. 2023. maxqda.com.
Harrison R, Jones B, Gardner P, Lawton R. Quality assessment with diverse studies (QuADS): an appraisal tool for methodological and reporting quality in systematic reviews of mixed- or multi-method studies. BMC Health Serv Res. 2021;21:144.
Opie JE, McLean SA, Vuong AT, Pickard H, McIntosh JE. Training of lived experience workforces: a Rapid Review of Content and outcomes. Adm Policy Ment Health Ment Health Serv Res. 2023;50:177–211.
Goagoses N, Suovuo T, Bgt, Winschiers-Theophilus H, Suero Montero C, Pope N, Rötkönen E, et al. A systematic review of social classroom climate in online and technology-enhanced learning environments in primary and secondary school. Educ Inf Technol. 2024;29:2009–42.
Botezatu M, Hult H, Fors UG. Virtual patient simulation: what do students make of it? A focus group study. BMC Med Educ. 2010;10:91.
Edelbring S, Dastmalchi M, Hult H, Lundberg IE, Dahlgren LO. Experiencing virtual patients in clinical learning: a phenomenological study. Adv Health Sci Educ. 2011;16:331–45.
Kassianos AP, Plackett R, Kambouri MA, Sheringham J. Educators’ perspectives of adopting virtual patient online learning tools to teach clinical reasoning in medical schools: a qualitative study. BMC Med Educ. 2023;23:424.
McCarthy D, O’Gorman C, Gormley G. Intersecting virtual patients and microbiology: fostering a culture of learning. Ulster Med J. 2015;84(3):173-8.
Botezatu M, Hult Hå, Kassaye Tessma M, Fors UGH. As time goes by: stakeholder opinions on the implementation and use of a virtual patient simulation system. Med Teach. 2010;32:e509–16.
Dahri K, MacNeil K, Chan F, Lamoureux E, Bakker M, Seto K, et al. Curriculum integration of virtual patients. Curr Pharm Teach Learn. 2019;11:1309–15.
Schifferdecker KE, Berman NB, Fall LH, Fischer MR. Adoption of computer-assisted learning in medical education: the educators’ perspective: adoption of computer-assisted learning in medical education. Med Educ. 2012;46:1063–73.
Dafli E, Fountoukidis I, Hatzisevastou-Loukidou C, D Bamidis P. Curricular integration of virtual patients: a unifying perspective of medical teachers and students. BMC Med Educ. 2019;19:416.
Edelbring S, Broström O, Henriksson P, Vassiliou D, Spaak J, Dahlgren LO, et al. Integrating virtual patients into courses: follow-up seminars and perceived benefit. Med Educ. 2012;46:417–25.
Lang VJ, Kogan J, Berman N, Torre D. The evolving role of online virtual patients in Internal Medicine Clerkship Education nationally. Acad Med. 2013;88:1713–8.
Ellaway R, Topps D, Lee S, Armson H. Virtual patient activity patterns for clinical learning. Clin Teach. 2015;12:267–71.
Hege I, Ropp V, Adler M, Radon K, Mäsch G, Lyon H, et al. Experiences with different integration strategies of case-based e-learning. Med Teach. 2007;29:791–7.
Hirumi A, Johnson T, Reyes RJ, Lok B, Johnsen K, Rivera-Gutierrez DJ, et al. Advancing virtual patient simulations through design research and interPLAY: part II—integration and field test. Educ Technol Res Dev. 2016;64:1301–35.
Kulasegaram K, Mylopoulos M, Tonin P, Bernstein S, Bryden P, Law M, et al. The alignment imperative in curriculum renewal. Med Teach. 2018;40:443–8.
Fors UGH, Muntean V, Botezatu M, Zary N. Cross-cultural use and development of virtual patients. Med Teach. 2009;31:732–8.
Kelley CG. Using a virtual patient in an Advanced Assessment Course. J Nurs Educ. 2015;54:228–31.
Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implement Sci. 2009;4:50.
Zary N, Johnson G, Fors U. Web-based virtual patients in dentistry: factors influencing the use of cases in the Web‐SP system. Eur J Dent Educ. 2009;13:2–9.
Maier EM, Hege I, Muntau AC, Huber J, Fischer MR. What are effects of a spaced activation of virtual patients in a pediatric course? BMC Med Educ. 2013;13:45.
Johnson TR, Lyons R, Kopper R, Johnsen KJ, Lok BC, Cendan JC. Virtual patient simulations and optimal social learning context: a replication of an aptitude–treatment interaction effect. Med Teach. 2014;36:486–94.
Mayer A, Yaremko O, Shchudrova T, Korotun O, Dospil K, Hege I. Medical education in times of war: a mixed-methods needs analysis at Ukrainian medical schools. BMC Med Educ. 2023;23:804.
Acknowledgements.
The authors would like to thank Zuzanna Oleniacz and Joanna Ożga for their contributions in abstract screening and data extraction, as well as all the participants who took part in the iCoViP project and the workshops.
Funding.
The study has been partially funded by the ERASMUS + program, iCoViP project (International Collection of Virtual Patients) from European Union grant no. 2020-1-DE01-KA226-005754 and internal funds from Jagiellonian University Medical College (N41/DBS/001125).
Authors and affiliations.
Center for Innovative Medical Education, Jagiellonian University Medical College, Medyczna 7, Krakow, 30-688, Poland
Joanna Fąferek
Faculty of Medicine, Paris Saclay University, Le Kremlin-Bicetre, 94270, France
Pierre-Louis Cariou & Luc Morin
Paracelsus Medical University, Prof.-Ernst-Nathan-Str. 1, 90419, Nürnberg, Germany
Medical Education Sciences, University of Augsburg, 86159, Augsburg, Germany
Institute and Clinic for Occupational, Social and Environmental Medicine, LMU University Hospital, 80336, Munich, Germany
Daloha Rodriguez-Molina
Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
Bernardo Sousa-Pinto
Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Medyczna 7, Krakow, 30-688, Poland
Andrzej A. Kononowicz
Author contributions.
JF and AK conceived the idea for the study. JF coordinated the research team activities. All authors contributed to the writing of the review protocol. AK designed the literature search strategies. All authors participated in screening and data extraction. JF retrieved and managed the abstracts and full-text articles. JF and AK performed qualitative analysis of the data and quality appraisal. AK, JF and IH designed the illustrations for this study. All authors interpreted the analysis and contributed to the discussion. JF and AK drafted the manuscript. PLC, IH, AM, LM, DRM, BSP read and critically commented on the manuscript. All authors gave final approval of the version submitted.
Correspondence to Joanna Fąferek or Andrzej A. Kononowicz.
Ethics approval and consent to participate.
Systematic review of the literature; not applicable.
Consent for publication.
Not applicable.
Competing interests.
The authors declare no competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information.
Below are the links to the electronic supplementary material: Supplementary material 2, Supplementary material 3 and Supplementary material 4.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article.
Fąferek, J., Cariou, P.L., Hege, I. et al. Integrating virtual patients into undergraduate health professions curricula: a framework synthesis of stakeholders’ opinions based on a systematic literature review. BMC Med Educ 24, 727 (2024). https://doi.org/10.1186/s12909-024-05719-1
Received: 20 March 2024
Accepted: 27 June 2024
Published: 05 July 2024
DOI: https://doi.org/10.1186/s12909-024-05719-1
While Federated Learning (FL) provides a privacy-preserving approach to analyzing sensitive data without centralizing training data, the field lacks a detailed comparison of emerging open-source FL frameworks. Furthermore, there is currently no standardized, weighted evaluation scheme for a fair comparison of FL frameworks that would support the selection of a suitable FL framework. This study addresses these research gaps by conducting a comparative analysis of 15 individual open-source FL frameworks filtered by two selection criteria, using the literature review methodology proposed by Webster and Watson. These framework candidates are compared using a novel scoring schema with 15 qualitative and quantitative evaluation criteria, focusing on features, interoperability, and user friendliness. The evaluation results show that the FL framework Flower outperforms its peers with an overall score of 84.75%, while FedLearner lags behind with a total score of 24.75%. The proposed comparison suite offers valuable initial guidance for practitioners and researchers in selecting an FL framework for the design and development of FL-driven systems. In addition, the FL framework comparison suite is designed to be adaptable and extendable, accommodating the inclusion of new FL frameworks and evolving requirements.
Federated Learning (FL) is a semi-distributed Machine Learning (ML) concept that has gained popularity in recent years, addressing data privacy concerns associated with centralized ML [ 1 , 2 , 3 , 4 , 5 , 6 , 7 ]. For example, data-driven applications with sensitive data such as in healthcare [ 8 , 9 , 10 , 11 , 12 ], finance [ 13 , 14 ], personalized IoT devices [ 15 , 16 ] or public service [ 17 , 18 ] require a technical guarantee of data privacy, which can be achieved by the use of FL.
In FL, a predefined number of clients with sensitive training data and a coordinator server jointly train a global model, while the local training data remains on the original client and is isolated from other clients [ 1 , 19 ]. In the FL training process, the global model is created by the server with randomly initialized weights and distributed to the clients of the FL system [ 20 , 21 ]. The goal of a federated training process is the minimization of the following objective function:

\[ \min_{w} f(w) = \sum_{k=1}^{N} \frac{n_k}{n} F_k(w) \]
where \(N\) is the number of clients, \(n_k\) the amount of sensitive training data on client \(k\), \(n\) the total amount of training data on all clients and \(F_k(w)\) the local loss function [ 1 , 22 , 23 ]. Each client trains the initial model obtained from the coordinator server with the client’s local training data [ 24 ]. The locally updated model weights are asynchronously sent back to the coordinator server, where an updated global model is computed using an aggregation strategy such as Federated Averaging (FedAvg) [ 1 , 7 , 20 , 25 , 26 , 27 ]. The new global model is distributed back to the clients for a new federated training round. The number of federated training rounds is set in advance on the server side and is a hyperparameter that can be tuned [ 1 , 5 , 28 , 29 ]. An overview of the FL architecture is shown in Fig. 1. FL can also reduce the complexity and cost of model training by allowing a model to be trained on multiple smaller datasets on different clients, rather than on a single large, centralized dataset that requires an exhaustive data collection process beforehand [ 30 , 31 , 32 ]. Although there are several key challenges to solve in the FL domain, security features such as homomorphic encryption [ 33 , 34 ] and differential privacy [ 6 , 35 , 36 ] are already used to guarantee and improve data privacy and security in FL systems [ 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 ].
Fig. 1 Basic FL architecture overview
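To make the aggregation step concrete, the following minimal sketch shows how a coordinator server could combine locally updated client weights using FedAvg, i.e., as a weighted average with the coefficients \(n_k/n\) from the objective function above. The function and variable names are illustrative and not taken from any particular FL framework.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate client models by a weighted average (FedAvg).

    client_weights: list of per-client model parameters, each a list of
                    numpy arrays (one array per model layer).
    client_sizes:   list of n_k, the number of local training samples per client.
    """
    n = sum(client_sizes)  # total amount of training data across all clients
    num_layers = len(client_weights[0])
    aggregated = []
    for layer in range(num_layers):
        # Weighted sum over all clients for this layer: sum_k (n_k / n) * w_k
        layer_avg = sum(
            (n_k / n) * weights[layer]
            for weights, n_k in zip(client_weights, client_sizes)
        )
        aggregated.append(layer_avg)
    return aggregated

# Example: three clients holding different amounts of local data
clients = [[np.ones((2, 2)) * i, np.ones(2) * i] for i in (1.0, 2.0, 3.0)]
sizes = [100, 200, 700]
global_model = fedavg(clients, sizes)
print(global_model[0])  # dominated by client 3, which holds 70% of the data
```

In practice, the FL frameworks compared below encapsulate this step behind configurable aggregation strategies, which is one of the feature criteria examined later.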
The advent of FL has spurred the development of various FL frameworks aimed at facilitating the deployment of FL applications, offering standardized functionalities and enhanced usability. Despite the proliferation of these frameworks, the selection of an optimal FL framework for specific project requirements remains a non-trivial challenge for practitioners due to the diversity and complexity of the choices available. This situation is exacerbated by two notable deficiencies in the FL research literature: first, the absence of a methodologically rigorous, in-depth comparative analysis of the most relevant open-source FL frameworks; and second, the lack of a standardized, weighted scoring scheme for a systematic and objective evaluation of these frameworks.
To the best of our knowledge, this comparative study is the most thorough to date, assessing the widest array of open-source FL frameworks against the broadest spectrum of criteria. Consequently, this study endeavors to fill the aforementioned research gaps by providing a robust FL framework comparison suite that could serve as a research-based guide for practitioners navigating the selection of suitable FL frameworks for their projects.
This study provides a comprehensive and user-targeted comparison of 15 open-source FL frameworks by performing a systematic literature review according to Webster and Watson [ 45 ]. In this way, relevant FL frameworks and comparison criteria are identified, which form the basis for the comparative analysis. A novel weighted scoring system is proposed for the evaluation of FL frameworks. The proposed comparison criteria and the scoring system in this study can be utilized by practitioners and researchers to determine whether a particular FL framework fulfills their needs. Thus, the major contributions of this study can be summarized as follows:
Proposing 15 comparison criteria for the evaluation of FL frameworks based on a methodological literature review.
Introducing a novel weighted scoring matrix for these comparison criteria.
Conducting an in-depth comparison of 15 relevant open-source FL frameworks.
In addition, a Research Question (RQ) oriented approach is used in this study with the aim to answer the following three RQs:
RQ 1: Which relevant frameworks for FL exist and are open-source?
RQ 2: Which criteria enable a qualitative and quantitative comparison of FL frameworks?
RQ 3: Which FL framework offers the most added value to practitioners and researchers?
The RQs are addressed and answered in ascending order in Sect. 5.4 on page 16.
The remainder of this paper is organized as follows. Section 2 discusses relevant related work and shows how the main contribution of this paper differs from the others. Section 3 details the literature review methodology applied in this work. Section 4 briefly introduces inclusion criteria and the FL framework candidates. Section 5 presents and discusses the comparison criteria, the weighting schema and the scoring results from the conducted FL framework comparison analysis. Section 6 describes the limitations of this study and suggests future work. Finally, Sect. 7 draws the conclusions of this survey.
In recent years, several research papers have been published dealing with individual FL frameworks. Some developers published works detailing and highlighting their own FL frameworks. For instance, the developers of FedML [ 46 ], Sherpa.ai FL [ 47 ], IBM FL [ 48 ], OpenFL [ 49 ], FATE [ 50 ], Flower [ 51 ], FLUTE [ 52 ], FederatedScope [ 53 ], FedLab [ 54 ] and EasyFL [ 55 ] have all published white papers introducing the features of their released frameworks. These papers include a general introduction to FL, open FL challenges, and how their FL framework can address them, while [ 29 , 46 , 47 , 51 , 52 , 55 ] also provide small comparisons of a few existing FL frameworks. These comparisons were chosen subjectively and are biased, usually in favor of the FL framework developed by the authors making the comparison; a neutral, independent and holistic comparison has been missing so far.

In addition, there are research papers that address the current state of FL research, some of them using specific FL frameworks for technical implementation or evaluation purposes. For example, [ 5 ] gave a general and comprehensive overview of FL, examined possible future research directions and challenges of FL, such as protection strategies against federated security attacks, and mentioned sources of federated bias. Moreover, they briefly introduced and described some popular FL frameworks, including FATE [ 56 ], PaddleFL [ 57 ], NVIDIA Clara (now a platform offering AI models for healthcare applications) [ 58 ], IBM FL [ 59 ], Flower [ 51 ] and FedLearner [ 60 ]. Another work [ 61 ] followed a similar approach to [ 5 ] and described central FL concepts such as the training process and FL algorithms in more detail before including a brief comparison overview of several FL frameworks. The authors of both works ([ 5 ] and [ 61 ]) refrain from evaluating FL frameworks and drawing conclusions from their conducted comparison analyses.

In contrast to the aforementioned works, [ 62 ] described an in-depth comparison of multiple FL frameworks (TFF [ 63 ], FATE [ 56 ], PySyft [ 64 ], PaddleFL [ 57 ], FL&DP [ 65 ]). Both qualitative comparisons (in the form of a table comparing features of the frameworks) and quantitative comparisons (in the form of experiments measuring training time and accuracy for three classification problems) are performed. Based on their evaluations, [ 62 ] recommended PaddleFL for industrial usage, citing its high test accuracy for model inference tasks and its range of features ready for practical use. A similar qualitative and quantitative FL framework comparison is provided by [ 66 ]. Their comparison contained more FL framework candidates than the comparison conducted by [ 62 ] (9 vs 5). Furthermore, [ 66 ] performed a larger set of benchmark experiments in which different FL paradigms were considered. The qualitative comparison was of a similar scope as in [ 62 ], although some criteria were left out (e.g., supported data types and protocols) and others were added (e.g., documentation availability and GPU support). Although the authors did not make a recommendation for a particular FL framework, they described a general decision-making process that can be used to determine the most appropriate FL framework.
In contrast to previous works, where the selection of comparison criteria for FL frameworks was often arbitrary, our study introduces a methodologically rigorous approach for a comparative analysis of FL frameworks. Prior works did not incorporate weighted importance of criteria nor did they employ a scoring mechanism for a systematic evaluation of FL frameworks. In addition, there was a lack of comprehensiveness in the inclusion of available and pertinent open-source FL frameworks. Our work advances the field by encompassing a broader spectrum of framework candidates and employing a more integrative methodology for evaluating FL frameworks with a novel weighted scoring approach. Leveraging the structured literature review methodology by Webster and Watson, this comparative study identifies the most pertinent quantitative and qualitative criteria for FL framework users, ensuring a selection of comparison criteria that is both comprehensive and methodically sound, surpassing the scope of similar studies.
We applied the literature review methodology proposed by Webster and Watson [ 45 ] to address the RQs (see Sect. 1 on page 1). They introduced a systematic in-depth review schema for the identification and evaluation of relevant research literature. Webster and Watson’s literature review method was published in response to the lack of review articles in the information systems field, which the authors believe has slowed progress in the field [ 45 ]. Their methodology has gained popularity since publication, with over 10,000 citations (based on the Google Scholar citation count). According to [ 45 ], the collection process of relevant research literature should be concept-centric rather than author-centric and is not limited to individual journals or geographical regions. They recommend identifying appropriate journal articles and conference proceedings by conducting a keyword-based search in different literature databases. Additional relevant sources should be identified by searching the references of the literature collected in this manner. This technique is called backward search and can be combined with forward search, which locates literature that cites one of the originally identified documents as a literature source. An overview of the search methodology applied in this paper is shown in Fig. 2. We used the research literature review of Webster and Watson [ 45 ] to build the knowledge base for a literature-driven comparison analysis of open-source FL frameworks.
Fig. 2 Process flow used in this study to identify and filter relevant publications for the literature review
For the literature search, the publication databases ACM Digital Library, EBSCOhost and IEEE Xplore were used to identify relevant publications and literature sources (see Fig. 2). As recommended by [ 45 ], we mainly searched for peer-reviewed journal articles and conference proceedings to ensure a reliable research base. A logical combination of the following terms served as the search string:
‘federated learning’ AND ‘framework’ AND ‘open-source’ OR ‘federated framework’ AND ‘privacy-preserving machine learning’ AND ‘open-source’.
In some cases, additional search keywords were used, determined by reviewing the tables of contents of the retrieved literature based on the search string [ 45 ]. In addition, the research literature was filtered by publication date from 2016 to 2024 to obtain more recent sources. 2016 was chosen as the date filter because it was the first year the term federated learning was officially used in a publication [ 1 ]. The forward and backward searches, as described by Webster and Watson [ 45 ], were used to identify additional relevant sources. This made it possible to identify publications that referenced other relevant publications, most of which were not much older than the original publications. One reason for this could be that the term federated learning did not exist before 2016, so the range of publication dates is quite narrow. For the forward search, Google Scholar, Science Direct, Semantic Scholar, and ArXiv were used in addition to the literature databases mentioned above.
To further filter the identified publications, the following inclusion and exclusion criteria were defined and applied:
Inclusion Criteria :
The identified publication deals with the topic of federated learning and contributes answers to at least one of the RQs (see Sect. 1 on page 1).
The title and the abstract seem to contribute to the RQs and contain at least one of the following terms: framework, federated learning, machine learning, evaluation or open-source.
Exclusion Criteria :
The publication is not written in English.
The title and abstract do not appear to contribute to the RQs and do not contain a term from the search string (see Subsect. 3.1 ) or inclusion criteria.
The publication is a patent, master thesis, or a non-relevant web page.
The publication is not electronically accessible without payment (i.e., available only as a print issue).
All relevant aspects of the publication are already included in another publication.
The publication only compares existing research and provides no new input.
A publication is included in the pool of relevant literature for reviewing if both inclusion criteria are met, and it is excluded if any of the exclusion criteria is fulfilled. Exceptions that are not subject to these criteria are sources that additionally serve to quantitatively or qualitatively support the comparison, such as GitHub repositories or the websites of the FL frameworks. Such sources are also included in our literature database, but are assigned a low relevance score.
We initially checked the titles and abstracts of the publications for the individual keywords of the search term (see Subsect. 3.1 on page 4) and added the publications to the literature pool if there were matches, based on the defined inclusion and exclusion criteria (see Subsect. 3.2 on page 5). Thus, 1328 individual publications were obtained from the literature databases. After additionally screening the introduction and conclusion of each publication, 1196 publications were eliminated due to lack of relevance. As a result, 132 publications, including 60 peer-reviewed journal articles, 27 conference proceedings, 10 white papers and 35 online sources, form the basis for the literature-driven comparative analysis. In the refinement process (see step 3 in Fig. 2 on page 4), duplicated sources were removed, since in some cases the same publication was listed in two or more literature databases.
For the literature review a concept-oriented matrix according to Webster and Watson was used, which enables a systematic relevance assessment of the identified literature [ 45 ]. A publication is rated according to the number of concepts covered. Based on the RQs (see Sect. 1 on page 1), the individual concepts or topics for the literature review in this study are defined as follows:
FL General Information (GI)
FL Security Mechanisms (SM)
FL Algorithms (AL)
FL Frameworks (FW)
For each identified source, the title, the type of publication, the name of the publishing journal or conference if applicable, the number of citations, and a brief summary of the relevant content were noted. Afterwards, the literature was scored based on a scale of 1 to 4, with a publication scored 4 representing high relevance and a publication scored 1 representing low relevance. The rating schema is based on the concepts described above and defined as follows:
1 Point: Relevant to exactly one concept other than FW.
2 Points: Relevant to at least two concepts, or to FW alone.
3 Points: Relevant to at least three concepts, or to FW and one or two other concepts.
4 Points: Relevant to all four concepts (GI, SM, AL and FW).
Additional sources not directly related to the concepts defined above were included in the concept Misc. and have been automatically assigned a relevance score of 1. An excerpt of the applied concept-oriented tabular literature review according to Webster and Watson [ 45 ] can be found in Table 1 on page 7. In this study, the knowledge base obtained from the literature review forms the basis for the weighted comparison and evaluation of different open-source FL frameworks (see Sect. 5 on page 8).
To analyze the research literature, Latent Dirichlet Allocation (LDA) was applied to the identified publications to discover common overlapping topics [ 67 ]. This was used to verify the relevance of our chosen Literature Review Concepts. Stop words, numerical characters and conjunctions were filtered out in advance. The number of components of the LDA was set to 10. This number was chosen after conducting a grid search and analyzing the generated topics: with the number of components set to 10, a topic that could be assigned to the Literature Review Concept ‘FL Frameworks’ was included for the first time. Thus, this was the lowest number of topics with which all four of the identified Literature Review Concepts were captured by the LDA. In each topic, the LDA determined the 20 most relevant words from the provided literature, where relevance represents the number of times a word was assigned to a given topic [ 67 ]. Figure 5 (see Appendix, on page 18) displays these identified topics and their most relevant words. The topics were further condensed into the previously defined four concepts in Table 2. A word cloud consisting of the most common words in the identified literature can be seen in Fig. 6 (see Appendix, on page 19).
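As a rough illustration of this analysis step, the following sketch applies scikit-learn’s LatentDirichletAllocation to a toy corpus; the three documents are placeholders for the titles and abstracts of the reviewed publications, while the component count (10) and the number of words per topic (20) follow the setup described above.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder corpus; in the study, the reviewed publications were used instead
docs = [
    "federated learning framework for privacy preserving machine learning",
    "secure aggregation and differential privacy in federated systems",
    "open-source frameworks for distributed model training",
]

# Stop words and numeric tokens are filtered out, as described above
vectorizer = CountVectorizer(stop_words="english", token_pattern=r"[a-zA-Z]{2,}")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=10, random_state=0)  # 10 topics via grid search
lda.fit(X)

words = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[-20:][::-1]]  # 20 most relevant words
    print(f"Topic {topic_idx}: {', '.join(top)}")
```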
The literature-driven analysis reveals that FL frameworks have rarely been the subject of research works on FL (see Table 2). This work aims to close this research gap. Figure 3 on page 6 shows the distribution of reviewed FL sources by publication year. Notably, FL received an overall boost in research interest in 2022 compared to 2021 (25 vs 14 publications). We expect the number of research publications on the four FL concepts described (see Subsect. 3.4 on page 5) to increase in the future as more user-friendly FL frameworks make FL accessible to a wider range of users. It is worth mentioning that some sources dealing with FL frameworks are GitHub repositories and white papers of the framework developers. In conducting the literature review (see Table 1 on page 7), a total of 18 FL frameworks were identified for the comparison and evaluation. To filter the number of FL frameworks, inclusion criteria are defined and used in this study. These filter criteria and the selected FL frameworks are described in the next section.
Fig. 3 Histogram of reviewed literature by year of publication from 2016 (first FL publication) to February 2024 (current research)
Although the term FL was coined as early as 2016 [ 1 ], it is only in recent years that more Python-based frameworks have emerged that attempt to provide FL in a more user-friendly and application-oriented manner (see Fig. 3 on page 6). Some of the identified FL frameworks are hidden behind paywalls or are completely outdated and no longer actively developed and supported, making it impractical to include them in a fair comparison. Therefore, the following two inclusion criteria must be fulfilled by the FL frameworks in order to be considered as comparison candidates.
Open-Source Availability In this paper, we also want to contribute to the topic of open-source in AI solutions and affirm its importance in the research community. In times when more and more AI applications are offered behind obfuscated paywalls (e.g., OpenAI [ 68 ]), researchers and developers should also consider the numerous advantages when developing innovative AI solutions as open-source products. After all, the rapid development of AI has only been possible due to numerous previous relevant open-source works. Thus, for the comparison study only open-source FL frameworks are chosen.
A few enterprises, such as IBM [ 59 ] or Microsoft [ 69 ], offer both a commercial integration and an open-source version of their FL frameworks for research purposes. For such FL frameworks, only the free versions are considered in our comparison analysis.
Commercial FL frameworks such as Sherpa.ai FL [ 47 , 65 ] are not considered in this work as they do not follow the spirit of open-source. Benchmarking frameworks such as LEAF [ 70 ] or FedScale [ 71 ] were also excluded.
Community Popularity Another inclusion criterion used for filtering FL frameworks is the popularity in the community. It can be assumed that FL frameworks with an active and large GitHub community are more actively developed, more likely to be supported in the long term and thus more beneficial for practitioners. Therefore, this criterion excludes smaller or experimental FL frameworks, such as OpenFed [ 72 ].
As a metric for community activity, the number of GitHub stars is used. FL frameworks that have received at least 200 GitHub stars for their code repositories are considered. The GitHub stars indicate how many GitHub users bookmarked the repository, which can be interpreted as a reflection of the repository’s popularity. In addition, only FL frameworks provided by a company or an academic institution are considered in this study.
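This star-based filtering can be reproduced with the public GitHub REST API, which exposes a stargazers_count field per repository. The sketch below is illustrative; the listed repository paths are assumptions that may have changed since the study was conducted, and star counts evolve over time.

```python
import requests

# Candidate repositories (owner/name); the paths shown here are illustrative
REPOS = [
    "adap/flower",        # Flower
    "OpenMined/PySyft",   # PySyft
    "FederatedAI/FATE",   # FATE
]

MIN_STARS = 200  # inclusion threshold used in this study

def stars(repo):
    # Public GitHub REST API; unauthenticated requests are rate-limited
    r = requests.get(f"https://api.github.com/repos/{repo}", timeout=10)
    r.raise_for_status()
    return r.json()["stargazers_count"]

for repo in REPOS:
    count = stars(repo)
    status = "include" if count >= MIN_STARS else "exclude"
    print(f"{repo}: {count} stars -> {status}")
```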
To provide an initial overview of the 15 filtered FL frameworks, a comparison of them is shown in Table 3 on page 9 based on the following metrics: the developer’s country of origin, GitHub stars, the number of Git releases, and the dates of the initial and latest releases. Notably, PySyft is the most popular FL framework with over 9000 GitHub stars, followed by FATE AI and FedML. In general, FL frameworks which were released earlier have a higher number of GitHub stars. PySyft and TFF have been updated the most, while FLUTE has not yet had an official release on GitHub. Apart from Flower, all other FL frameworks were developed either in China or in the USA. 200 was chosen as the critical star value, as this produces a manageable number of FL frameworks with the greatest popularity. In addition, a clear break between the widely and little observed frameworks can be seen in this value range, as only a few frameworks fall between 500 and 200 stars, before the number of repositories increases drastically below 200 stars.
This section starts with the introduction of the comparison criteria and the weighted scoring system in Subsect. 5.1 on page 8. Then, the comparison and evaluation of the 15 FL frameworks is performed and the results are presented in Subsect. 5.2 on page 11. The section closes with a discussion and analysis of our findings in Subsect. 5.3 on page 14.
To ensure a detailed comparison, the FL frameworks are examined from three different perspectives, namely Features, Interoperability and User Friendliness, using a weighted scoring system. Within each of the three main comparison categories, the criterion weights sum to 100%. For each comparison category, this subsection describes the individual comparison criteria and their weighting in descending order of relevance. The comparison criteria in each perspective category were selected based on the systematic literature review described in Subsect. 3.4 on page 5.
Features This comparison category aims to examine and compare the inherent features of each FL framework. From the user’s point of view, it is mandatory to know the relevant features of an FL framework in order to select a suitable framework for an FL project. Typical FL framework features include the support of different FL Paradigms (horizontal, vertical, and federated transfer learning), Security Mechanisms (cryptographic and algorithm-based methods), different FL Algorithms and specific federated ML Models [ 33 , 34 , 95 , 96 , 97 , 98 , 99 , 100 , 101 ].
In terms of weighting, Security Mechanisms is weighted most heavily at 35%, because increased data privacy and security is the main motivation for using FL in most applications [ 102 ] and the inherent properties of FL do not guarantee complete security [ 34 , 103 , 104 , 105 , 106 ].
FL Algorithms and ML Models are given equal weighting at 25%, as both a wide range of algorithms and models are important to make an FL framework adaptable to different data-driven use cases [ 62 , 66 , 102 ].
The criterion FL Paradigms is weighted at 15%, because horizontal FL is still the most common FL paradigm [ 102 ], making the inclusion of other FL paradigms (i.e. vertical FL [ 107 ], and federated transfer learning [ 108 ]) less pertinent.
Interoperability
Interoperability is a mandatory factor in the evaluation of FL frameworks, particularly in terms of their compatibility with various software and hardware environments. This category includes support for multiple operating systems beyond the universally supported Linux, containerization via Docker, CUDA support for leveraging GPUs, and the feasibility of deploying federated applications to physical edge devices [ 66 ].
The criterion Rollout To Edge Devices is weighted at 50%. This comparison criterion is crucial for the practical deployment of FL applications, enabling real-world applications rather than mere simulations confined to a single device [ 62 , 66 ]. Without this, the scope of FL frameworks would be significantly limited to theoretical or constrained environments.
Support for different Operating Systems is assigned a weight of 25%. This inclusivity ensures that a broader range of practitioners can engage with the FL framework, thereby expanding its potential user base and facilitating wider adoption across various platforms [ 62 ].
GPU Support is considered important due to the acceleration it can provide to model training processes, and is weighted at 15%. Although beneficial for computational efficiency, GPU support is not as critical as the other criteria for the core functionality of an FL framework [ 66 ].
Lastly, Docker Installation is recognized as a criterion with a 10% weight. Docker’s containerization technology offers a uniform and isolated environment for FL applications, mitigating setup complexities and compatibility issues across diverse computing infrastructures [ 109 ]. While Docker support enhances versatility and accessibility, it is deemed optional since there are FL frameworks available that may not necessitate containerization for running on other OSes. Although Docker’s containerization is a beneficial attribute for FL frameworks, it is not as heavily weighted as the capacity for edge device deployment or OS support, which are more essential for the practical implementation and broad usability of FL applications.
User Friendliness The aim of this comparison category is to examine and compare the simplicity and user-friendliness of the individual FL frameworks when creating FL applications. The simple use of an FL framework can shorten the development times in an FL project and thus save costs. Therefore, the following comparison criteria should be considered in this criteria group: Development Effort needed to create and run an FL session, federated Model Accuracy on unseen data, available online Documentation , FL Training Speed , Data Preparation Effort , Model Evaluation techniques and, if existing, the Pricing Systems for additional functionalities (e.g., online dashboards and model pipelines) [ 62 , 66 ].
The criteria Development Effort and Model Accuracy are deemed most critical, each carrying a 25% weight, due to their direct impact on the usability of FL frameworks and the effectiveness of the resultant FL applications [ 110 ]. The focus is on quantifying the ease with which developers can leverage the framework to create and deploy FL applications. This facet is critical as it directly influences the time-to-market and development costs of FL projects. Equally important for an FL application’s success is how well a federated model performs on unseen data [ 62 , 66 ].
The Documentation aspect is weighted with 20%. Given the novelty of many FL frameworks and the potential scarcity of coding examples, the availability and quality of documentation are evaluated [ 66 ]. This criterion underscores the importance of well-structured and informative documentation that can aid developers in effectively utilizing the FL framework, encompassing tutorials, API documentation, and example projects.
The Training Speed criterion is weighted lower, at 10%, since a faster training time is advantageous for any FL framework but is less relevant than a high model accuracy [ 62 , 66 ]. It reflects the optimization and computational efficiency of the framework in processing FL tasks.
The Data Preparation Effort is assigned a weight of 10%. It evaluates the degree to which an FL framework supports data preprocessing and readiness, considering the ease with which data can be formatted, augmented, and made suitable for federated training. Although not critical for the operational use of an FL framework, streamlined data preparation processes can enhance developer productivity.
Model Evaluation receives the lowest weighting of 5%. It scrutinizes the methodologies and tools available within the FL framework for assessing global model performance and robustness, including validation techniques and metrics. Different model evaluation methods are helpful for practitioners, but not necessary for the effective use of an FL framework [ 66 ]. Thus, this criterion has more a supportive role in the broader context of FL application development.
Since the focus of this work is on open-source FL frameworks, the Pricing Systems is also only weighted at 5%. For FL frameworks that offer additional functionalities through paid versions, this evaluates the cost-benefit ratio of such features. While the core focus is on open-source frameworks, the assessment of pricing systems is still relevant for understanding the scalability and industrial applicability of the framework’s extended features.
To assess the scores for the Development Effort, Model Accuracy, Training Speed, Data Preparation Effort and Model Evaluation criteria, a federated test application has been created, simulating an FL setting while running on a single device. This application used the MNIST dataset [ 111 , 112 ] and performed an image multi-class classification task with a multi-layer perceptron neural network model. A grid search approach was used to identify an optimal hyperparameter configuration. The selected hyperparameters were then used identically for the model training when testing each FL framework (see Table 4 on page 11).
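For orientation, the following sketch shows a local training step of the kind used in such a test application: a small multi-layer perceptron trained on MNIST with PyTorch. The layer sizes and hyperparameter values here are placeholders, not the grid-searched configuration from Table 4.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Multi-layer perceptron for 10-class MNIST classification; layer sizes are placeholders
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=32, shuffle=True)

# Placeholder hyperparameters; the study fixed one grid-searched configuration
# and reused it identically across all FL framework tests
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(1):  # one local epoch per federated round, for illustration
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```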
Weighted Scoring In each of the three comparison categories mentioned above, the criteria are assigned weights that sum up to 100%. Consequently, the total score for all comparison criteria within a category represents the percentage score obtained by an evaluated FL framework in that particular category. These percentage scores for each category are then combined using a weighted sum to derive an overall total score. This serves as a final metric for selecting the best FL framework across all categories. All criterion weights are also listed in Table 7 on page 20 in the Appendix.
The distribution of the weighting of the three top level categories is as follows:
User Friendliness has the highest weighting ( 50% ), as the criteria in this category have the greatest impact for practitioners working with FL frameworks.
Features has the second highest weighting ( 30% ), as this category indicates which functionalities such as Security Mechanisms or FL Paradigms are supported in an FL framework.
Interoperability is weighted as the lowest ( 20% ), as it primarily indicates the installation possibilities of an FL framework, but does not represent core functionalities or the framework’s usability.
The FL frameworks can achieve one of three possible scores in each criterion: a score of zero is awarded if the FL framework does not fulfill the requirements of the criterion at all. A half score is awarded if the FL framework partially meets the requirements. A score of one is awarded if the FL framework fully meets the requirements. If a criterion cannot be verified or tested at all, then it is marked with N.A. (Not Available). This is treated as a score of zero in this criterion when calculating the total score. The detailed scoring schemes for each criterion are given in Table 7 on page 20 in the Appendix.
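Expressed as code, the scoring scheme reduces to two weighted sums, a minimal sketch of which is shown below. The category and criterion weights are those defined above; the per-criterion scores of the hypothetical framework are invented for illustration.

```python
# Top-level category weights from this study
CATEGORY_WEIGHTS = {"features": 0.30, "interoperability": 0.20, "user_friendliness": 0.50}

# Criterion weights for one category (Features); each category sums to 100%
FEATURE_WEIGHTS = {"security": 0.35, "algorithms": 0.25, "models": 0.25, "paradigms": 0.15}

def category_score(scores, weights):
    """Per-criterion scores are 0, 0.5 or 1; N.A. (None) is treated as 0."""
    return sum(weights[c] * (scores.get(c) or 0) for c in weights)

def total_score(category_scores):
    """Combine the per-category percentage scores into the overall total score."""
    return sum(CATEGORY_WEIGHTS[cat] * s for cat, s in category_scores.items())

# Hypothetical framework: full security, half algorithms, full models, horizontal FL only
features = category_score(
    {"security": 1, "algorithms": 0.5, "models": 1, "paradigms": 0}, FEATURE_WEIGHTS
)
print(f"Features score: {features:.2%}")  # 0.35 + 0.125 + 0.25 = 72.50%
```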
The scoring matrix in Table 5 on page 12 compares the 15 FL framework candidates on the basis of the categories and criteria defined in Subsect. 5.1 on page 8. In the following, we explain our assessment of the individual comparison criteria for the FL frameworks. Note: we capitalize the individual comparison criteria to highlight them.
Evaluation of Features It can be noted that for the first criterion, Security Mechanisms , five FL frameworks (PySyft, PaddleFL, FLARE, FLSim and FederatedScope) provide both cryptographic and algorithmic security features such as differential privacy, secure aggregation strategies, secure multiparty computation, trusted execution environments and homomorphic encryption [ 6 , 34 , 35 , 53 , 108 , 113 , 114 , 115 , 116 , 117 , 118 , 119 , 120 , 121 , 122 ]. Therefore, these FL frameworks receive the full score for this criterion. On the other hand, FATE AI, FedML, TFF, Flower, FedLearner, OpenFL, IBM FL and FLUTE all provide only one type of security mechanism. Thus, these FL frameworks receive half the score [ 48 , 49 , 50 , 52 , 57 , 63 , 64 , 66 , 78 , 86 , 87 , 123 ]. FedLab and EasyFL provide no security mechanisms and receive a score of zero in this criterion [ 54 , 55 , 92 ].
For the next criterion, FL Algorithms, the FL frameworks FedML, TFF, Flower, OpenFL, IBM FL, FLARE, FLUTE, FederatedScope and FedLab receive full scores, because they provide out-of-the-box implementations of the FedAvg [ 1 ] algorithm as well as several different adaptive FL algorithms such as FedProx, FedOpt and FedAdam [ 124 , 125 ]. On the other hand, FATE AI, FedLearner, PaddleFL, FLSim and EasyFL only provide FedAvg as an aggregation strategy; other algorithms are not available in these FL frameworks by default, resulting in a halving of the score on this criterion. PySyft is the only FL framework candidate that requires manual implementation of an FL strategy (even for FedAvg). Therefore, PySyft receives a zero score on this criterion, as it requires more effort to set up a training process [ 46 , 51 , 52 , 57 , 60 , 62 , 63 , 73 , 81 , 83 , 86 , 87 , 89 , 92 , 93 ].
For building ML Models, PySyft, FATE AI, FedML, Flower, OpenFL, IBM FL, FLARE and FederatedScope support the deep learning libraries Tensorflow and PyTorch, providing users with a wide range of federatable ML models. Therefore, these FL frameworks are awarded full marks on this criterion. However, TFF (Tensorflow), FedLearner (Tensorflow), PaddleFL (PaddlePaddle), FLSim (PyTorch), FLUTE (PyTorch), FedLab (PyTorch) and EasyFL (PyTorch) receive half the score because users are limited to only one supported ML library [ 52 , 57 , 60 , 62 , 63 , 64 , 76 , 78 , 81 , 83 , 86 , 87 , 89 , 91 , 93 ].
In terms of FL Paradigms , there are seven FL frameworks that support both horizontal and vertical FL and therefore receive full marks: PySyft, FATE AI, FedML, Flower, FedLearner, PaddleFL and FederatedScope. TFF, OpenFL, IBM FL, FLARE, FLSim, FLUTE, FedLab and EasyFL receive a zero score because they only support the standard horizontal FL paradigm [ 55 , 57 , 66 , 74 , 78 , 81 , 83 , 86 , 87 , 88 , 90 , 92 , 126 ].
Evaluation of Interoperability The Rollout To Edge Devices that allows FL applications to be implemented in real-world environments (e.g., on thin-clients or IoT devices) is possible with PySyft, FedML, Flower, IBM FL and FLARE. Therefore, they receive full marks on this criterion. However, PySyft only supports Raspberry Pi, while the other four FL frameworks also support the Nvidia Jetson Development Kits [ 86 ]. FATE AI, PaddleFL, FederatedScope and EasyFL each receive half the possible score because the rollout process on edge devices is more cumbersome compared to the other FL frameworks. For example, FATE AI and PaddleFL require edge devices with at least 100 GB of storage and 6 GB of RAM, which excludes most single-board computers. The FL frameworks TFF, FedLearner, OpenFL, FLUTE, FLSim and FedLab do not score on this criterion because they only support FL in simulation mode on a single device [ 46 , 52 , 60 , 62 , 63 , 64 , 77 , 83 , 87 , 89 , 91 , 93 ].
For the Operating System support, PySyft, FedML, Flower, IBM FL, FLARE, FLSim, FederatedScope and FedLab receive full marks, as Windows and MacOS are natively supported. The following FL framework candidates support only one of the two and receive half the score: TFF (MacOS), OpenFL (MacOS), FLUTE (Windows) and EasyFL (MacOS). FATE AI, FedLearner and PaddleFL run only on Linux and require Docker containers when used on Windows or MacOS. Therefore, these three FL frameworks do not receive any points for this criterion [ 50 , 57 , 60 , 73 , 76 , 78 , 79 , 81 , 83 , 84 , 87 , 88 , 90 , 91 , 93 ].
All compared FL frameworks offer GPU Support and receive full scores on this criterion, except for FLSim. The documentation of FLSim makes no reference to CUDA acceleration during FL training, and CUDA could not be enabled during the conducted experiments. Therefore, this FL framework receives a score of zero in this criterion [ 51 , 52 , 63 , 66 , 73 , 74 , 76 , 81 , 83 , 86 , 87 , 90 , 91 , 93 ].
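For the PyTorch-based framework candidates, CUDA-backed GPU acceleration can be verified with a generic check such as the following; this snippet is independent of any particular FL framework.

```python
import torch

# Quick check for CUDA-backed GPU acceleration before launching a federated training run
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("CUDA not available; falling back to CPU")
```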
13 of the 15 FL framework candidates have a Docker containerization option and therefore receive full marks. These frameworks provide Docker images, which can be installed using the Docker Engine. By setting up a Docker container, it is possible to create an isolated environment in which software can be installed even though its requirements are not supported by the host system [ 109 ]. Some frameworks, like FLARE and OpenFL, provide a Dockerfile which builds the image automatically, while other frameworks, like PaddleFL, provide documentation on how to install the Docker image manually. Surprisingly, FLSim and Microsoft’s FLUTE do not seem to support Docker containers: their use was not mentioned in the documentation and was not possible during the conducted experiments. Therefore, these two FL frameworks receive zero points for this criterion [ 57 , 60 , 73 , 74 , 76 , 78 , 79 , 81 , 83 , 84 , 87 , 88 , 90 , 91 , 93 ].
Evaluation of User Friendliness For the FATE AI, PaddleFL and FedLearner FL frameworks, it is not possible to evaluate the criteria Development Effort, Model Accuracy, Training Speed, Data Preparation Effort and Model Evaluation because of a number of issues with these FL frameworks, such as failed installations on Windows, Linux or MacOS. Thus, these FL frameworks are marked as N.A. in the mentioned criteria, because test experiments could not be performed with them.
For Development Effort , TFF, OpenFL, FedLab and EasyFL receive a score of one as the setup of applications with these frameworks was intuitive, fast and required few lines of code. FedML, Flower, IBM FL, FLSim, FLUTE and FederatedScope receive a half score, since development requires more lines of code than with the four frameworks mentioned previously, but aspects of the training process like the federated aggregation step or the local loss step are implemented. PySyft and FLARE require the most development effort because parts of the training process, such as gradient descent, must be implemented and set by the user, which is not the case for the other FL framework candidates. Thus, PySyft and FLARE are rewarded with zero points on Development Effort.
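To illustrate what a low development effort looks like in practice, the following sketch outlines a client in Flower’s documented NumPyClient style. The ‘model’ is a plain weight vector so the example stays self-contained, and the exact API names may differ between Flower versions.

```python
import numpy as np
import flwr as fl  # Flower; API details may differ across versions

class SketchClient(fl.client.NumPyClient):
    """Minimal Flower client sketch: the 'model' is a single weight vector
    and 'training' merely nudges it, to keep the example self-contained."""

    def __init__(self):
        self.weights = np.zeros(10)

    def get_parameters(self, config):
        return [self.weights]

    def fit(self, parameters, config):
        self.weights = parameters[0] + 0.1   # stand-in for a real local training step
        return [self.weights], 100, {}       # updated weights, n_k, metrics

    def evaluate(self, parameters, config):
        loss = float(np.mean(parameters[0] ** 2))  # stand-in loss
        return loss, 100, {}

# Connects to a Flower server started elsewhere, e.g. with
#   fl.server.start_server(config=fl.server.ServerConfig(num_rounds=3))
fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=SketchClient())
```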
As for the global Model Accuracy, PySyft, Flower, OpenFL, IBM FL, FLARE, FLSim, FedLab and EasyFL achieved a test accuracy of over 90% in the performed MNIST classification simulation. On the other hand, FedML, TFF, FLUTE and FederatedScope performed worse, achieving an accuracy below the 90% threshold and thus receiving only half the score, even though the same model architecture, configuration and parameters were used (see Table 4 on page 11). The test accuracies for the tested frameworks can be found in Table 6 on page 13.
Surprisingly, the amount and quality of Documentation available for the FL frameworks varies widely. PySyft [ 64 , 73 ], TFF [ 63 , 79 ], Flower [ 51 , 77 , 78 ], FLARE [ 84 , 85 , 86 ] and EasyFL [ 55 , 93 , 94 ] provide extensive API documentation, several sample applications and video tutorials to learn how to use these frameworks. These FL frameworks receive the full score on the criterion Documentation. However, FedLearner [ 60 ], PaddleFL [ 57 ], FLSim [ 87 ] and FLUTE [ 69 , 88 ] provide only little and mostly outdated documentation. Therefore, this group of FL frameworks receives zero points here. For FATE AI [ 56 , 74 ], FedML [ 46 , 75 , 76 ], OpenFL [ 49 , 81 ], IBM FL [ 48 , 59 , 83 ], FederatedScope [ 53 , 89 , 90 ] and FedLab [ 54 , 91 , 92 ], the available documentation is less extensive and at times outdated. These FL frameworks receive a score of 0.5 for this criterion.
When performing the test experiments with the FL framework candidates, there were also differences in the model Training Speed. With TFF, OpenFL, FLSim, FedLab and EasyFL, the federated training was completed in less than a minute, giving these frameworks a full score. FL frameworks with a training speed between one and three minutes (FedML, Flower, FLARE, FLUTE, FederatedScope) received half of the score, while training on PySyft and IBM FL took longer than three minutes, resulting in a score of zero for these two frameworks. Since FLUTE can only be used on Windows [ 88 ], its training speed measurement may not be directly comparable to those of the other FL frameworks, which were measured on a different computer running MacOS with a different hardware specification. The exact training speeds for the tested frameworks can be found in Table 6 on page 13.
For the assessment of the Data Preparation effort, we considered the effort required to transform proxy training datasets such as MNIST [ 112 ] into the required data format of the FL frameworks. Here, PySyft, Flower, FLARE, FLUTE and FedLab required only minor adjustments (e.g., reshaping the input data) and therefore received full scores, while TFF and IBM FL required more preparation, so both FL frameworks received no scores. FedML, OpenFL, FLSim, FederatedScope and EasyFL received a score of 0.5.
For the Evaluation criterion, TFF, IBM FL, FLSim, FederatedScope, FedLab and EasyFL provide built-in evaluation methods that display test set loss and accuracy metrics for the federated training of a global model, resulting in a full score for these FL frameworks in the Model Evaluation criterion. Since the main category is User Friendliness, PySyft receives a score of zero here because in PySyft all evaluation metrics must be implemented manually, which may require additional libraries (e.g., TensorBoard). FedML, Flower, OpenFL and FLUTE provided evaluation methods with incomplete or convoluted output and thus received a score of 0.5.
For the Pricing System criterion, all FL framework candidates except FLUTE and IBM FL receive full marks because their features are freely accessible. FLUTE is integrated with Azure ML Studio [ 69 ]. Microsoft touts a faster and easier federated development process by leveraging its cloud service, proclaiming FLUTE’s integration with Azure ML as one of its key benefits, as the federated application can be used directly in the Azure ecosystem [ 69 ]. On the other hand, IBM FL is part of the IBM Watson Studio cloud service, where additional features such as UI-based monitoring and configuration are available that cannot be used in the open-source community edition [ 59 ]. Therefore, FLUTE and IBM FL do not score on this criterion.
Considering the scores at the category level, some FL frameworks stand out in certain categories. FederatedScope received the highest score in the Features category with 100%, offering differential privacy and homomorphic encryption as security mechanisms, support for different ML libraries and many FL algorithms such as FedAvg, FedOpt and FedProx. Meanwhile, EasyFL received only 25% of the score, offering no security mechanisms, FedAvg as the only implemented FL algorithm and support for a single ML library, while only horizontal FL is available as a paradigm.
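As a point of reference for the FL algorithms named above, the following short numpy sketch works through the core FedAvg aggregation rule: the new global weights are the sample-count-weighted average of the client weights. The client weight vectors here are random stand-ins for real model parameters.

```python
# Worked sketch of the FedAvg aggregation step: global weights are
# the sample-size-weighted average of client weights,
#   w_global = sum_k (n_k / n) * w_k.
# Client weight vectors here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
client_weights = [rng.normal(size=(4,)) for _ in range(3)]  # per-client params
client_samples = np.array([100, 300, 600])                  # n_k per client

coeffs = client_samples / client_samples.sum()              # n_k / n
global_weights = sum(c * w for c, w in zip(coeffs, client_weights))
print(global_weights)
```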
The FL frameworks PySyft, FedML, Flower, IBM FL and FLARE earned a perfect score of 100% in the Interoperability category, while FedLearner and FLSim performed joint-worst, receiving 25% of the category score (see Table 5 on page 12). FedLearner does not offer a rollout on edge devices and is not available for installation on either Windows or macOS, limiting its potential user base. FLSim is available for both Windows and macOS, but does not support a rollout on edge devices, GPU-based computation, or Docker containerization.
Remarkably, EasyFL received the highest score of 95% in the User Friendliness category, fulfilling the most important criteria: Development Effort, Model Accuracy, Documentation and Training Speed. The FL frameworks for which no test application could be created received the lowest scores, with FedLearner and PaddleFL scoring 5% in this category and FATE AI 15%. These low scores are noteworthy, since all three FL frameworks have a long development history and are popular within the community (see Table 3 on page 9).
Based on the conducted comparison and evaluation, a ranking of FL frameworks can be constructed, which is visualized in Fig. 4 on page 15. It can be concluded that in terms of the overall score, Flower performed best with 84.75%, followed by FLARE with 80.5% and FederatedScope with 78.75% (see Table 5 on page 12). PySyft, FedML, OpenFL, EasyFL, IBM FL, TFF and FedLab all received scores at or above 60% overall. FLSim received a score of 54.25% and FLUTE scored 43.25%, while FATE AI, PaddleFL and FedLearner all scored below 40% in total, with FedLearner’s 24.75% marking the lowest score of the frameworks in this comparison.
The graphical representation of the scores on the bar plot further shows that the top ten FL frameworks, albeit with large differences in their category scores, all achieved relatively high total scores (at or above 60%). This suggests that a number of FL frameworks could already offer a satisfactory solution for practitioners. The total score for the final five FL frameworks on the bar plot decreases sharply, indicating significant shortcomings in entire categories or specific criteria. FLSim and FLUTE scored low in the Interoperability category at 25% and 27.5% respectively, while FATE AI, PaddleFL and FedLearner received low User Friendliness scores (15%, 5%, and 5%).
Fig. 4 Total scores (in percent) of the compared frameworks
Generally, the difference in score between the FL frameworks in the Features category is small compared to the other categories; only two frameworks score below 50%. Most of the variance in this category is introduced by the security and paradigm criteria. Should secure computation and communication be the focal point of development, PySyft, PaddleFL, FLARE, FLSim and FederatedScope provide the most extensive features for this use case.
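To make concrete what the differential privacy mechanisms offered by these frameworks revolve around, the following minimal numpy sketch shows the generic clip-and-noise step applied to a client update before aggregation. The clipping norm and noise multiplier are illustrative values, and no framework-specific API is used.

```python
# Minimal sketch of the clip-and-noise step underlying differential
# privacy in FL: each client update is L2-norm-clipped, then Gaussian
# noise is added before aggregation. Values are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
update = rng.normal(size=(4,))          # stand-in client model update
clip_norm, noise_multiplier = 1.0, 0.5  # illustrative DP parameters

# Clip the update to bound its L2 sensitivity.
norm = np.linalg.norm(update)
clipped = update * min(1.0, clip_norm / norm)

# Add Gaussian noise calibrated to the clipping norm.
noisy = clipped + rng.normal(scale=noise_multiplier * clip_norm,
                             size=clipped.shape)
print(noisy)
```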
In the Interoperability category, it is observable that only five of the FL frameworks (PySyft, FedML, Flower, IBM FL, FLARE) support a rollout on edge devices without strong limitations. This explains the high fluctuation of scores for this category, as the Rollout criterion was weighted heavily. Should the development of a fully realized, distributed FL application be central to a project, these five FL frameworks offer the best conditions and are most suitable for communication and real-time computing with IoT edge devices.
Examining the User Friendliness category, the Development Effort and Documentation criteria explain much of the variability, while most FL frameworks generally perform well when tested for model test accuracy and federated training speed. An unexpectedly large variance was observed in the Training Speed criterion, with times ranging from under one minute to over three minutes. This may be explained by the differing architectures of the FL frameworks and their sequential versus parallel computing approaches in simulation mode. Overall, the three FL frameworks (FATE AI, FedLearner, PaddleFL) for which no test application could be created are clear outliers in this category. These three frameworks consequently also received the lowest total scores, as displayed in Fig. 4 on page 15.
Furthermore, there are specific use cases for which some frameworks may be particularly suitable. FLARE is developed by NVIDIA, the company that also released Clara, an artificial intelligence suite focused on medical use cases; it can therefore be argued that FLARE profits from the experience gained during Clara's development. Meanwhile, FedML provides a website with an FL dashboard, where projects can be tracked and shared with collaborators, allowing for easy deployment and sharing of applications. This may be advantageous when developing an FL application across organizations. Furthermore, an extension for FATE called FATE-LLM has been released, targeting the development of large language models in a federated setting, giving FATE a strong foundation in this area [127].
It can be concluded that the evaluated FL frameworks are relatively homogeneous regarding the criteria in the Features category. Support for a rollout on edge devices in the Interoperability category and differences in the availability and quality of documentation in the User Friendliness category are the major sources of variance in the total scores. To attract practitioners, these are the two aspects the underperforming FL frameworks most urgently need to improve.
Based on the literature-driven comparison and analysis results, the RQs posed at the beginning of this paper (see Sect. 1 on page 1) can be answered as follows:
RQ 1: Which relevant frameworks for FL exist and are open-source? Fifteen relevant FL frameworks were selected, reduced from a total of 18 identified FL frameworks after applying the inclusion criteria defined in SubSect. 4.1 on page 7. Table 3 on page 9 gives an overview of the selected FL frameworks. The filtered frameworks are all available as open-source software and have community and industry support; they serve as the objects of study in the FL framework comparative analysis (see Sect. 5 on page 8).
RQ 2: Which criteria enable a qualitative and quantitative comparison of FL frameworks? The criteria, weights and evaluation schema introduced in Sect. 5.1 and summarized in Table 7 on page 20 are used in the comparison in SubSect. 5.2. The criteria include quantitative measures such as Model Accuracy and Training Speed as well as qualitative measures such as the included Security Mechanisms and the quality and scope of the available Documentation. The evaluation schema based on these criteria enables a versatile and comprehensive comparison of FL frameworks.
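As a hedged illustration of how such a weighted schema turns per-criterion scores into a total percentage, the following sketch multiplies each criterion score (0, 0.5 or 1) by its weight and normalizes by the maximum attainable points. The criteria, weights and scores below are invented for illustration and are not the actual values from Table 7.

```python
# Sketch of the weighted scoring idea: each criterion is scored
# 0, 0.5 or 1, multiplied by its weight, and the total is reported
# as a percentage of the maximum attainable points.
# Weights and scores are illustrative, NOT the values from Table 7.
criteria = {
    # criterion: (weight, score in {0, 0.5, 1})
    "Model Accuracy":      (3, 1.0),
    "Training Speed":      (2, 0.5),
    "Documentation":       (3, 1.0),
    "Security Mechanisms": (2, 0.0),
}

max_points = sum(w for w, _ in criteria.values())
points = sum(w * s for w, s in criteria.values())
print(f"total score: {100 * points / max_points:.2f}%")  # 70.00%
```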
RQ 3: Which FL framework offers the most added value to practitioners and researchers? Different FL frameworks received the highest scores in each of the three formulated categories (FederatedScope in Features; PySyft, FedML, Flower, IBM FL and FLARE in Interoperability; EasyFL in User Friendliness). This indicates that, depending on one's preferences and needs for a particular project, any of several FL frameworks might provide the most added value. The criteria, their weights and the presented results can in this case act as guidelines for FL framework selection. However, based on the comparative results (see SubSect. 5.2 on page 11), the FL framework Flower currently offers the most overall added value to practitioners and researchers.
In this study, not all currently available FL frameworks are represented, since we formulated inclusion criteria to limit the number of FL framework candidates (see SubSect. 4.1 on page 7). The field of FL frameworks for the proposed comparison suite can be extended to include, for example, proprietary framework candidates that have not been considered in this study. A comparison of these with open-source FL frameworks could provide further interesting insights into the alignment and target audience of each FL framework. Additional experiments with FL frameworks in different FL settings could lead to more comprehensive benchmarking results. The vertical FL and federated transfer learning settings would be possible additions, should more frameworks support these paradigms in the future. Depending on the use case, an adjustment of the criteria weighting might also be required. Therefore, the comparison evaluation schema proposed in this paper can be adapted as desired to reflect the priorities of practitioners and researchers for particular FL projects.
FL is still a niche research field, but the number of scientific papers published each year is steadily increasing (see Fig. 3 on page 6) [ 128 , 129 , 130 , 131 , 132 ]. Based on this trend, we also expect a large number of new FL frameworks to be released in the near future. These emerging FL frameworks can be evaluated and compared to other FL frameworks upon release using the comparison methodology proposed in this paper.
In this study, a comparison suite to evaluate open-source Federated Learning (FL) frameworks was introduced. For this, a literature review was conducted following the guidelines set by Webster and Watson. The review method involved identifying relevant literature and organizing it based on the most significant concepts discovered through Latent Dirichlet Allocation (LDA) applied to the publications identified as relevant to FL. Based on the filtered relevant literature, comparison criteria were formulated and a weighted scoring system was proposed. The criteria were categorized into the overarching categories of Features, Interoperability, and User Friendliness. Additionally, two inclusion criteria, namely open-source availability and community popularity, were established to narrow down the number of FL frameworks under consideration. This enabled us to conduct a more detailed comparison and evaluation of 15 relevant open-source FL frameworks as the study subjects. Both qualitative and quantitative aspects of the FL frameworks were compared, and a detailed score was calculated for each FL framework as a percentage. The conducted comparison analysis demonstrated that among the investigated FL frameworks, Flower performed the best, achieving a total score of 84.75%. Other FL framework candidates such as FLARE, FederatedScope, PySyft, FedML, OpenFL, EasyFL, IBM FL, TFF and FedLab also achieved a high total score (at or above 60%) but could not beat Flower in all aspects. Additionally, we observed that FederatedScope performed best in the Features category; PySyft, FedML, Flower, IBM FL and FLARE all received the highest scores in the Interoperability category, while EasyFL performed best in the User Friendliness category. The worst-performing FL frameworks were FATE AI, PaddleFL and FedLearner with total scores of 38.5%, 35% and 24.75% respectively, because they lacked in the Interoperability and particularly in the User Friendliness category. Due to their limitations, test experiments could not be conducted to accurately measure criteria such as Model Accuracy or Training Speed for these three frameworks. While this study demonstrated the superior performance of FL frameworks such as Flower, FLARE or FederatedScope in most baseline scenarios, it is important to note that the priorities and requirements of practitioners and researchers may vary. Therefore, the results of this study should be used primarily as a guiding tool in the FL framework selection process for federated-driven analyses.
The MNIST [111, 112] proxy dataset that supports the findings of this study is openly available at http://yann.lecun.com/exdb/mnist/
McMahan HB, Moore E, Ramage D, Hampson S, Arcas BA (2017) Communication-efficient learning of deep networks from decentralized data. J Mach Learn Res 54:1273–1282
Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, Talwar K, Zhang L (2016) Deep learning with differential privacy. 23rd ACM conference on computer and communications security (CCS 2016), 308–318. https://doi.org/10.1145/2976749.2978318
Hard A, Rao K, Mathews R, Beaufays F, Augenstein S, Eichner H, Kiddon C, Ramage D (2018) Federated learning for mobile keyboard prediction arXiv:1811.03604
Li T, Sahu AK, Talwalkar A, Smith V (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Process Mag 37:50–60. https://doi.org/10.1109/MSP.2020.2975749
Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN, Bonawitz KA, Charles Z, Cormode G, Cummings R, D’Oliveira RGL, Rouayheb SE, Evans D, Gardner J, Garrett Z, Gascon A, Ghazi B, Gibbons PB, Gruteser M, Harchaoui Z, He C, He L, Huo Z, Hutchinson B, Hsu J, Jaggi M, Javidi T, Joshi G, Khodak M, Konecny J, Korolova A, Koushanfar F, Koyejo S, Lepoint T, Liu Y, Mittal P, Mohri M, Nock R, Ozgur A, Pagh R, Raykova M, Qi H, Ramage D, Raskar R, Song D, Song W, Stich SU, Sun Z, Suresh AT, Tramer F, Vepakomma P, Wang J, Xiong L, Xu Z, Yang Q, Yu FX, Yu H, Zhao S (2021) Advances and open problems in federated learning. Found Trends Mac Learn 14:1–121. https://doi.org/10.1561/2200000083
Zhang L, Zhu T, Xiong P, Zhou W, Yu P (2023) A robust game-theoretical federated learning framework with joint differential privacy. IEEE Trans Knowl Data Eng 35:3333–3346. https://doi.org/10.1109/TKDE.2021.3140131
Jin H, Bai D, Yao D, Dai Y, Gu L, Yu C, Sun L (2023) Personalized edge intelligence via federated self-knowledge distillation. IEEE Trans Parallel Distrib Syst 34:567–580. https://doi.org/10.1109/TPDS.2022.3225185
Nguyen DC, Pham Q-V, Pathirana PN, Ding M, Seneviratne A, Lin Z, Dobre O, Hwang W-J (2022) Federated learning for smart healthcare: a survey. ACM Comput Surv 55:1–37
Antunes RS, da Costa CA, Küderle A, Yari IA, Eskofier B (2022) Federated learning for healthcare: systematic review and architecture proposal. ACM Trans Intell Syst Technol 13:1–23
Xing H, Xiao Z, Qu R, Zhu Z, Zhao B (2022) An efficient federated distillation learning system for multi-task time series classification. IEEE Trans Instrum Meas 71:1–12. https://doi.org/10.1109/TIM.2022.3201203
Riedel P, von Schwerin R, Schaudt D, Hafner A (2023) ResNetFed: federated deep learning architecture for privacy-preserving pneumonia detection from covid-19 chest radiographs. J Healthcare Inf Res 7:203–224
Rahman A, Hossain MS, Muhammad G, Kundu D, Debnath T, Rahman M, Khan MSI, Tiwari P, Band SS (2023) Federated learning-based ai approaches in smart healthcare: concepts, taxonomies, challenges and open issues. Clust Comput 26:2271–2311. https://doi.org/10.1007/s10586-022-03658-4
Bharati S, Mondal MRH, Podder P, Prasath VBS (2022) Federated learning: applications, challenges and future directions. Int J Hybrid Intell Syst 18:19–35
Witt L, Heyer M, Toyoda K, Samek W, Li D (2023) Decentral and incentivized federated learning frameworks: a systematic literature review. IEEE Internet Things J 10:3642–3663
Xiao Z, Xu X, Xing H, Song F, Wang X, Zhao B (2021) A federated learning system with enhanced feature extraction for human activity recognition. Knowl-Based Syst. https://doi.org/10.1016/j.knosys.2021.107338
Boobalan P, Ramu SP, Pham QV, Dev K, Pandya S, Maddikunta PKR, Gadekallu TR, Huynh-The T (2022) Fusion of federated learning and industrial internet of things: a survey. Comput Netw 212
Pandya S, Srivastava G, Jhaveri R, Babu MR, Bhattacharya S, Maddikunta PKR, Mastorakis S, Piran MJ, Gadekallu TR (2023) Federated learning for smart cities: a comprehensive survey. Sustain Energy Technol Assess 55:2–13
Zhang T, Gao L, He C, Zhang M, Krishnamachari B, Avestimehr AS (2022) Federated learning for the internet of things: applications, challenges, and opportunities. IEEE Internet Things Mag 5:24–29
Zhang K, Song X, Zhang C, Yu S (2021) Challenges and future directions of secure federated learning: a survey. Front Comput Sci 16:1–8
Li C, Zeng X, Zhang M, Cao Z (2022) Pyramidfl: a fine-grained client selection framework for efficient federated learning. Proceedings of the 28th annual international conference on mobile computing and networking 28, 158–171
Huang W, Ye M, Du B (2022) Learn from others and be yourself in heterogeneous federated learning. 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Wen J, Zhang Z, Lan Y, Cui Z, Cai J, Zhang W (2023) A survey on federated learning: challenges and applications. Int J Mach Learn Cybern 14:513–535. https://doi.org/10.1007/s13042-022-01647-y
Guendouzi BS, Ouchani S, Assaad HE, Zaher ME (2023) A systematic review of federated learning: challenges, aggregation methods, and development tools. J Netw Comput Appl. https://doi.org/10.1016/j.jnca.2023.103714
Zhao Y, Li M, Lai L, Suda N, Civin D, Chandra V (2018) Federated learning with non-iid data arXiv:1806.00582
Almanifi ORA, Chow C-O, Tham M-L, Chuah JH, Kanesan J (2023) Communication and computation efficiency in federated learning: a survey. Internet Things 22:100742
Xu C, Qu Y, Xiang Y, Gao L (2023) Asynchronous federated learning on heterogeneous devices: a survey. Comput Sci Rev. https://doi.org/10.1016/j.cosrev.2023.100595
Qi P, Chiaro D, Guzzo A, Ianni M, Fortino G, Piccialli F (2023) Model aggregation techniques in federated learning: a comprehensive survey. Futur Gener Comput Syst 150:272–293. https://doi.org/10.1016/j.future.2023.09.008
Li Q, Diao Y, Chen Q, He B (2022) Federated learning on non-iid data silos: an experimental study. 2022 IEEE 38th international conference on data engineering (ICDE)
Wang Z, Xu H-Z, Xu Y, Jiang Z, Liu J, Chen S (2024) Fast: enhancing federated learning through adaptive data sampling and local training. IEEE Trans Parallel Distrib Syst 35:221–236. https://doi.org/10.1109/TPDS.2023.3334398
Abreha HG, Hayajneh M, Serhani MA (2022) Federated learning in edge computing: a systematic survey. Sensors 22:450
Zhang T, Mao S (2021) An introduction to the federated learning standard. GetMobile Mobile Comput Commun 25:18–22
Beltrán ETM, Pérez MQ, Sánchez PMS, Bernal SL, Bovet G, Pérez MG, Pérez GM, Celdrán AH (2023) Decentralized federated learning: fundamentals, state of the art, frameworks, trends, and challenges. IEEE Commun Surv Tutorials 25:2983–3013. https://doi.org/10.1109/COMST.2023.3315746
Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: concept and applications. ACM Trans Intell Syst Technol 10:1–19. https://doi.org/10.1145/3298981
Gong X, Chen Y, Wang Q, Kong W (2023) Backdoor attacks and defenses in federated learning: state-of-the-art, taxonomy, and future directions. IEEE Wirel Commun 30:114–121. https://doi.org/10.1109/MWC.017.2100714
Dwork C, Roth A (2014) The algorithmic foundations of differential privacy. Found Trends Theor Comput Sci 9:211–407. https://doi.org/10.1561/0400000042
McMahan HB, Ramage D, Talwar K, Zhang L (2018) Learning differentially private recurrent language models. International Conference on Learning Representations
Shaheen M, Farooq MS, Umer T, Kim B-S (2022) Applications of federated learning; taxonomy, challenges, and research trends. Electronics 11:670
Rodríguez-Barroso N, Jiménez-López D, Luzón MV, Herrera F, Martínez-Cámara E (2023) Survey on federated learning threats: concepts, taxonomy on attacks and defences, experimental study and challenges. Inf Fusion 90:148–173
Cummings R, Gupta V, Kimpara D, Morgenstern JH (2019) On the compatibility of privacy and fairness. Adjunct publication of the 27th conference on user modeling, adaptation and personalization, 309–315. https://doi.org/10.1145/3314183.3323847
Kusner MJ, Loftus JR, Russell C, Silva R (2017) Counterfactual fairness. 31st conference on neural information processing systems 30, 4069–4079
Ding J, Tramel E, Sahu AK, Wu S, Avestimehr S, Zhang T (2022) Federated learning challenges and opportunities: an outlook. ICASSP 2022 - 2022 IEEE international conference on acoustics, speech and signal processing (ICASSP)
Buolamwini J, Gebru T (2018) Gender shades: intersectional accuracy disparities in commercial gender classification. Proc Mach Learn Res 81:1–15
Zhang X, Kang Y, Chen K, Fan L, Yang Q (2023) Trading off privacy, utility, and efficiency in federated learning. ACM Trans Intell Syst Technol 14:98–18931. https://doi.org/10.1145/3595185
Khan M, Glavin FG, Nickles M (2023) Federated learning as a privacy solution - an overview. Procedia Comput Sci 217:316–325. https://doi.org/10.1016/j.procs.2022.12.227
Webster J, Watson RT (2002) Analyzing the past to prepare for the future: writing a literature review. MIS Q 26(2):xiii–xxiii
He C, Li S, So J, Zhang M, Wang H, Wang X, Vepakomma P, Singh A, Qiu H, Shen L, Zhao P, Kang Y, Liu Y, Raskar R, Yang Q, Annavaram M, Avestimehr S (2020) Fedml: a research library and benchmark for federated machine learning arXiv:2007.13518
Barroso NR, Stipcich G, Jimenez-Lopez D, Ruiz-Millan JA, Martinez-Camara E, Gonzalez-Seco G, Luzon MV, Veganzones MA, Herrera F (2020) Federated learning and differential privacy: software tools analysis, the sherpa.ai fl framework and methodological guidelines for preserving data privacy. Inf Fusion 64:270–292
Ludwig H, Baracaldo N, Thomas G, Zhou Y, Anwar A, Rajamoni S, Ong YJ, Radhakrishnan J, Verma A, Sinn M, Purcell M, Rawat A, Minh TN, Holohan N, Chakraborty S, Witherspoon S, Steuer D, Wynter L, Hassan H, Laguna S, Yurochkin M, Agarwal M, Chuba E, Abay A (2020) Ibm federated learning: an enterprise framework white paper v0.1 arXiv:2007.10987
Reina GA, Gruzdev A, Foley P, Perepelkina O, Sharma M, Davidyuk I, Trushkin I, Radionov M, Mokrov A, Agapov D, Martin J, Edwards B, Sheller MJ, Pati S, Moorthy PN, Wang HS, Shah P, Bakas S (2021) Openfl: an open-source framework for federated learning arXiv:2105.06413
Liu Y, Fan T, Chen T, Xu Q, Yang Q (2021) Fate: an industrial grade platform for collaborative learning with data protection. J Mach Learn Res 22:1–6
Beutel DJ, Topal T, Mathur A, Qiu X, Parcollet T, Lane ND (2020) Flower: a friendly federated learning research framework arXiv:2007.14390
Dimitriadis D, Garcia MH, Diaz DM, Manoel A, Sim R (2022) Flute: a scalable, extensible framework for high-performance federated learning simulations arXiv:2203.13789
Xie Y, Wang Z, Gao D, Chen D, Yao L, Kuang W, Li Y, Ding B, Zhou J (2023) Federatedscope: a flexible federated learning platform for heterogeneity. Proc VLDB Endowment 16: 1000–1012. https://doi.org/10.14778/3579075.3579076
Zeng D, Liang S, Hu X, Wang H, Xu Z (2023) Fedlab: a flexible federated learning framework. J Mach Learn Res 24:1–7
Zhuang W, Gan X, Wen Y, Zhang S (2022) Easyfl: a low-code federated learning platform for dummies. IEEE Internet Things J 9:13740–13754. https://doi.org/10.1109/JIOT.2022.3143842
FedAI: what is FATE? https://fate.fedai.org/overview/ Accessed 20 Feb 2024
PaddlePaddle: GitHub Repository PaddlePaddle/PaddleFL. https://github.com/PaddlePaddle/PaddleFL Accessed 20 Feb 2024
NVIDIA: NVIDIA Clara: an application framework optimized for healthcare and life sciences developers. https://developer.nvidia.com/clara Accessed 30 May 2023
IBM Research: IBM Federated Learning. https://ibmfl.res.ibm.com Accessed 20 Feb 2024
ByteDance: GitHub Repository FedLearner. https://github.com/bytedance/fedlearner Accessed 20 Feb 2024
Liu J, Huang J, Zhou Y, Li X, Ji S, Xiong H, Dou D (2022) From distributed machine learning to federated learning: a survey. Knowl Inf Syst 64:885–917
Kholod I, Yanaki E, Fomichev D, Shalugin ED, Novikova E, Filippov E, Nordlund M (2021) Open-source federated learning frameworks for iot: a comparative review and analysis. Sensors 21:167–189. https://doi.org/10.3390/s21010167
TensorFlow: TensorFlow Federated: Machine Learning on Decentralized Data. https://www.tensorflow.org/federated Accessed 20 Feb 2024
OpenMined: OpenMined. https://www.openmined.org Accessed 20 Feb 2024
Sherpa.ai: Sherpa.ai: Privacy-Preserving Artificial Intelligence. https://www.sherpa.ai Accessed 20 Feb 2024
Liu X, Shi T, Xie C, Li Q, Hu K, Kim H, Xu X, Li B, Song D (2022) Unifed: a benchmark for federated learning frameworks arXiv:2207.10308
SciKitLearn: Latent Dirichlet Allocation. https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html Accessed 24 April 2023
OpenAI: OpenAI: Pricing. https://openai.com/pricing Accessed 20 Feb 2024
Microsoft: FLUTE: a scalable federated learning simulation platform. https://bit.ly/3KnvugJ Accessed 20 Feb 2024
Caldas S, Duddu SMK, Wu P, Li T, Konečný J, McMahan HB, Smith V, Talwalkar A (2018) Leaf: a benchmark for federated settings
Lai F, Dai Y, Singapuram S, Liu J, Zhu X, Madhyastha H, Chowdhury M (2022) Fedscale: benchmarking model and system performance of federated learning at scale. Proceedings of the 39th international conference on machine learning 162
FederalLab: GitHub Repository OpenFed. https://github.com/FederalLab/OpenFed Accessed 20 Feb 2024
OpenMined: GitHub Repository OpenMined/PySyft. https://github.com/OpenMined Accessed 20 Feb 2024
FedAI: GitHub Repository FedAI/FATE. https://github.com/FederatedAI/FATE Accessed 20 Feb 2024
FedML: FedML: The Federated Learning/Analytics and Edge AI Platform. https://fedml.ai Accessed 20 Feb 2024
FedML: GitHub Repository FedML-AI. https://github.com/FedML-AI Accessed 20 Feb 2024
Adap: Adap: Fleet AI. https://adap.com/en Accessed 20 Feb 2024
Adap: GitHub Repository Adap/Flower. https://github.com/adap/flower Accessed 20 Feb 2024
TensorFlow: GitHub Repository TensorFlow/Federated. https://github.com/tensorflow/federated Accessed 20 Feb 2024
Baidu research: Baidu PaddlePaddle releases 21 new capabilities to accelerate industry-grade model development. http://research.baidu.com/Blog/index-view?id=126 Accessed 07 Aug 2023
Intel: GitHub Repository Intel/OpenFL. https://github.com/intel/openfl Accessed 20 Feb 2024
University of Pennsylvania: CBICA: The Federated Tumor Segmentation (FeTS) Initiative. https://www.med.upenn.edu/cbica/fets/ Accessed 24 Aug 2022
IBM: GitHub Repository IBM Federated Learning. https://github.com/IBM/federated-learning-lib Accessed 20 Feb 2024
NVIDIA: GitHub Repository NVIDIA FLARE. https://github.com/NVIDIA/NVFlare Accessed 20 Feb 2024
Dogra, P.: Federated learning with FLARE: NVIDIA brings collaborative AI to healthcare and beyond. https://blogs.nvidia.com/blog/2021/11/29/federated-learning-ai-nvidia-flare/ Accessed 02 Aug 2023
NVIDIA: NVIDIA FLARE Documentation. https://nvflare.readthedocs.io/en/2.1.1/index.html Accessed 20 Feb 2024
Meta Research: GitHub Repository FLSim. https://github.com/facebookresearch/FLSim Accessed 20 Feb 2024
Microsoft: GitHub Repository Microsoft FLUTE. https://github.com/microsoft/msrflute Accessed 20 Feb 2024
FederatedScope: FederatedScope. https://federatedscope.io Accessed 20 Feb 2024
FederatedScope: GitHub FederatedScope. https://github.com/alibaba/FederatedScope Accessed 20 Feb 2024
FedLab: GitHub FedLab. https://github.com/SMILELab-FL/FedLab Accessed 20 Feb 2024
FedLab: ReadTheDocs FedLab. https://fedlab.readthedocs.io/en/master/ Accessed 20 Feb 2024
EasyFL: GitHub EasyFL. https://github.com/EasyFL-AI/EasyFL/tree/master Accessed 20 Feb 2024
EasyFL: ReadTheDocs EasyFL. https://easyfl.readthedocs.io/en/latest/ Accessed 20 Feb 2024
Bonawitz K, Eichner H, Grieskamp W, Huba D, Ingerman A, Ivanov V, Kiddon C, Konečný J, Mazzocchi S, McMahan B, Overveldt TV, Petrou D, Ramage D, Roselander J (2019) Towards federated learning at scale: system design. Proc Mach Learn Syst 1:374–388
Mansour Y, Mohri M, Ro J, Suresh AT (2020) Three approaches for personalization with applications to federated learning arXiv:2002.10619
Silva PR, Vinagre J, Gama J (2023) Towards federated learning: an overview of methods and applications. WIREs Data Min Knowl Discov 13:1–23
Zhu H, Xu J, Liu S, Jin Y (2021) Federated learning on non-iid data: a survey. Neurocomputing 465:371–390. https://doi.org/10.1016/j.neucom.2021.07.098
Nilsson A, Smith S, Ulm G, Gustavsson E, Jirstrand M (2018) A performance evaluation of federated learning algorithms. DIDL ’18: Proceedings of the second workshop on distributed infrastructures for deep learning 2, 1–8. https://doi.org/10.1145/3286490.3286559
Asad M, Moustafa A, Ito T, Aslam M (2021) Evaluating the communication efficiency in federated learning algorithms. 2021 IEEE 24th international conference on computer supported cooperative work in design (CSCWD). https://doi.org/10.1109/CSCWD49262.2021.9437738
Smith V, Chiang C-K, Sanjabi M, Talwalkar A (2017) Federated multi-task learning. 31st conference on neural information processing systems (NIPS 2017), 4427–4437
Lo SK, Lu Q, Wang C, Paik H, Zhu L (2021) A systematic literature review on federated machine learning: from a software engineering perspective. ACM Comput Surv 54(5):1–39. https://doi.org/10.1145/3450288
Lyu L, Yu H, Zhao J, Yang Q (2020) Threats to federated learning. Lecture Notes Artif Intell 12500:3–16. https://doi.org/10.1007/978-3-030-63076-8_1
Bagdasaryan E, Veit A, Hua Y, Estrin D, Shmatikov V (2020) How to backdoor federated learning. Proceedings of the 23rd international conference on artificial intelligence and statistics, 2938–2948
Shejwalkar V, Houmansadr A, Kairouz P, Ramage D (2022) Back to the drawing board: a critical evaluation of poisoning attacks on production federated learning. 2022 IEEE symposium on security and privacy (SP)
Fu J, Zhang X, Ji S, Chen J, Wu J, Guo S, Zhou J, Liu AX, Wang T (2022) Label inference attacks against vertical federated learning. Proceedings of the 31st USENIX security symposium 31
Feng S, Yu H (2020) Multi-participant multi-class vertical federated learning arXiv:2001.11154
Liu Y, Kang Y, Xing C, Chen T, Yang Q (2020) A secure federated transfer learning framework. IEEE Intell Syst 35(4):70–82. https://doi.org/10.1109/MIS.2020.2988525
Docker Inc.: The industry-leading container runtime. https://www.docker.com/products/container-runtime/ Accessed 07 June 2023
Fayad M, Schmidt D (1997) Object-oriented application frameworks. Commun ACM 40(10):32–38. https://doi.org/10.1145/262793.262798
Ge D-Y, Yao X-F, Xiang W-J, Wen X-J, Liu E-C (2019) Design of high accuracy detector for mnist handwritten digit recognition based on convolutional neural network. 2019 12th international conference on intelligent computation technology and automation (ICICTA), 658–662. https://doi.org/10.1109/ICICTA49267.2019.00145
Deng L (2012) The mnist database of handwritten digit images for machine learning research. IEEE Signals Process Mag 29(6):141–142. https://doi.org/10.1109/MSP.2012.2211477
Avent B, Korolova A, Zeber D, Hovden T, Livshits B (2017) Blender: enabling local search with a hybrid differential privacy model. J Privacy Confid 9:747–764. https://doi.org/10.29012/jpc.680
Cheu A, Smith AD, Ullman J, Zeber D, Zhilyaev M (2019) Distributed differential privacy via shuffling. IACR Cryptol ePrint Arch, 375–403. https://doi.org/10.1007/978-3-030-17653-2_13
Roth E, Noble D, Falk BH, Haeberlen A (2019) Honeycrisp: large-scale differentially private aggregation without a trusted core. Proceedings of the 27th ACM Symposium on Operating Systems Principles, 196–210. https://doi.org/10.1145/3341301.3359660
Song S, Chaudhuri K, Sarwate AD (2013) Stochastic gradient descent with differentially private updates. 2013 IEEE global conference on signal and information processing, 245–248. https://doi.org/10.1109/GlobalSIP.2013.6736861
Masters O, Hunt H, Steffinlongo E, Crawford J, Bergamaschi F (2019) Towards a homomorphic machine learning big data pipeline for the financial services sector. IACR Cryptol. ePrint Arch, 1–21
Yao AC-C (1986) How to generate and exchange secrets. Proceedings of the 27th annual symposium on foundations of computer science, 162–167
Kaissis G, Ziller A, Passerat-Palmbach J, Ryffel T, Usynin D, Trask A, Lima I, Mancuso J, Jungmann F, Steinborn M-M, Saleh A, Makowski M, Rueckert D, Braren R (2021) End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nat Mach Intell 3(6):473–484. https://doi.org/10.1038/s42256-021-00337-8
Subramanyan P, Sinha R, Lebedev IA, Devadas S, Seshia SA (2017) A formal foundation for secure remote execution of enclaves. Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, 2435–2450. https://doi.org/10.1145/3133956.3134098
Hardy S, Henecka W, Ivey-Law H, Nock R, Patrini G, Smith G, Thorne B (2017) Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption arXiv:1711.10677
Nikolaenko V, Weinsberg U, Ioannidis S, Joye M, Boneh D, Taft N (2013) Privacy-preserving ridge regression on hundreds of millions of records. 2013 IEEE symposium on security and privacy, 334–348. https://doi.org/10.1109/SP.2013.30
So J, He C, Yang C-S, Li S, Yu Q, Ali RE, Guler B, Avestimehr S (2022) Lightsecagg: a lightweight and versatile design for secure aggregation in federated learning. Proc Mach Learn Syst 4:694–720
Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V (2020) Federated optimization in heterogeneous networks. Proc Mach Learn Syst 2:429–450
Reddi SJ, Charles Z, Zaheer M, Garrett Z, Rush K, Konečný J, Kumar S, McMahan HB (2021) Adaptive federated optimization. International conference on learning representations ICLR 2021
Romanini D, Hall AJ, Papadopoulos P, Titcombe T, Ismail A, Cebere T, Sandmann R, Roehm R, Hoeh MA (2021) Pyvertical: a vertical federated learning framework for multi-headed splitnn. ICLR 2021 Workshop on distributed and private machine learning
Fan T, Kang Y, Ma G, Chen W, Wei W, Fan L, Yang Q (2023) FATE-LLM: an industrial grade federated learning framework for large language models. arXiv preprint
Velez-Estevez A, Ducange P, Perez IJ, Cobo MJ (2022) Conceptual structure of federated learning research field. Procedia Comput Sci 214:1374–1381
Farooq A, Feizollah A, Rehman MH (2021) Federated learning research trends and bibliometric analysis. Stud Comput Intell 965:1–19. https://doi.org/10.1007/978-3-030-70604-3_1
Gong M, Zhang Y, Gao Y, Qin AK, Wu Y, Wang S, Zhang Y (2024) A multi-modal vertical federated learning framework based on homomorphic encryption. IEEE Trans Inf Forensics Secur 19:1826–1839. https://doi.org/10.1109/TIFS.2023.3340994
Caramalau R, Bhattarai B, Stoyanov D (2023) Federated active learning for target domain generalisation. arXiv:2312.02247. https://doi.org/10.48550/arXiv.2312.02247
Matsuda K, Sasaki Y, Xiao C, Onizuka M (2024) Benchmark for personalized federated learning. IEEE Open J Comput Soc 5:2–13. https://doi.org/10.1109/OJCS.2023.3332351
Open Access funding enabled and organized by Projekt DEAL.
Authors and affiliations.
University of Applied Sciences Ulm, Prittwitzstraße 10, 89075, Ulm, Baden-Württemberg, Germany
Pascal Riedel, Reinhold von Schwerin, Daniel Schaudt & Alexander Hafner
University of Tübingen, Geschwister-Scholl-Platz, 72074, Tübingen, Baden-Württemberg, Germany
Lukas Schick
University of Ulm, Helmholtzstraße 16, 89081, Ulm, Baden-Württemberg, Germany
Pascal Riedel & Manfred Reichert
Correspondence to Pascal Riedel .
Conflict of interest.
The authors have no conflicts of interest relevant to the content of this article to declare, and there are no financial interests to disclose.
The data and models used are purely for scientific purposes and do not replace a clinical COVID-19 diagnosis by medical specialists.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
See Figs. 5, 6 and Table 7.
Fig. 5 List of topics, words and frequencies using LDA
Fig. 6 Graphical representation of the most common words used in the identified literature
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Riedel, P., Schick, L., von Schwerin, R. et al. Comparative analysis of open-source federated learning frameworks - a literature-based survey and review. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02234-z
Received: 13 August 2023
Accepted: 28 May 2024
Published: 28 June 2024
DOI: https://doi.org/10.1007/s13042-024-02234-z