
Organizing Your Social Sciences Research Paper: Types of Research Designs


Introduction

Before beginning your paper, you need to decide how you plan to design the study.

The research design refers to the overall strategy and analytical approach that you have chosen in order to integrate, in a coherent and logical way, the different components of the study, thus ensuring that the research problem will be thoroughly investigated. It constitutes the blueprint for the collection, measurement, and interpretation of information and data. Note that the research problem determines the type of design you choose, not the other way around!

De Vaus, D. A. Research Design in Social Research. London: SAGE, 2001; Trochim, William M.K. Research Methods Knowledge Base. 2006.

General Structure and Writing Style

The function of a research design is to ensure that the evidence obtained enables you to effectively address the research problem logically and as unambiguously as possible. In social sciences research, obtaining information relevant to the research problem generally entails specifying the type of evidence needed to test the underlying assumptions of a theory, to evaluate a program, or to accurately describe and assess meaning related to an observable phenomenon.

With this in mind, a common mistake made by researchers is that they begin their investigations before they have thought critically about what information is required to address the research problem. Without attending to these design issues beforehand, the overall research problem will not be adequately addressed and any conclusions drawn will run the risk of being weak and unconvincing. As a consequence, the overall validity of the study will be undermined.

The length and complexity of describing the research design in your paper can vary considerably, but any well-developed description will achieve the following:

  • Identify the research problem clearly and justify its selection, particularly in relation to any valid alternative designs that could have been used,
  • Review and synthesize previously published literature associated with the research problem,
  • Clearly and explicitly specify hypotheses [i.e., research questions] central to the problem,
  • Effectively describe the information and/or data which will be necessary for an adequate testing of the hypotheses and explain how such information and/or data will be obtained, and
  • Describe the methods of analysis to be applied to the data in determining whether or not the hypotheses are true or false.

The research design is usually incorporated into the introduction of your paper. You can obtain an overall sense of what to do by reviewing studies that have utilized the same research design [e.g., using a case study approach]. This can help you develop an outline to follow for your own paper.

NOTE: Use the SAGE Research Methods Online and Cases and the SAGE Research Methods Videos databases to search for scholarly resources on how to apply specific research designs and methods. The Research Methods Online database contains links to more than 175,000 pages of SAGE book, journal, and reference content on quantitative, qualitative, and mixed research methodologies. Also included is a collection of case studies of social research projects that can be used to help you better understand abstract or complex methodological concepts. The Research Methods Videos database contains hours of tutorials, interviews, video case studies, and mini-documentaries covering the entire research process.

Creswell, John W. and J. David Creswell. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. 5th edition. Thousand Oaks, CA: Sage, 2018; De Vaus, D. A. Research Design in Social Research. London: SAGE, 2001; Gorard, Stephen. Research Design: Creating Robust Approaches for the Social Sciences. Thousand Oaks, CA: Sage, 2013; Leedy, Paul D. and Jeanne Ellis Ormrod. Practical Research: Planning and Design. Tenth edition. Boston, MA: Pearson, 2013; Vogt, W. Paul, Dianna C. Gardner, and Lynne M. Haeffele. When to Use What Research Design. New York: Guilford, 2012.

Action Research Design

Definition and Purpose

The essentials of action research design follow a characteristic cycle: initially, an exploratory stance is adopted, in which an understanding of the problem is developed and plans are made for some form of intervention strategy. The intervention is then carried out [the "action" in action research], during which time pertinent observations are collected in various forms. New intervention strategies are carried out, and this cyclic process repeats until a sufficient understanding of [or a valid implementation solution for] the problem is achieved. The protocol is iterative or cyclical in nature and is intended to foster deeper understanding of a given situation, starting with conceptualizing and particularizing the problem and moving through several interventions and evaluations.

What do these studies tell you?

  • This is a collaborative and adaptive research design that lends itself to use in work or community situations.
  • Design focuses on pragmatic and solution-driven research outcomes rather than testing theories.
  • When practitioners use action research, it has the potential to increase the amount they learn consciously from their experience; the action research cycle can be regarded as a learning cycle.
  • Action research studies often have direct and obvious relevance to improving practice and advocating for change.
  • There are no hidden controls or preemption of direction by the researcher.

What these studies don't tell you

  • It is harder to do than conducting conventional research because the researcher takes on responsibilities of advocating for change as well as for researching the topic.
  • Action research is much harder to write up because it is less likely that you can use a standard format to report your findings effectively [i.e., data are often in the form of stories or observations].
  • Personal over-involvement of the researcher may bias research results.
  • The cyclic nature of action research to achieve its twin outcomes of action [e.g., change] and research [e.g., understanding] is time-consuming and complex to conduct.
  • Advocating for change usually requires buy-in from study participants.

Coghlan, David and Mary Brydon-Miller. The Sage Encyclopedia of Action Research. Thousand Oaks, CA: Sage, 2014; Efron, Sara Efrat and Ruth Ravid. Action Research in Education: A Practical Guide. New York: Guilford, 2013; Gall, Meredith. Educational Research: An Introduction. Chapter 18, Action Research. 8th ed. Boston, MA: Pearson/Allyn and Bacon, 2007; Gorard, Stephen. Research Design: Creating Robust Approaches for the Social Sciences. Thousand Oaks, CA: Sage, 2013; Kemmis, Stephen and Robin McTaggart. "Participatory Action Research." In Handbook of Qualitative Research. Norman Denzin and Yvonna S. Lincoln, eds. 2nd ed. (Thousand Oaks, CA: SAGE, 2000), pp. 567-605; McNiff, Jean. Writing and Doing Action Research. London: Sage, 2014; Reason, Peter and Hilary Bradbury. Handbook of Action Research: Participative Inquiry and Practice. Thousand Oaks, CA: SAGE, 2001.

Case Study Design

A case study is an in-depth study of a particular research problem rather than a sweeping statistical survey or comprehensive comparative inquiry. It is often used to narrow down a very broad field of research into one or a few easily researchable examples. The case study research design is also useful for testing whether a specific theory and model actually applies to phenomena in the real world. It is a useful design when not much is known about an issue or phenomenon.

  • Approach excels at bringing us to an understanding of a complex issue through detailed contextual analysis of a limited number of events or conditions and their relationships.
  • A researcher using a case study design can apply a variety of methodologies and rely on a variety of sources to investigate a research problem.
  • Design can extend experience or add strength to what is already known through previous research.
  • Social scientists, in particular, make wide use of this research design to examine contemporary real-life situations and provide the basis for the application of concepts and theories and the extension of methodologies.
  • The design can provide detailed descriptions of specific and rare cases.
  • A single or small number of cases offers little basis for establishing reliability or for generalizing the findings to a wider population of people, places, or things.
  • Intense exposure to the study of a case may bias a researcher's interpretation of the findings.
  • Design does not facilitate assessment of cause and effect relationships.
  • Vital information may be missing, making the case hard to interpret.
  • The case may not be representative or typical of the larger problem being investigated.
  • If a case is selected because it represents a very unusual or unique phenomenon or problem for study, then your interpretation of the findings can only apply to that particular case.

Case Studies. Writing@CSU. Colorado State University; Anastas, Jeane W. Research Design for Social Work and the Human Services. Chapter 4, Flexible Methods: Case Study Design. 2nd ed. New York: Columbia University Press, 1999; Gerring, John. "What Is a Case Study and What Is It Good for?" American Political Science Review 98 (May 2004): 341-354; Greenhalgh, Trisha, editor. Case Study Evaluation: Past, Present and Future Challenges. Bingley, UK: Emerald Group Publishing, 2015; Mills, Albert J., Gabrielle Durepos, and Eiden Wiebe, editors. Encyclopedia of Case Study Research. Thousand Oaks, CA: SAGE Publications, 2010; Stake, Robert E. The Art of Case Study Research. Thousand Oaks, CA: SAGE, 1995; Yin, Robert K. Case Study Research: Design and Methods. Applied Social Research Methods Series, no. 5. 3rd ed. Thousand Oaks, CA: SAGE, 2003.

Causal Design

Causality studies may be thought of as understanding a phenomenon in terms of conditional statements in the form, “If X, then Y.” This type of research is used to measure what impact a specific change will have on existing norms and assumptions. Most social scientists seek causal explanations that reflect tests of hypotheses. Causal effect (nomothetic perspective) occurs when variation in one phenomenon, an independent variable, leads to or results, on average, in variation in another phenomenon, the dependent variable.

Conditions necessary for determining causality:

  • Empirical association -- a valid conclusion is based on finding an association between the independent variable and the dependent variable.
  • Appropriate time order -- to conclude that causation was involved, one must see that cases were exposed to variation in the independent variable before variation in the dependent variable.
  • Nonspuriousness -- a relationship between two variables that is not due to variation in a third variable [see the simulation sketch following this list].
  • Causality research designs assist researchers in understanding why the world works the way it does through the process of proving a causal link between variables and by the process of eliminating other possibilities.
  • Replication is possible.
  • There is greater confidence the study has internal validity due to the systematic subject selection and equivalence of the groups being compared.
  • Not all relationships are causal! The possibility always exists that, by sheer coincidence, two unrelated events appear to be related [e.g., Punxsutawney Phil could accurately predict the duration of winter for five consecutive years but, the fact remains, he's just a big, furry rodent].
  • Conclusions about causal relationships are difficult to determine due to a variety of extraneous and confounding variables that exist in a social environment. This means causality can only be inferred, never proven.
  • If two variables are correlated, the cause must come before the effect. However, even though two variables might be causally related, it can sometimes be difficult to determine which variable comes first and, therefore, to establish which variable is the actual cause and which is the actual effect.
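
To make the nonspuriousness condition concrete, here is a minimal simulation (a hypothetical illustration, not part of the original guide): a third variable drives two otherwise unrelated variables, producing an association that disappears once that third variable is held roughly constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# z is a lurking third variable that independently drives both x
# and y; x has no causal effect on y anywhere in this simulation.
z = rng.normal(size=n)
x = 2 * z + rng.normal(size=n)
y = 3 * z + rng.normal(size=n)

# Naive empirical association: x and y look strongly related.
print(round(np.corrcoef(x, y)[0, 1], 2))  # ~0.85

# Holding z roughly constant (a narrow stratum) removes the
# association, revealing the x-y relationship as spurious.
stratum = np.abs(z) < 0.05
print(round(np.corrcoef(x[stratum], y[stratum])[0, 1], 2))  # ~0.0
```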

Beach, Derek and Rasmus Brun Pedersen. Causal Case Study Methods: Foundations and Guidelines for Comparing, Matching, and Tracing. Ann Arbor, MI: University of Michigan Press, 2016; Bachman, Ronet. The Practice of Research in Criminology and Criminal Justice. Chapter 5, Causation and Research Designs. 3rd ed. Thousand Oaks, CA: Pine Forge Press, 2007; Brewer, Ernest W. and Jennifer Kuhn. "Causal-Comparative Design." In Encyclopedia of Research Design. Neil J. Salkind, editor. (Thousand Oaks, CA: Sage, 2010), pp. 125-132; Causal Research Design: Experimentation. Anonymous SlideShare Presentation; Gall, Meredith. Educational Research: An Introduction. Chapter 11, Nonexperimental Research: Correlational Designs. 8th ed. Boston, MA: Pearson/Allyn and Bacon, 2007; Trochim, William M.K. Research Methods Knowledge Base. 2006.

Cohort Design

Often used in the medical sciences, but also found in the applied social sciences, a cohort study generally refers to a study conducted over a period of time involving members of a population who are united by some commonality or similarity. Using a quantitative framework, a cohort study makes note of statistical occurrence within a specialized subgroup, united by the same or similar characteristics that are relevant to the research problem being investigated, rather than studying statistical occurrence within the general population. Using a qualitative framework, cohort studies generally gather data using methods of observation. Cohorts can be either "open" or "closed."

  • Open Cohort Studies [dynamic populations, such as the population of Los Angeles] involve a population that is defined simply by the state of being a part of the study in question (and being monitored for the outcome). Dates of entry into and exit from the study are individually defined; therefore, the size of the study population is not constant. In open cohort studies, researchers can only calculate rate-based data, such as incidence rates and variants thereof [a worked example follows this list].
  • Closed Cohort Studies [static populations, such as patients entered into a clinical trial] involve participants who enter into the study at one defining point in time and where it is presumed that no new participants can enter the cohort. Given this, the number of study participants remains constant (or can only decrease).
  • The use of cohorts is often mandatory because a randomized control study may be unethical. For example, you cannot deliberately expose people to asbestos, you can only study its effects on those who have already been exposed. Research that measures risk factors often relies upon cohort designs.
  • Because cohort studies measure potential causes before the outcome has occurred, they can demonstrate that these “causes” preceded the outcome, thereby avoiding the debate as to which is the cause and which is the effect.
  • Cohort analysis is highly flexible and can provide insight into effects over time and related to a variety of different types of changes [e.g., social, cultural, political, economic, etc.].
  • Either original data or secondary data can be used in this design.
  • In cases where a comparative analysis of two cohorts is made [e.g., studying the effects of one group exposed to asbestos and one that has not], a researcher cannot control for all other factors that might differ between the two groups. These factors are known as confounding variables.
  • Cohort studies can end up taking a long time to complete if the researcher must wait for the conditions of interest to develop within the group. This also increases the chance that key variables change during the course of the study, potentially impacting the validity of the findings.
  • Due to the lack of randomization in the cohort design, its external validity is lower than that of study designs where the researcher randomly assigns participants.
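
As a concrete illustration of the rate-based data available in open cohorts, the sketch below computes an incidence rate from person-time at risk; the function and numbers are invented for the example.

```python
# Incidence rate in an open cohort: because members enter and exit
# at different times, new cases are divided by total person-time at
# risk rather than by a fixed head count.

def incidence_rate(new_cases, person_years):
    """New cases per 1,000 person-years of follow-up."""
    return 1000 * new_cases / person_years

# Hypothetical cohort: 18 new cases over follow-up intervals that
# sum to 4,500 person-years across all members.
print(incidence_rate(18, 4500))  # 4.0 cases per 1,000 person-years
```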

Healy, P. and D. Devane. "Methodological Considerations in Cohort Study Designs." Nurse Researcher 18 (2011): 32-36; Glenn, Norval D., editor. Cohort Analysis. 2nd edition. Thousand Oaks, CA: Sage, 2005; Levin, Kate Ann. "Study Design IV: Cohort Studies." Evidence-Based Dentistry 7 (2003): 51-52; Payne, Geoff. "Cohort Study." In The SAGE Dictionary of Social Research Methods. Victor Jupp, editor. (Thousand Oaks, CA: Sage, 2006), pp. 31-33; Study Design 101. Himmelfarb Health Sciences Library. George Washington University, November 2011; Cohort Study. Wikipedia.

Cross-Sectional Design

Cross-sectional research designs have three distinctive features: no time dimension; a reliance on existing differences rather than change following intervention; and groups selected based on existing differences rather than random allocation. The cross-sectional design can only measure differences between or among a variety of people, subjects, or phenomena rather than a process of change. As such, researchers using this design can only employ a relatively passive approach to making causal inferences based on findings.

  • Cross-sectional studies provide a clear 'snapshot' of the outcome and the characteristics associated with it, at a specific point in time.
  • Unlike an experimental design, where there is an active intervention by the researcher to produce and measure change or to create differences, cross-sectional designs focus on studying and drawing inferences from existing differences between people, subjects, or phenomena.
  • Entails collecting data at and concerning one point in time. While longitudinal studies involve taking multiple measures over an extended period of time, cross-sectional research is focused on finding relationships between variables at one moment in time.
  • Groups identified for study are purposely selected based upon existing differences in the sample rather than seeking random sampling.
  • Cross-sectional studies are capable of using data from a large number of subjects and, unlike observational studies, are not geographically bound.
  • Can estimate prevalence of an outcome of interest because the sample is usually taken from the whole population.
  • Because cross-sectional designs generally use survey techniques to gather data, they are relatively inexpensive and take up little time to conduct.
  • Finding people, subjects, or phenomena to study that are very similar except in one specific variable can be difficult.
  • Results are static and time bound and, therefore, give no indication of a sequence of events or reveal historical or temporal contexts.
  • Studies cannot be utilized to establish cause and effect relationships.
  • This design only provides a snapshot of analysis so there is always the possibility that a study could have differing results if another time-frame had been chosen.
  • There is no follow up to the findings.

Bethlehem, Jelke. "7: Cross-sectional Research." In Research Methodology in the Social, Behavioural and Life Sciences. Herman J. Adèr and Gideon J. Mellenbergh, editors. (London, England: Sage, 1999), pp. 110-143; Bourque, Linda B. "Cross-Sectional Design." In The SAGE Encyclopedia of Social Science Research Methods. Michael S. Lewis-Beck, Alan Bryman, and Tim Futing Liao, editors. (Thousand Oaks, CA: Sage, 2004), pp. 230-231; Hall, John. "Cross-Sectional Survey Design." In Encyclopedia of Survey Research Methods. Paul J. Lavrakas, ed. (Thousand Oaks, CA: Sage, 2008), pp. 173-174; Barratt, Helen and Maria Kirwan. Cross-Sectional Studies: Design Application, Strengths and Weaknesses of Cross-Sectional Studies. Healthknowledge, 2009; Cross-Sectional Study. Wikipedia.

Descriptive Design

Descriptive research designs help provide answers to the questions of who, what, when, where, and how associated with a particular research problem; a descriptive study cannot conclusively ascertain answers to why. Descriptive research is used to obtain information concerning the current status of the phenomena and to describe "what exists" with respect to variables or conditions in a situation.

  • The subject is being observed in a completely natural and unchanged environment. True experiments, whilst giving analyzable data, often adversely influence the normal behavior of the subject [a.k.a., the Heisenberg effect, whereby measurements of certain systems cannot be made without affecting the systems].
  • Descriptive research is often used as a precursor to more quantitative research designs, with the general overview giving some valuable pointers as to what variables are worth testing quantitatively.
  • If its limitations are understood, descriptive research can be a useful tool in developing a more focused study.
  • Descriptive studies can yield rich data that lead to important recommendations in practice.
  • Approach collects a large amount of data for detailed analysis.
  • The results from descriptive research cannot be used to discover a definitive answer or to disprove a hypothesis.
  • Because descriptive designs often utilize observational methods [as opposed to quantitative methods], the results cannot be replicated.
  • The descriptive function of research is heavily dependent on instrumentation for measurement and observation.

Anastas, Jeane W. Research Design for Social Work and the Human Services. Chapter 5, Flexible Methods: Descriptive Research. 2nd ed. New York: Columbia University Press, 1999; Given, Lisa M. "Descriptive Research." In Encyclopedia of Measurement and Statistics. Neil J. Salkind and Kristin Rasmussen, editors. (Thousand Oaks, CA: Sage, 2007), pp. 251-254; McNabb, Connie. Descriptive Research Methodologies. PowerPoint Presentation; Shuttleworth, Martyn. Descriptive Research Design, September 26, 2008; Erickson, G. Scott. "Descriptive Research Design." In New Methods of Market Research and Analysis. (Northampton, MA: Edward Elgar Publishing, 2017), pp. 51-77; Sahin, Sagufta and Jayanta Mete. "A Brief Study on Descriptive Research: Its Nature and Application in Social Science." International Journal of Research and Analysis in Humanities 1 (2021): 11; Swatzell, K. and P. Jennings. "Descriptive Research: The Nuts and Bolts." Journal of the American Academy of Physician Assistants 20 (2007): 55-56; Kane, E. Doing Your Own Research: Basic Descriptive Research in the Social Sciences and Humanities. London: Marion Boyars, 1985.

Experimental Design

A blueprint of the procedure that enables the researcher to maintain control over all factors that may affect the result of an experiment. In doing this, the researcher attempts to determine or predict what may occur. Experimental research is often used where there is time priority in a causal relationship (cause precedes effect), there is consistency in a causal relationship (a cause will always lead to the same effect), and the magnitude of the correlation is great. The classic experimental design specifies an experimental group and a control group. The independent variable is administered to the experimental group and not to the control group, and both groups are measured on the same dependent variable. Subsequent experimental designs have used more groups and more measurements over longer periods. True experiments must have control, randomization, and manipulation.
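
A minimal simulation of the classic design described above (a hypothetical sketch with a made-up effect size, not part of the original guide): participants are randomly assigned, the independent variable is administered only to the experimental group, and both groups are measured on the same dependent variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Randomization: chance alone assigns each participant to the
# experimental group (True) or the control group (False).
treated = rng.permutation(np.repeat([True, False], n // 2))

# Manipulation: the independent variable is administered only to
# the experimental group; here it shifts the outcome by +0.5.
outcome = rng.normal(size=n) + 0.5 * treated

# Both groups are measured on the same dependent variable; the
# difference in group means estimates the treatment effect.
effect = outcome[treated].mean() - outcome[~treated].mean()
print(round(effect, 2))  # close to the true effect of 0.5
```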

  • Experimental research allows the researcher to control the situation. In so doing, it allows researchers to answer the question, “What causes something to occur?”
  • Permits the researcher to identify cause and effect relationships between variables and to distinguish placebo effects from treatment effects.
  • Experimental research designs support the ability to limit alternative explanations and to infer direct causal relationships in the study.
  • Approach provides the highest level of evidence for single studies.
  • The design is artificial, and results may not generalize well to the real world.
  • The artificial settings of experiments may alter the behaviors or responses of participants.
  • Experimental designs can be costly if special equipment or facilities are needed.
  • Some research problems cannot be studied using an experiment because of ethical or technical reasons.
  • Difficult to apply ethnographic and other qualitative methods to experimentally designed studies.

Anastas, Jeane W. Research Design for Social Work and the Human Services. Chapter 7, Flexible Methods: Experimental Research. 2nd ed. New York: Columbia University Press, 1999; Chapter 2: Research Design, Experimental Designs. School of Psychology, University of New England, 2000; Chow, Siu L. "Experimental Design." In Encyclopedia of Research Design. Neil J. Salkind, editor. (Thousand Oaks, CA: Sage, 2010), pp. 448-453; "Experimental Design." In Social Research Methods. Nicholas Walliman, editor. (London, England: Sage, 2006), pp. 101-110; Experimental Research. Research Methods by Dummies. Department of Psychology. California State University, Fresno, 2006; Kirk, Roger E. Experimental Design: Procedures for the Behavioral Sciences. 4th edition. Thousand Oaks, CA: Sage, 2013; Trochim, William M.K. Experimental Design. Research Methods Knowledge Base. 2006; Rasool, Shafqat. Experimental Research. Slideshare presentation.

Exploratory Design

An exploratory design is conducted about a research problem when there are few or no earlier studies to refer to or rely upon to predict an outcome. The focus is on gaining insights and familiarity for later investigation, or the design is undertaken when research problems are in a preliminary stage of investigation. Exploratory designs are often used to establish an understanding of how best to proceed in studying an issue or what methodology would effectively apply to gathering information about the issue.

Exploratory research is intended to produce the following possible insights:

  • Familiarity with basic details, settings, and concerns.
  • Well-grounded picture of the situation being developed.
  • Generation of new ideas and assumptions.
  • Development of tentative theories or hypotheses.
  • Determination about whether a study is feasible in the future.
  • Issues get refined for more systematic investigation and formulation of new research questions.
  • Direction for future research and techniques get developed.
  • Design is a useful approach for gaining background information on a particular topic.
  • Exploratory research is flexible and can address research questions of all types (what, why, how).
  • Provides an opportunity to define new terms and clarify existing concepts.
  • Exploratory research is often used to generate formal hypotheses and develop more precise research problems.
  • In the policy arena or applied to practice, exploratory studies help establish research priorities and where resources should be allocated.
  • Exploratory research generally utilizes small sample sizes and, thus, findings are typically not generalizable to the population at large.
  • The exploratory nature of the research inhibits an ability to make definitive conclusions about the findings; such studies provide insight but not definitive answers.
  • The research process underpinning exploratory studies is flexible but often unstructured, leading to only tentative results that have limited value to decision-makers.
  • Design lacks rigorous standards applied to methods of data gathering and analysis because one of the areas for exploration could be to determine what method or methodologies could best fit the research problem.

Cuthill, Michael. "Exploratory Research: Citizen Participation, Local Government, and Sustainable Development in Australia." Sustainable Development 10 (2002): 79-89; Streb, Christoph K. "Exploratory Case Study." In Encyclopedia of Case Study Research. Albert J. Mills, Gabrielle Durepos and Eiden Wiebe, editors. (Thousand Oaks, CA: Sage, 2010), pp. 372-374; Taylor, P. J., G. Catalano, and D.R.F. Walker. "Exploratory Analysis of the World City Network." Urban Studies 39 (December 2002): 2377-2394; Exploratory Research. Wikipedia.

Field Research Design

Sometimes referred to as ethnography or participant observation, designs around field research encompass a variety of interpretative procedures [e.g., observation and interviews] rooted in qualitative approaches to studying people individually or in groups while they inhabit their natural environment, as opposed to using survey instruments or other forms of impersonal methods of data gathering. Information acquired from observational research takes the form of "field notes" that involve documenting what the researcher actually sees and hears while in the field. Findings do not consist of conclusive statements derived from numbers and statistics because field research involves analysis of words and observations of behavior. Conclusions, therefore, are developed from an interpretation of findings that reveal overriding themes, concepts, and ideas.

  • Field research is often necessary to fill gaps in understanding the research problem applied to local conditions or to specific groups of people that cannot be ascertained from existing data.
  • The research helps contextualize already known information about a research problem, thereby facilitating ways to assess the origins, scope, and scale of a problem and to gauge the causes, consequences, and means to resolve an issue based on deliberate interaction with people in their natural inhabited spaces.
  • Enables the researcher to corroborate or confirm data by gathering additional information that supports or refutes findings reported in prior studies of the topic.
  • Because the researcher is embedded in the field, they are better able to make observations or ask questions that reflect the specific cultural context of the setting being investigated.
  • Observing the local reality offers the opportunity to gain new perspectives or obtain unique data that challenges existing theoretical propositions or long-standing assumptions found in the literature.

What these studies don't tell you

  • A field research study requires extensive time and resources to carry out the multiple steps involved with preparing for the gathering of information, including, for example, examining background information about the study site, obtaining permission to access the study site, and building trust and rapport with subjects.
  • Requires a commitment to staying engaged in the field to ensure that you can adequately document events and behaviors as they unfold.
  • The unpredictable nature of fieldwork means that researchers can never fully control the process of data gathering. They must maintain a flexible approach to studying the setting because events and circumstances can change quickly or unexpectedly.
  • Findings can be difficult to interpret and verify without access to documents and other source materials that help to enhance the credibility of information obtained from the field [i.e., the act of triangulating the data].
  • Linking the research problem to the selection of study participants inhabiting their natural environment is critical. However, this specificity limits the ability to generalize findings to different situations or in other contexts or to infer courses of action applied to other settings or groups of people.
  • The reporting of findings must take into account how the researcher themselves may have inadvertently affected respondents and their behaviors.

Historical Design

The purpose of a historical research design is to collect, verify, and synthesize evidence from the past to establish facts that defend or refute a hypothesis. It uses secondary sources and a variety of primary documentary evidence, such as diaries, official records, reports, archives, and non-textual information [maps, pictures, audio and visual recordings]. The limitation is that the sources must be both authentic and valid.

  • The historical research design is unobtrusive; the act of research does not affect the results of the study.
  • The historical approach is well suited for trend analysis.
  • Historical records can add important contextual background required to more fully understand and interpret a research problem.
  • There is often no possibility of researcher-subject interaction that could affect the findings.
  • Historical sources can be used over and over to study different research problems or to replicate a previous study.
  • The ability to fulfill the aims of your research is directly related to the amount and quality of documentation available to understand the research problem.
  • Since historical research relies on data from the past, there is no way to manipulate it to control for contemporary contexts.
  • Interpreting historical sources can be very time consuming.
  • The sources of historical materials must be archived consistently to ensure access. This may be especially challenging for digital or online-only sources.
  • Original authors bring their own perspectives and biases to the interpretation of past events and these biases are more difficult to ascertain in historical resources.
  • Due to the lack of control over external variables, historical research is very weak with regard to the demands of internal validity.
  • It is rare that the entirety of historical documentation needed to fully address a research problem is available for interpretation, therefore, gaps need to be acknowledged.

Howell, Martha C. and Walter Prevenier. From Reliable Sources: An Introduction to Historical Methods. Ithaca, NY: Cornell University Press, 2001; Lundy, Karen Saucier. "Historical Research." In The Sage Encyclopedia of Qualitative Research Methods. Lisa M. Given, editor. (Thousand Oaks, CA: Sage, 2008), pp. 396-400; Marius, Richard and Melvin E. Page. A Short Guide to Writing about History. 9th edition. Boston, MA: Pearson, 2015; Savitt, Ronald. "Historical Research in Marketing." Journal of Marketing 44 (Autumn 1980): 52-58; Gall, Meredith. Educational Research: An Introduction. Chapter 16, Historical Research. 8th ed. Boston, MA: Pearson/Allyn and Bacon, 2007.

Longitudinal Design

A longitudinal study follows the same sample over time and makes repeated observations. For example, with longitudinal surveys, the same group of people is interviewed at regular intervals, enabling researchers to track changes over time and to relate them to variables that might explain why the changes occur. Longitudinal research designs describe patterns of change and help establish the direction and magnitude of causal relationships. Measurements are taken on each variable over two or more distinct time periods. This allows the researcher to measure change in variables over time. It is a type of observational study sometimes referred to as a panel study.
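
A minimal sketch of what repeated measurements on a panel look like and how within-person change is computed; the respondents and values below are invented for illustration, not drawn from the guide.

```python
import numpy as np

# Hypothetical panel: the same four respondents measured on one
# variable at three distinct time periods (waves).
waves = np.array([
    [10, 12, 15],  # respondent 1
    [8, 8, 9],     # respondent 2
    [20, 18, 17],  # respondent 3
    [5, 9, 14],    # respondent 4
])

# Within-person change from one wave to the next: the quantity a
# longitudinal design can measure and a cross-sectional one cannot.
change = np.diff(waves, axis=1)
print(change)               # change per respondent between waves
print(change.mean(axis=0))  # average change across the sample
```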

  • Longitudinal data facilitate the analysis of the duration of a particular phenomenon.
  • Enables survey researchers to get close to the kinds of causal explanations usually attainable only with experiments.
  • The design permits the measurement of differences or change in a variable from one period to another [i.e., the description of patterns of change over time].
  • Longitudinal studies facilitate the prediction of future outcomes based upon earlier factors.
  • The data collection method may change over time.
  • Maintaining the integrity of the original sample can be difficult over an extended period of time.
  • It can be difficult to show more than one variable at a time.
  • This design often needs qualitative research data to explain fluctuations in the results.
  • A longitudinal research design assumes present trends will continue unchanged.
  • It can take a long period of time to gather results.
  • There is a need to have a large sample size and accurate sampling to reach representativeness.

Anastas, Jeane W. Research Design for Social Work and the Human Services. Chapter 6, Flexible Methods: Relational and Longitudinal Research. 2nd ed. New York: Columbia University Press, 1999; Forgues, Bernard and Isabelle Vandangeon-Derumez. "Longitudinal Analyses." In Doing Management Research. Raymond-Alain Thiétart and Samantha Wauchope, editors. (London, England: Sage, 2001), pp. 332-351; Kalaian, Sema A. and Rafa M. Kasim. "Longitudinal Studies." In Encyclopedia of Survey Research Methods. Paul J. Lavrakas, ed. (Thousand Oaks, CA: Sage, 2008), pp. 440-441; Menard, Scott, editor. Longitudinal Research. Thousand Oaks, CA: Sage, 2002; Ployhart, Robert E. and Robert J. Vandenberg. "Longitudinal Research: The Theory, Design, and Analysis of Change." Journal of Management 36 (January 2010): 94-120; Longitudinal Study. Wikipedia.

Meta-Analysis Design

Meta-analysis is an analytical methodology designed to systematically evaluate and summarize the results from a number of individual studies, thereby increasing the overall sample size and the ability of the researcher to study effects of interest. The purpose is not simply to summarize existing knowledge, but to develop a new understanding of a research problem using synoptic reasoning. The main objectives of meta-analysis include analyzing differences in the results among studies and increasing the precision by which effects are estimated [a worked pooling example appears after the lists below]. A well-designed meta-analysis depends upon strict adherence to the criteria used for selecting studies and the availability of information in each study to properly analyze their findings. Lack of information can severely limit the types of analyses and conclusions that can be reached. In addition, the more dissimilarity there is in the results among individual studies [heterogeneity], the more difficult it is to justify interpretations that govern a valid synopsis of results. A meta-analysis needs to fulfill the following requirements to ensure the validity of your findings:

  • Clearly defined description of objectives, including precise definitions of the variables and outcomes that are being evaluated;
  • A well-reasoned and well-documented justification for identification and selection of the studies;
  • Assessment and explicit acknowledgment of any researcher bias in the identification and selection of those studies;
  • Description and evaluation of the degree of heterogeneity among the sample size of studies reviewed; and,
  • Justification of the techniques used to evaluate the studies.
  • Can be an effective strategy for determining gaps in the literature.
  • Provides a means of reviewing research published about a particular topic over an extended period of time and from a variety of sources.
  • Is useful in clarifying what policy or programmatic actions can be justified on the basis of analyzing research results from multiple studies.
  • Provides a method for overcoming small sample sizes in individual studies that previously may have had little relationship to each other.
  • Can be used to generate new hypotheses or highlight research problems for future studies.
  • Small violations in defining the criteria used for content analysis can lead to difficult-to-interpret and/or meaningless findings.
  • A large sample size can yield reliable, but not necessarily valid, results.
  • A lack of uniformity regarding, for example, the type of literature reviewed, how methods are applied, and how findings are measured within the sample of studies you are analyzing, can make the process of synthesis difficult to perform.
  • Depending on the sample size, the process of reviewing and synthesizing multiple studies can be very time consuming.
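
As noted above, a worked sketch of how pooling increases the precision with which effects are estimated, using standard inverse-variance (fixed-effect) weighting; the study effects and standard errors are invented for illustration.

```python
import numpy as np

# Hypothetical effect estimates and standard errors from five
# individual studies of the same intervention.
effects = np.array([0.30, 0.55, 0.20, 0.42, 0.35])
std_errors = np.array([0.20, 0.25, 0.15, 0.30, 0.18])

# Inverse-variance (fixed-effect) weighting: more precise studies
# receive proportionally more weight in the pooled estimate.
weights = 1 / std_errors**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

# The pooled standard error (~0.09) is smaller than that of any
# single study, which is the gain in precision from pooling.
print(round(pooled, 2), round(pooled_se, 2))
```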

Beck, Lewis W. "The Synoptic Method." The Journal of Philosophy 36 (1939): 337-345; Cooper, Harris, Larry V. Hedges, and Jeffrey C. Valentine, eds. The Handbook of Research Synthesis and Meta-Analysis. 2nd edition. New York: Russell Sage Foundation, 2009; Guzzo, Richard A., Susan E. Jackson, and Raymond A. Katzell. "Meta-Analysis Analysis." In Research in Organizational Behavior, Volume 9. (Greenwich, CT: JAI Press, 1987), pp. 407-442; Lipsey, Mark W. and David B. Wilson. Practical Meta-Analysis. Thousand Oaks, CA: Sage Publications, 2001; Study Design 101. Meta-Analysis. The Himmelfarb Health Sciences Library, George Washington University; Timulak, Ladislav. "Qualitative Meta-Analysis." In The SAGE Handbook of Qualitative Data Analysis. Uwe Flick, editor. (Los Angeles, CA: Sage, 2013), pp. 481-495; Walker, Esteban, Adrian V. Hernandez, and Michael W. Kattan. "Meta-Analysis: Its Strengths and Limitations." Cleveland Clinic Journal of Medicine 75 (June 2008): 431-439.

Mixed-Method Design

  • Narrative and non-textual information can add meaning to numeric data, while numeric data can add precision to narrative and non-textual information.
  • Can utilize existing data while at the same time generating and testing a grounded theory approach to describe and explain the phenomenon under study.
  • A broader, more complex research problem can be investigated because the researcher is not constrained by using only one method.
  • The strengths of one method can be used to overcome the inherent weaknesses of another method.
  • Can provide stronger, more robust evidence to support a conclusion or set of recommendations.
  • May generate new knowledge or uncover hidden insights, patterns, or relationships that a single methodological approach might not reveal.
  • Produces more complete knowledge and understanding of the research problem that can be used to increase the generalizability of findings applied to theory or practice.
  • A researcher must be proficient in understanding how to apply multiple methods to investigating a research problem as well as be proficient in optimizing how to design a study that coherently melds them together.
  • Can increase the likelihood of conflicting results or ambiguous findings that inhibit drawing a valid conclusion or setting forth a recommended course of action [e.g., sample interview responses do not support existing statistical data].
  • Because the research design can be very complex, reporting the findings requires a well-organized narrative, clear writing style, and precise word choice.
  • Design invites collaboration among experts. However, merging different investigative approaches and writing styles requires more attention to the overall research process than studies conducted using only one methodological paradigm.
  • Concurrent merging of quantitative and qualitative research requires greater attention to having adequate sample sizes, using comparable samples, and applying a consistent unit of analysis. For sequential designs where one phase of qualitative research builds on the quantitative phase or vice versa, decisions about what results from the first phase to use in the next phase, the choice of samples and estimating reasonable sample sizes for both phases, and the interpretation of results from both phases can be difficult.
  • Due to multiple forms of data being collected and analyzed, this design requires extensive time and resources to carry out the multiple steps involved in data gathering and interpretation.

Burch, Patricia and Carolyn J. Heinrich. Mixed Methods for Policy Research and Program Evaluation. Thousand Oaks, CA: Sage, 2016; Creswell, John W. et al. Best Practices for Mixed Methods Research in the Health Sciences. Bethesda, MD: Office of Behavioral and Social Sciences Research, National Institutes of Health, 2010; Creswell, John W. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. 4th edition. Thousand Oaks, CA: Sage Publications, 2014; Domínguez, Silvia, editor. Mixed Methods Social Networks Research. Cambridge, UK: Cambridge University Press, 2014; Hesse-Biber, Sharlene Nagy. Mixed Methods Research: Merging Theory with Practice. New York: Guilford Press, 2010; Niglas, Katrin. "How the Novice Researcher Can Make Sense of Mixed Methods Designs." International Journal of Multiple Research Approaches 3 (2009): 34-46; Onwuegbuzie, Anthony J. and Nancy L. Leech. "Linking Research Questions to Mixed Methods Data Analysis Procedures." The Qualitative Report 11 (September 2006): 474-498; Tashakkori, Abbas and John W. Creswell. "The New Era of Mixed Methods." Journal of Mixed Methods Research 1 (January 2007): 3-7; Zhang, Wanqing. "Mixed Methods Application in Health Intervention Research: A Multiple Case Study." International Journal of Multiple Research Approaches 8 (2014): 24-35.

Observational Design

This type of research design draws a conclusion by comparing subjects against a control group, in cases where the researcher has no control over the experiment. There are two general types of observational designs. In direct observations, people know that you are watching them. Unobtrusive measures involve any method for studying behavior where individuals do not know they are being observed. An observational study allows a useful insight into a phenomenon and avoids the ethical and practical difficulties of setting up a large and cumbersome research project.

  • Observational studies are usually flexible and do not necessarily need to be structured around a hypothesis about what you expect to observe [data is emergent rather than pre-existing].
  • The researcher is able to collect in-depth information about a particular behavior.
  • Can reveal interrelationships among multifaceted dimensions of group interactions.
  • You can generalize your results to real life situations.
  • Observational research is useful for discovering what variables may be important before applying other methods like experiments.
  • Observation research designs account for the complexity of group behaviors.
  • Reliability of data is low because observing behaviors over and over again is a time-consuming task, and observations are difficult to replicate.
  • In observational research, findings may only reflect a unique sample population and, thus, cannot be generalized to other groups.
  • There can be problems with bias as the researcher may only "see what they want to see."
  • There is no possibility to determine "cause and effect" relationships since nothing is manipulated.
  • Sources or subjects may not all be equally credible.
  • Any group that is knowingly studied is altered to some degree by the presence of the researcher, therefore, potentially skewing any data collected.

Atkinson, Paul and Martyn Hammersley. "Ethnography and Participant Observation." In Handbook of Qualitative Research. Norman K. Denzin and Yvonna S. Lincoln, eds. (Thousand Oaks, CA: Sage, 1994), pp. 248-261; Observational Research. Research Methods by Dummies. Department of Psychology. California State University, Fresno, 2006; Patton, Michael Quinn. Qualitative Research and Evaluation Methods. Chapter 6, Fieldwork Strategies and Observational Methods. 3rd ed. Thousand Oaks, CA: Sage, 2002; Payne, Geoff and Judy Payne. "Observation." In Key Concepts in Social Research. The SAGE Key Concepts series. (London, England: Sage, 2004), pp. 158-162; Rosenbaum, Paul R. Design of Observational Studies. New York: Springer, 2010; Williams, J. Patrick. "Nonparticipant Observation." In The Sage Encyclopedia of Qualitative Research Methods. Lisa M. Given, editor. (Thousand Oaks, CA: Sage, 2008), pp. 562-563.

Philosophical Design

Understood more as a broad approach to examining a research problem than a methodological design, philosophical analysis and argumentation is intended to challenge deeply embedded, often intractable, assumptions underpinning an area of study. This approach uses the tools of argumentation derived from philosophical traditions, concepts, models, and theories to critically explore and challenge, for example, the relevance of logic and evidence in academic debates, to analyze arguments about fundamental issues, or to discuss the root of existing discourse about a research problem. These overarching tools of analysis can be framed in three ways:

  • Ontology -- the study that describes the nature of reality; for example, what is real and what is not, what is fundamental and what is derivative?
  • Epistemology -- the study that explores the nature of knowledge; for example, on what do knowledge and understanding depend, and how can we be certain of what we know?
  • Axiology -- the study of values; for example, what values does an individual or group hold and why? How are values related to interest, desire, will, experience, and means-to-end? And, what is the difference between a matter of fact and a matter of value?
  • Can provide a basis for applying ethical decision-making to practice.
  • Functions as a means of gaining greater self-understanding and self-knowledge about the purposes of research.
  • Brings clarity to general guiding practices and principles of an individual or group.
  • Philosophy informs methodology.
  • Refines concepts and theories that are invoked in relatively unreflective modes of thought and discourse.
  • Beyond methodology, philosophy also informs critical thinking about epistemology and the structure of reality (metaphysics).
  • Offers clarity and definition to the practical and theoretical uses of terms, concepts, and ideas.
  • Limited application to specific research problems [answering the "So What?" question in social science research].
  • Analysis can be abstract, argumentative, and limited in its practical application to real-life issues.
  • While a philosophical analysis may render problematic that which was once simple or taken-for-granted, the writing can be dense and subject to unnecessary jargon, overstatement, and/or excessive quotation and documentation.
  • There are limitations in the use of metaphor as a vehicle of philosophical analysis.
  • There can be analytical difficulties in moving from philosophy to advocacy and between abstract thought and application to the phenomenal world.

Burton, Dawn. "Part I, Philosophy of the Social Sciences." In Research Training for Social Scientists. (London, England: Sage, 2000), pp. 1-5; Chapter 4, Research Methodology and Design. Unisa Institutional Repository (UnisaIR), University of South Africa; Jarvie, Ian C. and Jesús Zamora-Bonilla, editors. The SAGE Handbook of the Philosophy of Social Sciences. London: Sage, 2011; Labaree, Robert V. and Ross Scimeca. "The Philosophical Problem of Truth in Librarianship." The Library Quarterly 78 (January 2008): 43-70; Maykut, Pamela S. Beginning Qualitative Research: A Philosophic and Practical Guide. Washington, DC: Falmer Press, 1994; McLaughlin, Hugh. "The Philosophy of Social Research." In Understanding Social Work Research. 2nd edition. (London: SAGE Publications Ltd., 2012), pp. 24-47; Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, CSLI, Stanford University, 2013.

Sequential Design

  • The researcher has limitless options when it comes to sample size and the sampling schedule.
  • Due to the repetitive nature of this research design, minor changes and adjustments can be done during the initial parts of the study to correct and hone the research method.
  • This is a useful design for exploratory studies.
  • There is very little effort on the part of the researcher when performing this technique. It is generally not expensive, time consuming, or workforce intensive.
  • Because the study is conducted serially, the results of one sample are known before the next sample is taken and analyzed. This provides opportunities for continuous improvement of sampling and methods of analysis.
  • The sampling method is not representative of the entire population. The only possibility of approaching representativeness is when the researcher chooses to use a very large sample size that represents a significant portion of the entire population. In this case, moving on to study a second or more specific sample can be difficult.
  • The design cannot be used to create conclusions and interpretations that pertain to an entire population because the sampling technique is not randomized. Generalizability from findings is, therefore, limited.
  • Difficult to account for and interpret variation from one sample to another over time, particularly when using qualitative methods of data collection.

Betensky, Rebecca. Harvard University, Course Lecture Note slides; Bovaird, James A. and Kevin A. Kupzyk. "Sequential Design." In Encyclopedia of Research Design. Neil J. Salkind, editor. (Thousand Oaks, CA: Sage, 2010), pp. 1347-1352; Creswell, John W. et al. "Advanced Mixed-Methods Research Designs." In Handbook of Mixed Methods in Social and Behavioral Research. Abbas Tashakkori and Charles Teddlie, eds. (Thousand Oaks, CA: Sage, 2003), pp. 209-240; Henry, Gary T. "Sequential Sampling." In The SAGE Encyclopedia of Social Science Research Methods. Michael S. Lewis-Beck, Alan Bryman, and Tim Futing Liao, editors. (Thousand Oaks, CA: Sage, 2004), pp. 1027-1028; Ivankova, Nataliya V. "Using Mixed-Methods Sequential Explanatory Design: From Theory to Practice." Field Methods 18 (February 2006): 3-20; Sequential Analysis. Wikipedia.

Systematic Review

  • A systematic review synthesizes the findings of multiple studies related to each other by incorporating strategies of analysis and interpretation intended to reduce biases and random errors.
  • The application of critical exploration, evaluation, and synthesis methods separates insignificant, unsound, or redundant research from the most salient and relevant studies worthy of reflection.
  • They can be used to identify, justify, and refine hypotheses, recognize and avoid hidden problems in prior studies, and explain inconsistencies and conflicts in the data.
  • Systematic reviews can be used to help policy makers formulate evidence-based guidelines and regulations.
  • The use of strict, explicit, and pre-determined methods of synthesis, when applied appropriately, provide reliable estimates about the effects of interventions, evaluations, and effects related to the overarching research problem investigated by each study under review.
  • Systematic reviews illuminate where knowledge or thorough understanding of a research problem is lacking and, therefore, can then be used to guide future research.
  • The accepted inclusion of unpublished studies [i.e., grey literature] provides the broadest possible base for analyzing and interpreting research on a topic.
  • Results of the synthesis can be generalized and the findings extrapolated into the general population with more validity than most other types of studies.
  • Systematic reviews do not create new knowledge per se; they are a method for synthesizing existing studies about a research problem in order to gain new insights and determine gaps in the literature.
  • The way researchers have carried out their investigations [e.g., the period of time covered, number of participants, sources of data analyzed, etc.] can make it difficult to effectively synthesize studies.
  • The inclusion of unpublished studies can introduce bias into the review because they may not have undergone a rigorous peer-review process prior to publication. Examples may include conference presentations or proceedings, publications from government agencies, white papers, working papers, and internal documents from organizations, and doctoral dissertations and Master's theses.

Denyer, David and David Tranfield. "Producing a Systematic Review." In The Sage Handbook of Organizational Research Methods. David A. Buchanan and Alan Bryman, editors. (Thousand Oaks, CA: Sage Publications, 2009), pp. 671-689; Foster, Margaret J. and Sarah T. Jewell, editors. Assembling the Pieces of a Systematic Review: A Guide for Librarians. Lanham, MD: Rowman and Littlefield, 2017; Gough, David, Sandy Oliver, and James Thomas, editors. Introduction to Systematic Reviews. 2nd edition. Los Angeles, CA: Sage Publications, 2017; Gopalakrishnan, S. and P. Ganeshkumar. “Systematic Reviews and Meta-analysis: Understanding the Best Evidence in Primary Healthcare.” Journal of Family Medicine and Primary Care 2 (2013): 9-14; Gough, David, James Thomas, and Sandy Oliver. "Clarifying Differences between Review Designs and Methods." Systematic Reviews 1 (2012): 1-9; Khan, Khalid S., Regina Kunz, Jos Kleijnen, and Gerd Antes. “Five Steps to Conducting a Systematic Review.” Journal of the Royal Society of Medicine 96 (2003): 118-121; Mulrow, C. D. “Systematic Reviews: Rationale for Systematic Reviews.” BMJ 309 (September 1994): 597; O'Dwyer, Linda C. and Q. Eileen Wafford. "Addressing Challenges with Systematic Review Teams through Effective Communication: A Case Report." Journal of the Medical Library Association 109 (October 2021): 643-647; Okoli, Chitu and Kira Schabram. "A Guide to Conducting a Systematic Literature Review of Information Systems Research." Sprouts: Working Papers on Information Systems 10 (2010); Siddaway, Andy P., Alex M. Wood, and Larry V. Hedges. "How to Do a Systematic Review: A Best Practice Guide for Conducting and Reporting Narrative Reviews, Meta-analyses, and Meta-syntheses." Annual Review of Psychology 70 (2019): 747-770; Torgerson, Carole J. “Publication Bias: The Achilles’ Heel of Systematic Reviews?” British Journal of Educational Studies 54 (March 2006): 89-102; Torgerson, Carole. Systematic Reviews. New York: Continuum, 2003.


An introduction to different types of study design

Posted on 6th April 2021 by Hadi Abbas

""

Study designs are the set of methods and procedures used to collect and analyze data in a study.

Broadly speaking, there are 2 types of study designs: descriptive studies and analytical studies.

Descriptive studies

  • Describes specific characteristics in a population of interest
  • The most common forms are case reports and case series
  • In a case report, we discuss our experience with the patient’s symptoms, signs, diagnosis, and treatment
  • In a case series, several patients with similar experiences are grouped.

Analytical Studies

Analytical studies are of 2 types: observational and experimental.

Observational studies are studies that we conduct without any intervention or experiment. In those studies, we purely observe the outcomes.  On the other hand, in experimental studies, we conduct experiments and interventions.

Observational studies

Observational studies include many subtypes. Below, I will discuss the most common designs.

Cross-sectional study:

  • This design is transverse: we take a specific sample at a specific point in time, without any follow-up
  • It allows us to calculate the frequency of a disease (prevalence) or the frequency of a risk factor
  • This design is easy to conduct
  • For example – if we want to know the prevalence of migraine in a population, we can conduct a cross-sectional study whereby we take a sample from the population and calculate the number of patients with migraine headaches, as sketched in the code below.
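
As a rough illustration of the calculation (the counts below are hypothetical, not survey data), prevalence is simply the proportion of the sample with the condition at that point in time:

```python
# Minimal sketch: point prevalence from a cross-sectional sample.
# The counts are hypothetical.
sample_size = 1200   # people surveyed at a single point in time
cases = 174          # of whom this many have migraine

prevalence = cases / sample_size
print(f"Estimated prevalence: {prevalence:.1%}")   # 14.5%
```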

Cohort study:

  • We conduct this study by comparing two samples from the population: one sample with a risk factor while the other lacks this risk factor
  • It shows us the risk of developing the disease in individuals with the risk factor compared to those without it (relative risk, RR)
  • Prospective: we follow the individuals into the future to find out who will develop the disease
  • Retrospective: we look to the past to find out who developed the disease (e.g. using medical records)
  • This design is the strongest among the observational studies
  • For example – to find out the relative risk of developing chronic obstructive pulmonary disease (COPD) among smokers, we take a sample including smokers and non-smokers. Then, we calculate the number of individuals with COPD among both, as sketched below.
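
A rough numeric illustration (all counts hypothetical): the relative risk is the risk of disease in the exposed group divided by the risk in the unexposed group.

```python
# Minimal sketch: relative risk (RR) from a hypothetical cohort of smokers
# and non-smokers followed for COPD. Counts are illustrative, not real data.
exposed_cases, exposed_total = 90, 600        # smokers who develop COPD / all smokers
unexposed_cases, unexposed_total = 30, 600    # non-smokers who develop COPD / all non-smokers

risk_exposed = exposed_cases / exposed_total          # 0.15
risk_unexposed = unexposed_cases / unexposed_total    # 0.05
relative_risk = risk_exposed / risk_unexposed

print(f"RR = {relative_risk:.1f}")   # 3.0: smokers have 3x the risk in this sketch
```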

Case-Control Study:

  • We conduct this study by comparing 2 groups: one group with the disease (cases) and another group without the disease (controls)
  • This design is always retrospective
  • We aim to find out the odds of having a risk factor or an exposure if an individual has a specific disease (odds ratio, OR)
  • Relatively easy to conduct
  • For example – we want to study the odds of being a smoker among hypertensive patients compared to normotensive ones. To do so, we choose a group of patients diagnosed with hypertension and another group that serves as the control (normal blood pressure). Then we study their smoking history to find out whether there is an association, as sketched below.
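
A rough numeric illustration (the 2x2 counts are hypothetical): the odds ratio compares the odds of exposure among cases with the odds of exposure among controls.

```python
# Minimal sketch: odds ratio (OR) from a hypothetical case-control study of
# smoking history in hypertensive (cases) vs normotensive (controls) patients.
cases_smokers, cases_nonsmokers = 120, 180          # among hypertensive patients
controls_smokers, controls_nonsmokers = 80, 220     # among normotensive patients

odds_cases = cases_smokers / cases_nonsmokers            # odds of smoking among cases
odds_controls = controls_smokers / controls_nonsmokers   # odds of smoking among controls
odds_ratio = odds_cases / odds_controls

print(f"OR = {odds_ratio:.2f}")   # ~1.83 in this sketch
```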

Experimental Studies

  • Also known as interventional studies
  • Can involve animals and humans
  • Pre-clinical trials involve animals
  • Clinical trials are experimental studies involving humans
  • In clinical trials, we study the effect of an intervention compared to another intervention or placebo. As an example, I have listed the four phases of a drug trial:

I: We aim to assess the safety of the drug (is it safe?)

II: We aim to assess the efficacy of the drug (does it work?)

III: We want to know if this drug is better than the old treatment (is it better?)

IV: We follow up to detect long-term side effects (can it stay on the market?)

  • In randomized controlled trials, one group of participants receives the control, while the other receives the tested drug/intervention. Those studies are the best way to evaluate the efficacy of a treatment.

Finally, the figure below will help you with your understanding of different types of study designs.

[Figure: types of epidemiological studies. Descriptive studies include case reports, case series, and descriptive surveys. Analytical studies are either observational (cross-sectional, case-control, or cohort studies) or experimental (lab trials or field trials).]

You may also be interested in the following blogs for further reading:

An introduction to randomized controlled trials

Case-control and cohort studies: a brief overview

Cohort studies: prospective and retrospective designs

Prevalence vs Incidence: what is the difference?


Sacred Heart University Library

Organizing Academic Research Papers: Types of Research Designs


Introduction

Before beginning your paper, you need to decide how you plan to design the study .

The research design refers to the overall strategy that you choose to integrate the different components of the study in a coherent and logical way, thereby, ensuring you will effectively address the research problem; it constitutes the blueprint for the collection, measurement, and analysis of data. Note that your research problem determines the type of design you can use, not the other way around!

General Structure and Writing Style

This guide describes the following designs: action research design, case study design, causal design, cohort design, cross-sectional design, descriptive design, experimental design, exploratory design, historical design, longitudinal design, observational design, philosophical design, and sequential design.

Kirshenblatt-Gimblett, Barbara. Part 1, What Is Research Design? The Context of Design. Performance Studies Methods Course syllabus . New York University, Spring 2006; Trochim, William M.K. Research Methods Knowledge Base . 2006.

The function of a research design is to ensure that the evidence obtained enables you to effectively address the research problem as unambiguously as possible. In social sciences research, obtaining evidence relevant to the research problem generally entails specifying the type of evidence needed to test a theory, to evaluate a program, or to accurately describe a phenomenon. However, researchers often begin their investigations far too early, before they have thought critically about what information is required to answer the study's research questions. Without attending to these design issues beforehand, the conclusions drawn risk being weak and unconvincing and, consequently, will fail to adequately address the overall research problem.

Given this, the length and complexity of research designs can vary considerably, but any sound design will do the following things:

  • Identify the research problem clearly and justify its selection,
  • Review previously published literature associated with the problem area,
  • Clearly and explicitly specify hypotheses [i.e., research questions] central to the problem selected,
  • Effectively describe the data which will be necessary for an adequate test of the hypotheses and explain how such data will be obtained, and
  • Describe the methods of analysis which will be applied to the data in determining whether or not the hypotheses are true or false.

Kirshenblatt-Gimblett, Barbara. Part 1, What Is Research Design? The Context of Design. Performance Studies Methods Course syllabus. New York University, Spring 2006.

Definition and Purpose

The essentials of action research design follow a characteristic cycle whereby initially an exploratory stance is adopted: an understanding of a problem is developed and plans are made for some form of intervention strategy. Then the intervention is carried out (the "action" in action research), during which time pertinent observations are collected in various forms. The new intervention strategies are carried out, and this cyclic process repeats, continuing until a sufficient understanding of (or an implementable solution for) the problem is achieved. The protocol is iterative or cyclical in nature and is intended to foster deeper understanding of a given situation, starting with conceptualizing and particularizing the problem and moving through several interventions and evaluations.

What do these studies tell you?

  • A collaborative and adaptive research design that lends itself to use in work or community situations.
  • Design focuses on pragmatic and solution-driven research rather than testing theories.
  • When practitioners use action research it has the potential to increase the amount they learn consciously from their experience. The action research cycle can also be regarded as a learning cycle.
  • Action research studies often have direct and obvious relevance to practice.
  • There are no hidden controls or preemption of direction by the researcher.

What don't these studies tell you?

  • It is harder to do than conducting conventional studies because the researcher takes on responsibilities for encouraging change as well as for research.
  • Action research is much harder to write up because you probably can’t use a standard format to report your findings effectively.
  • Personal over-involvement of the researcher may bias research results.
  • The cyclic nature of action research to achieve its twin outcomes of action (e.g. change) and research (e.g. understanding) is time-consuming and complex to conduct.

Gall, Meredith. Educational Research: An Introduction. Chapter 18, Action Research. 8th ed. Boston, MA: Pearson/Allyn and Bacon, 2007; Kemmis, Stephen and Robin McTaggart. “Participatory Action Research.” In Handbook of Qualitative Research. Norman Denzin and Yvonna S. Lincoln, eds. 2nd ed. (Thousand Oaks, CA: SAGE, 2000), pp. 567-605; Reason, Peter and Hilary Bradbury. Handbook of Action Research: Participative Inquiry and Practice. Thousand Oaks, CA: SAGE, 2001.

A case study is an in-depth study of a particular research problem rather than a sweeping statistical survey. It is often used to narrow down a very broad field of research into one or a few easily researchable examples. The case study research design is also useful for testing whether a specific theory and model actually applies to phenomena in the real world. It is a useful design when not much is known about a phenomenon.

  • Approach excels at bringing us to an understanding of a complex issue through detailed contextual analysis of a limited number of events or conditions and their relationships.
  • A researcher using a case study design can apply a variety of methodologies and rely on a variety of sources to investigate a research problem.
  • Design can extend experience or add strength to what is already known through previous research.
  • Social scientists, in particular, make wide use of this research design to examine contemporary real-life situations and provide the basis for the application of concepts and theories and extension of methods.
  • The design can provide detailed descriptions of specific and rare cases.
  • A single or small number of cases offers little basis for establishing reliability or to generalize the findings to a wider population of people, places, or things.
  • The intense exposure to study of the case may bias a researcher's interpretation of the findings.
  • Design does not facilitate assessment of cause and effect relationships.
  • Vital information may be missing, making the case hard to interpret.
  • The case may not be representative or typical of the larger problem being investigated.
  • If the criterion for selecting a case is that it represents a very unusual or unique phenomenon or problem for study, then your interpretation of the findings can only apply to that particular case.

Anastas, Jeane W. Research Design for Social Work and the Human Services. Chapter 4, Flexible Methods: Case Study Design. 2nd ed. New York: Columbia University Press, 1999; Stake, Robert E. The Art of Case Study Research. Thousand Oaks, CA: SAGE, 1995; Yin, Robert K. Case Study Research: Design and Methods. Applied Social Research Methods Series, no. 5. 3rd ed. Thousand Oaks, CA: SAGE, 2003.

Causality studies may be thought of as understanding a phenomenon in terms of conditional statements in the form, “If X, then Y.” This type of research is used to measure what impact a specific change will have on existing norms and assumptions. Most social scientists seek causal explanations that reflect tests of hypotheses. Causal effect (nomothetic perspective) occurs when variation in one phenomenon, an independent variable, leads to or results, on average, in variation in another phenomenon, the dependent variable.

Conditions necessary for determining causality:

  • Empirical association--a valid conclusion is based on finding an association between the independent variable and the dependent variable.
  • Appropriate time order--to conclude that causation was involved, one must see that cases were exposed to variation in the independent variable before variation in the dependent variable.
  • Nonspuriousness--a relationship between two variables that is not due to variation in a third variable.
  • Causality research designs help researchers understand why the world works the way it does through the process of proving a causal link between variables and eliminating other possibilities.
  • Replication is possible.
  • There is greater confidence the study has internal validity due to the systematic subject selection and equity of groups being compared.
  • Not all relationships are causal! The possibility always exists that, by sheer coincidence, two unrelated events appear to be related [e.g., Punxsutawney Phil could accurately predict the duration of winter for five consecutive years but, the fact remains, he's just a big, furry rodent].
  • Conclusions about causal relationships are difficult to determine due to a variety of extraneous and confounding variables that exist in a social environment. This means causality can only be inferred, never proven.
  • Even if two variables are correlated, the cause must precede the effect for the relationship to be causal. However, even though two variables might be causally related, it can sometimes be difficult to determine which variable comes first and, therefore, to establish which is the actual cause and which is the actual effect.

Bachman, Ronet. The Practice of Research in Criminology and Criminal Justice. Chapter 5, Causation and Research Designs. 3rd ed. Thousand Oaks, CA: Pine Forge Press, 2007; Causal Research Design: Experimentation. Anonymous SlideShare Presentation; Gall, Meredith. Educational Research: An Introduction. Chapter 11, Nonexperimental Research: Correlational Designs. 8th ed. Boston, MA: Pearson/Allyn and Bacon, 2007; Trochim, William M.K. Research Methods Knowledge Base. 2006.

Often used in the medical sciences, but also found in the applied social sciences, a cohort study generally refers to a study conducted over a period of time involving members of a population that the subject or representative member comes from, and who are united by some commonality or similarity. Using a quantitative framework, a cohort study makes note of statistical occurrence within a specialized subgroup, united by the same or similar characteristics that are relevant to the research problem being investigated, rather than studying statistical occurrence within the general population. Using a qualitative framework, cohort studies generally gather data using methods of observation. Cohorts can be either "open" or "closed."

  • Open Cohort Studies [dynamic populations, such as the population of Los Angeles] involve a population that is defined just by the state of being a part of the study in question (and being monitored for the outcome). Dates of entry and exit from the study are individually defined, so the size of the study population is not constant. In open cohort studies, researchers can only calculate rate-based data, such as incidence rates and variants thereof; a minimal calculation is sketched after this list.
  • Closed Cohort Studies [static populations, such as patients entered into a clinical trial] involve participants who enter into the study at one defining point in time and where it is presumed that no new participants can enter the cohort. Given this, the number of study participants remains constant (or can only decrease).
  • The use of cohorts is often mandatory because a randomized control study may be unethical. For example, you cannot deliberately expose people to asbestos; you can only study its effects on those who have already been exposed. Research that measures risk factors often relies on cohort designs.
  • Because cohort studies measure potential causes before the outcome has occurred, they can demonstrate that these “causes” preceded the outcome, thereby avoiding the debate as to which is the cause and which is the effect.
  • Cohort analysis is highly flexible and can provide insight into effects over time and related to a variety of different types of changes [e.g., social, cultural, political, economic, etc.].
  • Either original data or secondary data can be used in this design.
  • In cases where a comparative analysis of two cohorts is made [e.g., studying the effects of one group exposed to asbestos and one that has not], a researcher cannot control for all other factors that might differ between the two groups. These factors are known as confounding variables.
  • Cohort studies can end up taking a long time to complete if the researcher must wait for the conditions of interest to develop within the group. This also increases the chance that key variables change during the course of the study, potentially impacting the validity of the findings.
  • Because of the lack of randomization in the cohort design, its external validity is lower than that of study designs where the researcher randomly assigns participants.
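
As a hedged illustration of the rate-based data mentioned above, the sketch below computes an incidence rate from person-time in an open cohort; the follow-up times and case count are invented purely for illustration.

```python
# Minimal sketch: incidence rate in an open cohort, where each participant
# contributes person-time between their individual entry and exit dates.
# All values are hypothetical.
follow_up_years = [2.0, 3.5, 1.0, 4.0, 2.5]   # person-years contributed by each participant
new_cases = 2                                  # outcome events observed during follow-up

person_years = sum(follow_up_years)            # 13.0 person-years at risk
incidence_rate = new_cases / person_years

print(f"Incidence rate: {incidence_rate:.3f} cases per person-year")
# ~0.154, often reported as ~154 cases per 1,000 person-years
```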

Healy, P. and D. Devane. “Methodological Considerations in Cohort Study Designs.” Nurse Researcher 18 (2011): 32-36; Levin, Kate Ann. “Study Design IV: Cohort Studies.” Evidence-Based Dentistry 7 (2003): 51-52; Study Design 101. Himmelfarb Health Sciences Library. George Washington University, November 2011; Cohort Study. Wikipedia.

Cross-sectional research designs have three distinctive features: no time dimension; a reliance on existing differences rather than change following intervention; and groups selected based on existing differences rather than random allocation. The cross-sectional design can only measure differences between or among a variety of people, subjects, or phenomena rather than change. As such, researchers using this design can only employ a relatively passive approach to making causal inferences based on findings.

  • Cross-sectional studies provide a 'snapshot' of the outcome and the characteristics associated with it, at a specific point in time.
  • Unlike the experimental design where there is an active intervention by the researcher to produce and measure change or to create differences, cross-sectional designs focus on studying and drawing inferences from existing differences between people, subjects, or phenomena.
  • Entails collecting data at and concerning one point in time. While longitudinal studies involve taking multiple measures over an extended period of time, cross-sectional research is focused on finding relationships between variables at one moment in time.
  • Groups identified for study are purposely selected based upon existing differences in the sample rather than seeking random sampling.
  • Cross-sectional studies are capable of using data from a large number of subjects and, unlike observational studies, are not geographically bound.
  • Can estimate prevalence of an outcome of interest because the sample is usually taken from the whole population.
  • Because cross-sectional designs generally use survey techniques to gather data, they are relatively inexpensive and take up little time to conduct.
  • Finding people, subjects, or phenomena to study that are very similar except in one specific variable can be difficult.
  • Results are static and time bound and, therefore, give no indication of a sequence of events or reveal historical contexts.
  • Studies cannot be utilized to establish cause and effect relationships.
  • Provide only a snapshot of analysis so there is always the possibility that a study could have differing results if another time-frame had been chosen.
  • There is no follow up to the findings.

Hall, John. “Cross-Sectional Survey Design.” In Encyclopedia of Survey Research Methods. Paul J. Lavrakas, ed. (Thousand Oaks, CA: Sage, 2008), pp. 173-174; Barratt, Helen and Maria Kirwan. Cross-Sectional Studies: Design, Application, Strengths and Weaknesses of Cross-Sectional Studies. Healthknowledge, 2009; Cross-Sectional Study. Wikipedia.

Descriptive research designs help provide answers to the questions of who, what, when, where, and how associated with a particular research problem; a descriptive study cannot conclusively ascertain answers to why. Descriptive research is used to obtain information concerning the current status of the phenomena and to describe "what exists" with respect to variables or conditions in a situation.

  • The subject is observed in a completely natural and unchanged environment. True experiments, whilst giving analyzable data, often adversely influence the normal behavior of the subject.
  • Descriptive research is often used as a precursor to more quantitative research designs, the general overview giving some valuable pointers as to what variables are worth testing quantitatively.
  • If the limitations are understood, descriptive studies can be a useful tool in developing a more focused study.
  • Descriptive studies can yield rich data that lead to important recommendations.
  • Approach collects a large amount of data for detailed analysis.
  • The results from a descriptive study cannot be used to discover a definitive answer or to disprove a hypothesis.
  • Because descriptive designs often utilize observational methods [as opposed to quantitative methods], the results cannot be replicated.
  • The descriptive function of research is heavily dependent on instrumentation for measurement and observation.

Anastas, Jeane W. Research Design for Social Work and the Human Services. Chapter 5, Flexible Methods: Descriptive Research. 2nd ed. New York: Columbia University Press, 1999; McNabb, Connie. Descriptive Research Methodologies. PowerPoint Presentation; Shuttleworth, Martyn. Descriptive Research Design, September 26, 2008. Explorable.com website.

A blueprint of the procedure that enables the researcher to maintain control over all factors that may affect the result of an experiment. In doing this, the researcher attempts to determine or predict what may occur. Experimental Research is often used where there is time priority in a causal relationship (cause precedes effect), there is consistency in a causal relationship (a cause will always lead to the same effect), and the magnitude of the correlation is great. The classic experimental design specifies an experimental group and a control group. The independent variable is administered to the experimental group and not to the control group, and both groups are measured on the same dependent variable. Subsequent experimental designs have used more groups and more measurements over longer periods. True experiments must have control, randomization, and manipulation.

  • Experimental research allows the researcher to control the situation. In so doing, it allows researchers to answer the question, “what causes something to occur?”
  • Permits the researcher to identify cause and effect relationships between variables and to distinguish placebo effects from treatment effects.
  • Experimental research designs support the ability to limit alternative explanations and to infer direct causal relationships in the study.
  • Approach provides the highest level of evidence for single studies.
  • The design is artificial, and results may not generalize well to the real world.
  • The artificial settings of experiments may alter subject behaviors or responses.
  • Experimental designs can be costly if special equipment or facilities are needed.
  • Some research problems cannot be studied using an experiment because of ethical or technical reasons.
  • Difficult to apply ethnographic and other qualitative methods to experimentally designed research studies.

Anastas, Jeane W. Research Design for Social Work and the Human Services. Chapter 7, Flexible Methods: Experimental Research. 2nd ed. New York: Columbia University Press, 1999; Chapter 2: Research Design, Experimental Designs. School of Psychology, University of New England, 2000; Experimental Research. Research Methods by Dummies. Department of Psychology. California State University, Fresno, 2006; Trochim, William M.K. Experimental Design. Research Methods Knowledge Base. 2006; Rasool, Shafqat. Experimental Research. Slideshare presentation.

An exploratory design is conducted about a research problem when there are few or no earlier studies to refer to. The focus is on gaining insights and familiarity for later investigation or undertaken when problems are in a preliminary stage of investigation.

The goals of exploratory research are intended to produce the following possible insights:

  • Familiarity with basic details, settings and concerns.
  • Development of a well-grounded picture of the situation.
  • Generation of new ideas and assumptions, and development of tentative theories or hypotheses.
  • Determination about whether a study is feasible in the future.
  • Issues get refined for more systematic investigation and formulation of new research questions.
  • Direction for future research and techniques get developed.
  • Design is a useful approach for gaining background information on a particular topic.
  • Exploratory research is flexible and can address research questions of all types (what, why, how).
  • Provides an opportunity to define new terms and clarify existing concepts.
  • Exploratory research is often used to generate formal hypotheses and develop more precise research problems.
  • Exploratory studies help establish research priorities.
  • Exploratory research generally utilizes small sample sizes and, thus, findings are typically not generalizable to the population at large.
  • The exploratory nature of the research inhibits an ability to make definitive conclusions about the findings.
  • The research process underpinning exploratory studies is flexible but often unstructured, leading to only tentative results that have limited value in decision-making.
  • Design lacks rigorous standards applied to methods of data gathering and analysis because one of the areas for exploration could be to determine what method or methodologies could best fit the research problem.

Cuthill, Michael. “Exploratory Research: Citizen Participation, Local Government, and Sustainable Development in Australia.” Sustainable Development 10 (2002): 79-89; Taylor, P. J., G. Catalano, and D.R.F. Walker. “Exploratory Analysis of the World City Network.” Urban Studies 39 (December 2002): 2377-2394; Exploratory Research. Wikipedia.

The purpose of a historical research design is to collect, verify, and synthesize evidence from the past to establish facts that defend or refute your hypothesis. It uses secondary sources and a variety of primary documentary evidence, such as logs, diaries, official records, reports, archives, and non-textual information [maps, pictures, audio and visual recordings]. The limitation is that the sources must be both authentic and valid.

  • The historical research design is unobtrusive; the act of research does not affect the results of the study.
  • The historical approach is well suited for trend analysis.
  • Historical records can add important contextual background required to more fully understand and interpret a research problem.
  • There is no possibility of researcher-subject interaction that could affect the findings.
  • Historical sources can be used over and over to study different research problems or to replicate a previous study.
  • The ability to fulfill the aims of your research is directly related to the amount and quality of documentation available to understand the research problem.
  • Since historical research relies on data from the past, there is no way to manipulate it to control for contemporary contexts.
  • Interpreting historical sources can be very time consuming.
  • The sources of historical materials must be archived consistently to ensure access.
  • Original authors bring their own perspectives and biases to the interpretation of past events and these biases are more difficult to ascertain in historical resources.
  • Due to the lack of control over external variables, historical research is very weak with regard to the demands of internal validity.
  • It is rare that the entirety of historical documentation needed to fully address a research problem is available for interpretation; therefore, gaps need to be acknowledged.

Savitt, Ronald. “Historical Research in Marketing.” Journal of Marketing 44 (Autumn 1980): 52-58; Gall, Meredith. Educational Research: An Introduction. Chapter 16, Historical Research. 8th ed. Boston, MA: Pearson/Allyn and Bacon, 2007.

A longitudinal study follows the same sample over time and makes repeated observations. With longitudinal surveys, for example, the same group of people is interviewed at regular intervals, enabling researchers to track changes over time and to relate them to variables that might explain why the changes occur. Longitudinal research designs describe patterns of change and help establish the direction and magnitude of causal relationships. Measurements are taken on each variable over two or more distinct time periods. This allows the researcher to measure change in variables over time. It is a type of observational study and is sometimes referred to as a panel study.

  • Longitudinal data allow the analysis of duration of a particular phenomenon.
  • Enables survey researchers to get close to the kinds of causal explanations usually attainable only with experiments.
  • The design permits the measurement of differences or change in a variable from one period to another [i.e., the description of patterns of change over time].
  • Longitudinal studies facilitate the prediction of future outcomes based upon earlier factors.
  • The data collection method may change over time.
  • Maintaining the integrity of the original sample can be difficult over an extended period of time.
  • It can be difficult to show more than one variable at a time.
  • This design often needs qualitative research to explain fluctuations in the data.
  • A longitudinal research design assumes present trends will continue unchanged.
  • It can take a long period of time to gather results.
  • There is a need to have a large sample size and accurate sampling to reach representativeness.

Anastas, Jeane W. Research Design for Social Work and the Human Services. Chapter 6, Flexible Methods: Relational and Longitudinal Research. 2nd ed. New York: Columbia University Press, 1999; Kalaian, Sema A. and Rafa M. Kasim. "Longitudinal Studies." In Encyclopedia of Survey Research Methods. Paul J. Lavrakas, ed. (Thousand Oaks, CA: Sage, 2008), pp. 440-441; Ployhart, Robert E. and Robert J. Vandenberg. "Longitudinal Research: The Theory, Design, and Analysis of Change.” Journal of Management 36 (January 2010): 94-120; Longitudinal Study. Wikipedia.

This type of research design draws a conclusion by comparing subjects against a control group, in cases where the researcher has no control over the experiment. There are two general types of observational designs. In direct observations, people know that you are watching them. Unobtrusive measures involve any method for studying behavior where individuals do not know they are being observed. An observational study allows a useful insight into a phenomenon and avoids the ethical and practical difficulties of setting up a large and cumbersome research project.

  • Observational studies are usually flexible and do not necessarily need to be structured around a hypothesis about what you expect to observe (data is emergent rather than pre-existing).
  • The researcher is able to collect a depth of information about a particular behavior.
  • Can reveal interrelationships among multifaceted dimensions of group interactions.
  • You can generalize your results to real life situations.
  • Observational research is useful for discovering what variables may be important before applying other methods like experiments.
  • Observational research designs account for the complexity of group behaviors.
  • Reliability of data is low because seeing behaviors occur over and over again may be a time consuming task and difficult to replicate.
  • In observational research, findings may only reflect a unique sample population and, thus, cannot be generalized to other groups.
  • There can be problems with bias as the researcher may only "see what they want to see."
  • There is no possibility to determine "cause and effect" relationships since nothing is manipulated.
  • Sources or subjects may not all be equally credible.
  • Any group that is studied is altered to some degree by the very presence of the researcher, therefore skewing to some degree any data collected (the observer effect, often loosely called the Heisenberg uncertainty principle).

Atkinson, Paul and Martyn Hammersley. “Ethnography and Participant Observation.” In Handbook of Qualitative Research. Norman K. Denzin and Yvonna S. Lincoln, eds. (Thousand Oaks, CA: Sage, 1994), pp. 248-261; Observational Research. Research Methods by Dummies. Department of Psychology. California State University, Fresno, 2006; Patton, Michael Quinn. Qualitative Research and Evaluation Methods. Chapter 6, Fieldwork Strategies and Observational Methods. 3rd ed. Thousand Oaks, CA: Sage, 2002; Rosenbaum, Paul R. Design of Observational Studies. New York: Springer, 2010.

Understood more as a broad approach to examining a research problem than as a methodological design, philosophical analysis and argumentation is intended to challenge deeply embedded, often intractable, assumptions underpinning an area of study. This approach uses the tools of argumentation derived from philosophical traditions, concepts, models, and theories to critically explore and challenge, for example, the relevance of logic and evidence in academic debates, to analyze arguments about fundamental issues, or to discuss the root of existing discourse about a research problem. These overarching tools of analysis can be framed in three ways:

  • Ontology -- the study that describes the nature of reality; for example, what is real and what is not, what is fundamental and what is derivative?
  • Epistemology -- the study that explores the nature of knowledge; for example, what do knowledge and understanding depend upon, and how can we be certain of what we know?
  • Axiology -- the study of values; for example, what values does an individual or group hold and why? How are values related to interest, desire, will, experience, and means-to-end? And, what is the difference between a matter of fact and a matter of value?
  • Can provide a basis for applying ethical decision-making to practice.
  • Functions as a means of gaining greater self-understanding and self-knowledge about the purposes of research.
  • Brings clarity to general guiding practices and principles of an individual or group.
  • Philosophy informs methodology.
  • Refines concepts and theories that are invoked in relatively unreflective modes of thought and discourse.
  • Beyond methodology, philosophy also informs critical thinking about epistemology and the structure of reality (metaphysics).
  • Offers clarity and definition to the practical and theoretical uses of terms, concepts, and ideas.
  • Limited application to specific research problems [answering the "So What?" question in social science research].
  • Analysis can be abstract, argumentative, and limited in its practical application to real-life issues.
  • While a philosophical analysis may render problematic that which was once simple or taken-for-granted, the writing can be dense and subject to unnecessary jargon, overstatement, and/or excessive quotation and documentation.
  • There are limitations in the use of metaphor as a vehicle of philosophical analysis.
  • There can be analytical difficulties in moving from philosophy to advocacy and between abstract thought and application to the phenomenal world.

Chapter 4, Research Methodology and Design. Unisa Institutional Repository (UnisaIR), University of South Africa; Labaree, Robert V. and Ross Scimeca. “The Philosophical Problem of Truth in Librarianship.” The Library Quarterly 78 (January 2008): 43-70; Maykut, Pamela S. Beginning Qualitative Research: A Philosophic and Practical Guide. Washington, D.C.: Falmer Press, 1994; Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, CSLI, Stanford University, 2013.

  • The researcher has limitless options when it comes to sample size and the sampling schedule.
  • Due to the repetitive nature of this research design, minor changes and adjustments can be done during the initial parts of the study to correct and hone the research method. Useful design for exploratory studies.
  • There is very little effort on the part of the researcher when performing this technique. It is generally not expensive, time consuming, or workforce intensive.
  • Because the study is conducted serially, the results of one sample are known before the next sample is taken and analyzed.
  • The sampling method is not representative of the entire population. The only possibility of approaching representativeness is when the researcher chooses to use a sample size large enough to represent a significant portion of the entire population. In this case, moving on to study a second or further samples can be difficult.
  • Because the sampling technique is not randomized, the design cannot be used to create conclusions and interpretations that pertain to an entire population. Generalizability from findings is limited.
  • Difficult to account for and interpret variation from one sample to another over time, particularly when using qualitative methods of data collection.

Betensky, Rebecca. Harvard University, Course Lecture Note slides; Creswell, John W., et al. “Advanced Mixed-Methods Research Designs.” In Handbook of Mixed Methods in Social and Behavioral Research. Abbas Tashakkori and Charles Teddlie, eds. (Thousand Oaks, CA: Sage, 2003), pp. 209-240; Ivankova, Nataliya V. “Using Mixed-Methods Sequential Explanatory Design: From Theory to Practice.” Field Methods 18 (February 2006): 3-20; Bovaird, James A. and Kevin A. Kupzyk. “Sequential Design.” In Encyclopedia of Research Design. Neil J. Salkind, ed. Thousand Oaks, CA: Sage, 2010; Sequential Analysis. Wikipedia.


5 Research design

Research design is a comprehensive plan for data collection in an empirical research project. It is a ‘blueprint’ for empirical research aimed at answering specific research questions or testing specific hypotheses, and must specify at least three processes: the data collection process, the instrument development process, and the sampling process. The instrument development and sampling processes are described in the next two chapters, and the data collection process—which is often loosely called ‘research design’—is introduced in this chapter and is described in further detail in Chapters 9–12.

Broadly speaking, data collection methods can be grouped into two categories: positivist and interpretive. Positivist methods, such as laboratory experiments and survey research, are aimed at theory (or hypotheses) testing, while interpretive methods, such as action research and ethnography, are aimed at theory building. Positivist methods employ a deductive approach to research, starting with a theory and testing theoretical postulates using empirical data. In contrast, interpretive methods employ an inductive approach that starts with data and tries to derive a theory about the phenomenon of interest from the observed data. Oftentimes, these methods are incorrectly equated with quantitative and qualitative research. Quantitative and qualitative methods refer to the type of data being collected—quantitative data involve numeric scores, metrics, and so on, while qualitative data include interviews, observations, and so forth—and analysed (i.e., using quantitative techniques such as regression or qualitative techniques such as coding). Positivist research uses predominantly quantitative data, but can also use qualitative data. Interpretive research relies heavily on qualitative data, but can sometimes benefit from including quantitative data as well. Sometimes, joint use of qualitative and quantitative data may help generate unique insight into a complex social phenomenon that is not available from either type of data alone, and hence, mixed-mode designs that combine qualitative and quantitative data are often highly desirable.

Key attributes of a research design

The quality of research designs can be defined in terms of four key design attributes: internal validity, external validity, construct validity, and statistical conclusion validity.

Internal validity, also called causality, examines whether the observed change in a dependent variable is indeed caused by a corresponding change in a hypothesised independent variable, and not by variables extraneous to the research context. Causality requires three conditions: covariation of cause and effect (i.e., if cause happens, then effect also happens; if cause does not happen, effect does not happen), temporal precedence (cause must precede effect in time), and nonspurious correlation (i.e., there is no plausible alternative explanation for the change). Certain research designs, such as laboratory experiments, are strong in internal validity by virtue of their ability to manipulate the independent variable (cause) via a treatment and observe the effect (dependent variable) of that treatment after a certain point in time, while controlling for the effects of extraneous variables. Other designs, such as field surveys, are poor in internal validity because of their inability to manipulate the independent variable (cause), and because cause and effect are measured at the same point in time, which defeats temporal precedence and makes it equally likely that the expected effect might have influenced the expected cause rather than the reverse. Although higher in internal validity compared to other methods, laboratory experiments are by no means immune to threats of internal validity, and are susceptible to history, testing, instrumentation, regression, and other threats that are discussed later in the chapter on experimental designs. Nonetheless, different research designs vary considerably in their respective level of internal validity.

External validity or generalisability refers to whether the observed associations can be generalised from the sample to the population (population validity), or to other people, organisations, contexts, or time (ecological validity). For instance, can results drawn from a sample of financial firms in the United States be generalised to the population of financial firms (population validity) or to other firms within the United States (ecological validity)? Survey research, where data is sourced from a wide variety of individuals, firms, or other units of analysis, tends to have broader generalisability than laboratory experiments where treatments and extraneous variables are more controlled. The variation in internal and external validity for a wide range of research designs is shown in Figure 5.1.

[Figure 5.1: Internal and external validity]

Some researchers claim that there is a trade-off between internal and external validity—higher external validity can come only at the cost of internal validity and vice versa. But this is not always the case. Research designs such as field experiments, longitudinal field surveys, and multiple case studies have higher degrees of both internal and external validities. Personally, I prefer research designs that have reasonable degrees of both internal and external validities, i.e., those that fall within the cone of validity shown in Figure 5.1. But this should not suggest that designs outside this cone are any less useful or valuable. Researchers’ choice of designs is ultimately a matter of their personal preference and competence, and the level of internal and external validity they desire.

Construct validity examines how well a given measurement scale is measuring the theoretical construct that it is expected to measure. Many constructs used in social science research such as empathy, resistance to change, and organisational learning are difficult to define, much less measure. For instance, construct validity must ensure that a measure of empathy is indeed measuring empathy and not compassion, which may be difficult since these constructs are somewhat similar in meaning. Construct validity is assessed in positivist research based on correlational or factor analysis of pilot test data, as described in the next chapter.
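
To make the correlational check concrete, here is a minimal sketch, with entirely hypothetical pilot ratings, of inspecting inter-item correlations for a scale intended to measure a single construct; items that fail to correlate with the rest would be candidates for revision.

```python
# Minimal sketch: inter-item correlations from hypothetical pilot-test data
# for a four-item scale meant to measure one construct (e.g., empathy).
import numpy as np

# rows = pilot respondents, columns = items (hypothetical 5-point ratings)
items = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
])

corr = np.corrcoef(items, rowvar=False)   # 4x4 inter-item correlation matrix
print(np.round(corr, 2))                  # weakly correlated items flag potential validity problems
```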

Statistical conclusion validity examines the extent to which conclusions derived using a statistical procedure are valid. For example, it examines whether the right statistical method was used for hypotheses testing, whether the variables used meet the assumptions of that statistical test (such as sample size or distributional requirements), and so forth. Because interpretive research designs do not employ statistical tests, statistical conclusion validity is not applicable for such analysis. The different kinds of validity and where they exist at the theoretical/empirical levels are illustrated in Figure 5.2.

[Figure 5.2: Different types of validity in scientific research]

Improving internal and external validity

The best research designs are those that can ensure high levels of internal and external validity. Such designs would guard against spurious correlations, inspire greater faith in the hypotheses testing, and ensure that the results drawn from a small sample are generalisable to the population at large. Controls are required to ensure internal validity (causality) of research designs, and can be accomplished in five ways: manipulation, elimination, inclusion, statistical control, and randomisation.

In manipulation, the researcher manipulates the independent variables in one or more levels (called ‘treatments’), and compares the effects of the treatments against a control group where subjects do not receive the treatment. Treatments may include a new drug or different dosage of drug (for treating a medical condition), a teaching style (for students), and so forth. This type of control is achieved in experimental or quasi-experimental designs, but not in non-experimental designs such as surveys. Note that if subjects cannot distinguish adequately between different levels of treatment manipulations, their responses across treatments may not be different, and manipulation would fail.

The elimination technique relies on eliminating extraneous variables by holding them constant across treatments, such as by restricting the study to a single gender or a single socioeconomic status. In the inclusion technique, the role of extraneous variables is considered by including them in the research design and separately estimating their effects on the dependent variable, such as via factorial designs where one factor is gender (male versus female). This technique allows for greater generalisability, but also requires substantially larger samples. In statistical control, extraneous variables are measured and used as covariates during the statistical testing process, as sketched below.
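
A minimal sketch of statistical control, using simulated data: the treatment effect is estimated from a regression that includes the extraneous variable (here, age) as a covariate. The variable names and effect sizes are invented for illustration, not drawn from any study.

```python
# Minimal sketch: statistical control via ordinary least squares, estimating
# a treatment effect while adjusting for a measured covariate (age).
# All data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200
treated = rng.integers(0, 2, n)              # 0 = control, 1 = treatment
age = rng.normal(40, 10, n)                  # extraneous variable used as a covariate
outcome = 2.0 * treated + 0.5 * age + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), treated, age])   # intercept, treatment, covariate
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"Treatment effect adjusted for age: {coef[1]:.2f}")   # close to the true 2.0
```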

Finally, the randomisation technique is aimed at cancelling out the effects of extraneous variables through a process of random sampling, if it can be assured that these effects are of a random (non-systematic) nature. Two types of randomisation are: random selection, where a sample is selected randomly from a population, and random assignment, where subjects selected in a non-random manner are randomly assigned to treatment groups.

Randomisation also ensures external validity, allowing inferences drawn from the sample to be generalised to the population from which the sample is drawn. Note that random assignment is mandatory when random selection is not possible because of resource or access constraints. However, generalisability across populations is harder to ascertain since populations may differ on multiple dimensions and you can only control for a few of those dimensions.
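Both forms of randomisation are easy to express in code. The following sketch (Python with numpy; the sampling frame and group sizes are invented) shows random selection from a population and random assignment of the selected subjects to treatment and control groups.

```python
# The two forms of randomisation described above.
import numpy as np

rng = np.random.default_rng(42)
population = np.arange(10_000)   # stand-in for a sampling frame

# Random selection: draw a simple random sample from the population.
sample = rng.choice(population, size=100, replace=False)

# Random assignment: split the (possibly non-randomly selected) subjects
# into treatment and control groups at random.
shuffled = rng.permutation(sample)
treatment_group, control_group = shuffled[:50], shuffled[50:]
```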

Popular research designs

As noted earlier, research designs can be classified into two categories—positivist and interpretive—depending on the goal of the research. Positivist designs are meant for theory testing, while interpretive designs are meant for theory building. Positivist designs seek generalised patterns based on an objective view of reality, while interpretive designs seek subjective interpretations of social phenomena from the perspectives of the subjects involved. Some popular examples of positivist designs include laboratory experiments, field experiments, field surveys, secondary data analysis, and case research, while examples of interpretive designs include case research, phenomenology, and ethnography. Note that case research can be used for theory building or theory testing, though not at the same time. Not all techniques are suited for all kinds of scientific research. Some techniques such as focus groups are best suited for exploratory research, others such as ethnography are best for descriptive research, and still others such as laboratory experiments are ideal for explanatory research. Following are brief descriptions of some of these designs. Additional details are provided in Chapters 9–12.

Experimental studies are those that are intended to test cause-effect relationships (hypotheses) in a tightly controlled setting by separating the cause from the effect in time, administering the cause to one group of subjects (the ‘treatment group’) but not to another group (‘control group’), and observing how the mean effects vary between subjects in these two groups. For instance, if we design a laboratory experiment to test the efficacy of a new drug in treating a certain ailment, we can get a random sample of people afflicted with that ailment, randomly assign them to one of two groups (treatment and control groups), administer the drug to subjects in the treatment group, but only give a placebo (e.g., a sugar pill with no medicinal value) to subjects in the control group. More complex designs may include multiple treatment groups, such as low versus high dosage of the drug or combining drug administration with dietary interventions. In a true experimental design , subjects must be randomly assigned to each group. If random assignment is not followed, then the design becomes quasi-experimental . Experiments can be conducted in an artificial or laboratory setting such as at a university (laboratory experiments) or in field settings such as in an organisation where the phenomenon of interest is actually occurring (field experiments). Laboratory experiments allow the researcher to isolate the variables of interest and control for extraneous variables, which may not be possible in field experiments. Hence, inferences drawn from laboratory experiments tend to be stronger in internal validity, but those from field experiments tend to be stronger in external validity. Experimental data is analysed using quantitative statistical techniques. The primary strength of the experimental design is its strong internal validity due to its ability to isolate, control, and intensively examine a small number of variables, while its primary weakness is limited external generalisability since real life is often more complex (i.e., involving more extraneous variables) than contrived lab settings. Furthermore, if the research does not identify ex ante relevant extraneous variables and control for such variables, such lack of controls may hurt internal validity and may lead to spurious correlations.
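A minimal sketch of the drug-trial logic just described, assuming Python with scipy and entirely simulated outcomes, shows the core analysis: random assignment followed by a comparison of mean outcomes between the two groups.

```python
# Sketch of a two-group experiment: randomly assign 60 simulated
# volunteers to drug or placebo, then compare mean outcomes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
ids = rng.permutation(60)                  # random assignment of 60 subjects
treatment_ids, control_ids = ids[:30], ids[30:]

# Simulated post-trial symptom scores (lower is better); the drug is
# assumed here to improve scores by about one point on average.
treatment_scores = rng.normal(4.0, 1.5, size=treatment_ids.size)
control_scores = rng.normal(5.0, 1.5, size=control_ids.size)

t_stat, p_value = stats.ttest_ind(treatment_scores, control_scores)
print(f"mean difference = {treatment_scores.mean() - control_scores.mean():.2f}, "
      f"p = {p_value:.4f}")
```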

Field surveys are non-experimental designs that do not control for or manipulate independent variables or treatments, but measure these variables and test their effects using statistical methods. Field surveys capture snapshots of practices, beliefs, or situations from a random sample of subjects in field settings through a survey questionnaire or, less frequently, through a structured interview. In cross-sectional field surveys, independent and dependent variables are measured at the same point in time (e.g., using a single questionnaire), while in longitudinal field surveys, dependent variables are measured at a later point in time than the independent variables. The strengths of field surveys are their external validity (since data is collected in field settings), their ability to capture and control for a large number of variables, and their ability to study a problem from multiple perspectives or using multiple theories. However, because of their non-temporal nature, internal validity (cause-effect relationships) is difficult to infer, and surveys may be subject to respondent biases (e.g., subjects may provide a ‘socially desirable’ response rather than their true response), which further hurts internal validity.

Secondary data analysis is an analysis of data that has previously been collected and tabulated by other sources. Such data may include data from government agencies, such as employment statistics from the U.S. Bureau of Labor Statistics or development statistics by country from the United Nations Development Programme, data collected by other researchers (often used in meta-analytic studies), or publicly available third-party data, such as financial data from stock markets or real-time auction data from eBay. This is in contrast to most other research designs, where collecting primary data for research is part of the researcher’s job. Secondary data analysis may be an effective means of research where primary data collection is too costly or infeasible, and secondary data is available at a level of analysis suitable for answering the researcher’s questions. The limitations of this design are that the data may not have been collected in a systematic or scientific manner, and hence may be unsuitable for scientific research; that, since the data was collected for a presumably different purpose, it may not adequately address the research questions of interest to the researcher; and that internal validity is problematic where the temporal precedence between cause and effect is unclear.

Case research is an in-depth investigation of a problem in one or more real-life settings (case sites) over an extended period of time. Data may be collected using a combination of interviews, personal observations, and internal or external documents. Case studies can be positivist in nature (for hypotheses testing) or interpretive (for theory building). The strength of this research method is its ability to discover a wide variety of social, cultural, and political factors potentially related to the phenomenon of interest that may not be known in advance. Analysis tends to be qualitative in nature, but heavily contextualised and nuanced. However, interpretation of findings may depend on the observational and integrative ability of the researcher, lack of control may make it difficult to establish causality, and findings from a single case site may not be readily generalised to other case sites. Generalisability can be improved by replicating and comparing the analysis in other case sites in a multiple case design .

Focus group research is a type of research that involves bringing in a small group of subjects (typically six to ten people) at one location, and having them discuss a phenomenon of interest for a period of one and a half to two hours. The discussion is moderated and led by a trained facilitator, who sets the agenda and poses an initial set of questions for participants, makes sure that the ideas and experiences of all participants are represented, and attempts to build a holistic understanding of the problem situation based on participants’ comments and experiences. Internal validity cannot be established due to lack of controls and the findings may not be generalised to other settings because of the small sample size. Hence, focus groups are not generally used for explanatory or descriptive research, but are more suited for exploratory research.

Action research assumes that complex social phenomena are best understood by introducing interventions or ‘actions’ into those phenomena and observing the effects of those actions. In this method, the researcher is embedded within a social context such as an organisation and initiates an action—such as new organisational procedures or new technologies—in response to a real problem such as declining profitability or operational bottlenecks. The researcher’s choice of actions must be based on theory, which should explain why and how such actions may cause the desired change. The researcher then observes the results of that action, modifying it as necessary, while simultaneously learning from the action and generating theoretical insights about the target problem and interventions. The initial theory is validated by the extent to which the chosen action successfully solves the target problem. Simultaneous problem solving and insight generation is the central feature that distinguishes action research from all other research methods, and hence, action research is an excellent method for bridging research and practice. This method is also suited for studying unique social problems that cannot be replicated outside that context, but it is also subject to researcher bias and subjectivity, and the generalisability of findings is often restricted to the context where the study was conducted.

Ethnography is an interpretive research design inspired by anthropology that emphasises that a research phenomenon must be studied within the context of its culture. The researcher is deeply immersed in a certain culture over an extended period of time (eight months to two years) and, during that period, engages, observes, and records the daily life of the studied culture, and theorises about the evolution and behaviours in that culture. Data is collected primarily via observational techniques, formal and informal interaction with participants in that culture, and personal field notes, while data analysis involves ‘sense-making’. The researcher must narrate her experience in great detail so that readers may experience that same culture without necessarily being there. The advantages of this approach are its sensitivity to context, the rich and nuanced understanding it generates, and minimal respondent bias. However, it is also an extremely time- and resource-intensive approach, and findings are specific to a given culture and less generalisable to other cultures.

Selecting research designs

Given the above multitude of research designs, which design should researchers choose for their research? Generally speaking, researchers tend to select those research designs that they are most comfortable with and feel most competent to handle, but ideally, the choice should depend on the nature of the research phenomenon being studied. In the preliminary phases of research, when the research problem is unclear and the researcher wants to scope out the nature and extent of a certain research problem, a focus group (for an individual unit of analysis) or a case study (for an organisational unit of analysis) is an ideal strategy for exploratory research. As one delves further into the research domain, but finds that there are no good theories to explain the phenomenon of interest and wants to build a theory to fill the gap in that area, interpretive designs such as case research or ethnography may be useful. If competing theories exist and the researcher wishes to test these different theories or integrate them into a larger theory, positivist designs such as experimental design, survey research, or secondary data analysis are more appropriate.

Regardless of the specific research design chosen, the researcher should strive to collect quantitative and qualitative data using a combination of techniques such as questionnaires, interviews, observations, documents, or secondary data. For instance, even in a highly structured survey questionnaire intended to collect quantitative data, the researcher may leave some room for a few open-ended questions to collect qualitative data that may generate unexpected insights not otherwise available from structured quantitative data alone. Likewise, while case research employs mostly face-to-face interviews to collect qualitative data, the potential and value of collecting quantitative data should not be ignored. As an example, in a study of organisational decision-making processes, the case interviewer can record numeric quantities such as how many months it took to make certain organisational decisions, how many people were involved in that decision process, and how many decision alternatives were considered, which can provide valuable insights not otherwise available from interviewees’ narrative responses. Irrespective of the specific research design employed, the goal of the researcher should be to collect as much and as diverse data as possible that can help generate the best possible insights about the phenomenon of interest.

Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Enago Academy

Experimental Research Design — 6 mistakes you should never make!


From their school days onward, students perform scientific experiments whose results illustrate and test the laws and theorems of science. These experiments rest on the strong foundation of experimental research design.

An experimental research design helps researchers execute their research objectives with more clarity and transparency.

In this article, we will not only discuss the key aspects of experimental research designs but also the issues to avoid and problems to resolve while designing your research study.


What Is Experimental Research Design?

Experimental research design is a framework of protocols and procedures created to conduct experimental research with a scientific approach using two sets of variables. Herein, the first set of variables acts as a constant, used to measure the differences in the second set. Quantitative research methods are the most familiar example of experimental research.

Experimental research helps a researcher gather the necessary data for making better research decisions and determining the facts of a research study.

When Can a Researcher Conduct Experimental Research?

A researcher can conduct experimental research in the following situations —

  • When time is an important factor in establishing a relationship between the cause and effect.
  • When there is an invariable or never-changing behavior between the cause and effect.
  • Finally, when the researcher wishes to understand the importance of the cause and effect.

Importance of Experimental Research Design

To publish significant results, choosing a quality research design forms the foundation of the research study. Moreover, an effective research design helps establish quality decision-making procedures, structures the research to make data analysis easier, and addresses the main research question. It is therefore essential to devote undivided attention and time to creating an experimental research design before beginning the practical experiment.

By creating a research design, a researcher is also giving oneself time to organize the research, set up relevant boundaries for the study, and increase the reliability of the results. Through all these efforts, one could also avoid inconclusive results. If any part of the research design is flawed, it will reflect on the quality of the results derived.

Types of Experimental Research Designs

Based on the methods used to collect data in experimental studies, the experimental research designs are of three primary types:

1. Pre-experimental Research Design

A pre-experimental research design is used when a group, or several groups, are observed after the factors of cause and effect have been applied. The pre-experimental design helps researchers understand whether further investigation of the groups under observation is necessary.

Pre-experimental research is of three types —

  • One-shot Case Study Research Design
  • One-group Pretest-posttest Research Design
  • Static-group Comparison

2. True Experimental Research Design

A true experimental research design relies on statistical analysis to prove or disprove a researcher’s hypothesis. It is one of the most accurate forms of research because it provides specific scientific evidence. Furthermore, of all the types of experimental designs, only a true experimental design can establish a cause-effect relationship within a group. In a true experiment, however, three conditions must be satisfied —

  • There is a control group that is not subjected to changes and an experimental group that will experience the changed variables
  • A variable that can be manipulated by the researcher
  • Random distribution of subjects across the groups

This type of experimental research is commonly observed in the physical sciences.

3. Quasi-experimental Research Design

The word “quasi” means “resembling”. A quasi-experimental design is similar to a true experimental design. However, the difference between the two is the assignment of the control group. In this research design, an independent variable is manipulated, but the participants of a group are not randomly assigned. This type of research design is used in field settings where random assignment is either irrelevant or not required.

The classification of the research subjects, conditions, or groups determines the type of research design to be used.


Advantages of Experimental Research

Experimental research allows you to test your idea in a controlled environment before taking the research to clinical trials. Moreover, it provides the best method to test your theory because of the following advantages:

  • Researchers have firm control over variables to obtain results.
  • The subject area does not limit the effectiveness of experimental research; it can be applied in any field.
  • The results are specific.
  • After the results are analyzed, research findings from the same dataset can be repurposed for similar research ideas.
  • Researchers can identify the cause and effect of the hypothesis and further analyze this relationship to determine in-depth ideas.
  • Experimental research makes an ideal starting point. The collected data could be used as a foundation to build new research ideas for further studies.

6 Mistakes to Avoid While Designing Your Research

There is no order to this list, and any one of these issues can seriously compromise the quality of your research. You could refer to the list as a checklist of what to avoid while designing your research.

1. Invalid Theoretical Framework

Researchers often fail to check whether their hypothesis is logically testable. If your research design does not rest on basic assumptions or postulates, it is fundamentally flawed and you need to rework your research framework.

2. Inadequate Literature Study

Without a comprehensive research literature review, it is difficult to identify and fill the knowledge and information gaps. Furthermore, you need to state clearly how your research will contribute to the field, either by adding value to the pertinent literature or by challenging previous findings and assumptions.

3. Insufficient or Incorrect Statistical Analysis

Statistical results are among the most trusted forms of scientific evidence. The ultimate goal of a research experiment is to obtain valid and sustainable evidence. Therefore, incorrect statistical analysis can undermine the quality of any quantitative research.

4. Undefined Research Problem

This is one of the most basic aspects of research design. The research problem statement must be clear; to achieve that, you must set a framework for developing research questions that address the core problem.

5. Research Limitations

Every study has limitations of some kind. You should anticipate them and incorporate them into your conclusion, as well as into the basic research design. Include a statement in your manuscript about any perceived limitations and how you accounted for them while designing your experiment and drawing your conclusions.

6. Ethical Implications

Ethics is the most important yet least discussed topic. Your research design must include ways to minimize any risk to your participants while still addressing the research problem or question at hand. If you cannot uphold ethical norms alongside your research study, your research objectives and validity can be questioned.

Experimental Research Design Example

In an experimental design, a researcher gathers plant samples and then randomly assigns half the samples to photosynthesize in sunlight and the other half to be kept in a dark box without sunlight, while controlling all the other variables (nutrients, water, soil, etc.).

By comparing their outcomes in biochemical tests, the researcher can confirm that the changes in the plants were due to the sunlight and not the other variables.

Experimental research is often the final form of a study in the research process and is considered to provide conclusive, specific results. But it is not suited to every research question: it demands considerable resources, time, and money, and it is not easy to conduct unless a foundation of prior research has been built. Even so, it is widely used in research institutes and commercial industries because it yields the most conclusive results within the scientific approach.

Have you worked on research designs? How was your experience creating an experimental design? What difficulties did you face? Do write to us or comment below and share your insights on experimental research designs!

Frequently Asked Questions

Why is randomization important in experimental research?
Randomization is important in experimental research because it ensures unbiased results. It also helps measure the cause-effect relationship in the particular group of interest.

What is the importance of experimental research design?
An experimental research design lays the foundation of a study and structures the research so as to establish a quality decision-making process.

How many types of experimental research designs are there?
There are three types of experimental research designs: pre-experimental, true experimental, and quasi-experimental.

How does a quasi-experimental design differ from a true experimental design?
1. The control group in quasi-experimental research is assigned non-randomly, unlike in a true experimental design, where assignment is random. 2. A true experiment always has a control group; a quasi-experiment may not.

How does experimental research differ from descriptive research?
Experimental research establishes a cause-effect relationship by testing a theory or hypothesis using experimental and control groups. In contrast, descriptive research describes a study or a topic by defining its variables and answering the questions related to them.



QuestionPro

Experimental Research: What it is + Types of designs


Any research conducted under scientifically acceptable conditions uses experimental methods. The success of experimental studies hinges on researchers confirming that the change in a variable results solely from the manipulation of the independent variable. The research should establish a notable cause and effect.

What is Experimental Research?

Experimental research is a study conducted with a scientific approach using two sets of variables. The first set acts as a constant, which you use to measure the differences in the second set. Quantitative research methods, for example, are experimental.

If you don’t have enough data to support your decisions, you must first determine the facts. This research gathers the data necessary to help you make better decisions.

You can conduct experimental research in the following situations:

  • Time is a vital factor in establishing a relationship between cause and effect.
  • Invariable behavior between cause and effect.
  • You wish to understand the importance of cause and effect.

Experimental Research Design Types

The classic experimental design definition is: “The methods used to collect data in experimental studies.”

There are three primary types of experimental design:

  • Pre-experimental research design
  • True experimental research design
  • Quasi-experimental research design

The way you classify research subjects based on conditions or groups determines the type of research design you should use.

1. Pre-Experimental Design

A group, or various groups, are kept under observation after implementing cause and effect factors. You’ll conduct this research to understand whether further investigation is necessary for these particular groups.

You can break down pre-experimental research further into three types:

  • One-shot Case Study Research Design
  • One-group Pretest-posttest Research Design
  • Static-group Comparison

2. True Experimental Design

It relies on statistical analysis to prove or disprove a hypothesis, making it the most accurate form of research. Of the types of experimental design, only true design can establish a cause-effect relationship within a group. In a true experiment, three factors need to be satisfied:

  • There is a Control Group, which won’t be subject to changes, and an Experimental Group, which will experience the changed variables.
  • A variable that can be manipulated by the researcher
  • Random distribution

This experimental research method commonly occurs in the physical sciences.

3. Quasi-Experimental Design

The word “Quasi” indicates similarity. A quasi-experimental design is similar to an experimental one, but it is not the same. The difference between the two is the assignment of a control group. In this research, an independent variable is manipulated, but the participants of a group are not randomly assigned. Quasi-research is used in field settings where random assignment is either irrelevant or not required.

Importance of Experimental Design

Experimental research is a powerful tool for understanding cause-and-effect relationships. It allows us to manipulate variables and observe the effects, which is crucial for understanding how different factors influence the outcome of a study.

But the importance of experimental research goes beyond that. It’s a critical method for many scientific and academic studies. It allows us to test theories, develop new products, and make groundbreaking discoveries.

For example, this research is essential for developing new drugs and medical treatments. Researchers can understand how a new drug works by manipulating dosage and administration variables and identifying potential side effects.

Similarly, experimental research is used in the field of psychology to test theories and understand human behavior. By manipulating variables such as stimuli, researchers can gain insights into how the brain works and identify new treatment options for mental health disorders.

It is also widely used in the field of education. It allows educators to test new teaching methods and identify what works best. By manipulating variables such as class size, teaching style, and curriculum, researchers can understand how students learn and identify new ways to improve educational outcomes.

In addition, experimental research is a powerful tool for businesses and organizations. By manipulating variables such as marketing strategies, product design, and customer service, companies can understand what works best and identify new opportunities for growth.

Advantages of Experimental Research

To see the value of this research, think of everyday human life. Babies do their own rudimentary experiments (such as putting objects in their mouths) to learn about the world around them, while older children and teens do experiments at school to learn more about science.

Early scientists used this kind of research to prove that their hypotheses were correct. For example, Galileo Galilei and Antoine Lavoisier conducted various experiments to discover key concepts in physics and chemistry. The same is true of modern experts, who use this scientific method to see if new drugs are effective, discover treatments for diseases, and create new electronic devices (among others).

It’s vital to test new ideas or theories. Why put time, effort, and funding into something that may not work?

This research allows you to test your idea in a controlled environment before marketing. It also provides the best method to test your theory thanks to the following advantages:


  • Researchers have a stronger hold over variables to obtain desired results.
  • The subject or industry does not impact the effectiveness of experimental research. Any industry can implement it for research purposes.
  • The results are specific.
  • After analyzing the results, you can apply your findings to similar ideas or situations.
  • You can identify the cause and effect of a hypothesis. Researchers can further analyze this relationship to determine more in-depth ideas.
  • Experimental research makes an ideal starting point. The data you collect is a foundation for building more ideas and conducting more action research.

Whether you want to know how the public will react to a new product or if a certain food increases the chance of disease, experimental research is the best place to start. Begin your research by finding subjects using  QuestionPro Audience  and other tools today.




Case Study | Definition, Examples & Methods

Published on 5 May 2022 by Shona McCombes . Revised on 30 January 2023.

A case study is a detailed study of a specific subject, such as a person, group, place, event, organisation, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research.

A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used. Case studies are good for describing, comparing, evaluating, and understanding different aspects of a research problem.

Table of contents

  • When to do a case study
  • Step 1: Select a case
  • Step 2: Build a theoretical framework
  • Step 3: Collect your data
  • Step 4: Describe and analyse the case

When to do a case study

A case study is an appropriate research design when you want to gain concrete, contextual, in-depth knowledge about a specific real-world subject. It allows you to explore the key characteristics, meanings, and implications of the case.

Case studies are often a good choice in a thesis or dissertation. They keep your project focused and manageable when you don’t have the time or resources to do large-scale research.

You might use just one complex case study where you explore a single subject in depth, or conduct multiple case studies to compare and illuminate different aspects of your research problem.


Step 1: Select a case

Once you have developed your problem statement and research questions, you should be ready to choose the specific case that you want to focus on. A good case study should have the potential to:

  • Provide new or unexpected insights into the subject
  • Challenge or complicate existing assumptions and theories
  • Propose practical courses of action to resolve a problem
  • Open up new directions for future research

Unlike quantitative or experimental research, a strong case study does not require a random or representative sample. In fact, case studies often deliberately focus on unusual, neglected, or outlying cases which may shed new light on the research problem.

If you find yourself aiming to simultaneously investigate and solve an issue, consider conducting action research. As its name suggests, action research conducts research and takes action at the same time, and it is highly iterative and flexible.

However, you can also choose a more common or representative case to exemplify a particular category, experience, or phenomenon.

Step 2: Build a theoretical framework

While case studies focus more on concrete details than general theories, they should usually have some connection with theory in the field. This way, the case study is not just an isolated description but is integrated into existing knowledge about the topic. It might aim to:

  • Exemplify a theory by showing how it explains the case under investigation
  • Expand on a theory by uncovering new concepts and ideas that need to be incorporated
  • Challenge a theory by exploring an outlier case that doesn’t fit with established assumptions

To ensure that your analysis of the case has a solid academic grounding, you should conduct a literature review of sources related to the topic and develop a theoretical framework. This means identifying key concepts and theories to guide your analysis and interpretation.

Step 3: Collect your data

There are many different research methods you can use to collect data on your subject. Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data.

The aim is to gain as thorough an understanding as possible of the case and its context.

Step 4: Describe and analyse the case

In writing up the case study, you need to bring together all the relevant aspects to give as complete a picture as possible of the subject.

How you report your findings depends on the type of research you are doing. Some case studies are structured like a standard scientific paper or thesis, with separate sections or chapters for the methods, results, and discussion.

Others are written in a more narrative style, aiming to explore the case from various angles and analyse its meanings and implications (for example, by using textual analysis or discourse analysis).

In all cases, though, make sure to give contextual details about the case, connect it back to the literature and theory, and discuss how it fits into wider patterns or debates.

Cite this Scribbr article


McCombes, S. (2023, January 30). Case Study | Definition, Examples & Methods. Scribbr. Retrieved 21 May 2024, from https://www.scribbr.co.uk/research-methods/case-studies/


Perspect Clin Res. 2018 Oct-Dec; 9(4).

Study designs: Part 1 – An overview and classification

Priya Ranganathan

Department of Anaesthesiology, Tata Memorial Centre, Mumbai, Maharashtra, India

Rakesh Aggarwal

Department of Gastroenterology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

There are several types of research study designs, each with its inherent strengths and flaws. The study design used to answer a particular research question depends on the nature of the question and the availability of resources. In this article, which is the first part of a series on “study designs,” we provide an overview of research study designs and their classification. The subsequent articles will focus on individual designs.

INTRODUCTION

Research study design is a framework, or the set of methods and procedures used to collect and analyze data on variables specified in a particular research problem.

Research study designs are of many types, each with its advantages and limitations. The type of study design used to answer a particular research question is determined by the nature of question, the goal of research, and the availability of resources. Since the design of a study can affect the validity of its results, it is important to understand the different types of study designs and their strengths and limitations.

Some terms that are used frequently in classifying study designs are described in the following sections.

Variables

A variable represents a measurable attribute that varies across study units, for example, across individual participants in a study, or at times even within the same person measured over time. Some examples of variables include age, sex, weight, height, health status, alive/dead, diseased/healthy, annual income, smoking yes/no, and treated/untreated.

Exposure (or intervention) and outcome variables

A large proportion of research studies assess the relationship between two variables. Here, the question is whether one variable is associated with or responsible for change in the value of the other variable. Exposure (or intervention) refers to the risk factor whose effect is being studied. It is also referred to as the independent or the predictor variable. The outcome (or predicted or dependent) variable develops as a consequence of the exposure (or intervention). Typically, the term “exposure” is used when the “causative” variable is naturally determined (as in observational studies – examples include age, sex, smoking, and educational status), and the term “intervention” is preferred where the researcher assigns some or all participants to receive a particular treatment for the purpose of the study (experimental studies – e.g., administration of a drug). If a drug had been started in some individuals but not in the others, before the study started, this counts as exposure, and not as intervention – since the drug was not started specifically for the study.

Observational versus interventional (or experimental) studies

Observational studies are those where the researcher is documenting a naturally occurring relationship between the exposure and the outcome that he/she is studying. The researcher does not do any active intervention in any individual, and the exposure has already been decided naturally or by some other factor. For example, looking at the incidence of lung cancer in smokers versus nonsmokers, or comparing the antenatal dietary habits of mothers with normal and low-birth babies. In these studies, the investigator did not play any role in determining the smoking or dietary habit in individuals.

For an exposure to determine the outcome, it must precede the latter. Any variable that occurs simultaneously with or following the outcome cannot be causative, and hence is not considered as an “exposure.”

Observational studies can be either descriptive (nonanalytical) or analytical (inferential) – this is discussed later in this article.

Interventional studies are experiments where the researcher actively performs an intervention in some or all members of a group of participants. This intervention could take many forms – for example, administration of a drug or vaccine, performance of a diagnostic or therapeutic procedure, and introduction of an educational tool. For example, a study could randomly assign persons to receive aspirin or placebo for a specific duration and assess the effect on the risk of developing cerebrovascular events.

Descriptive versus analytical studies

Descriptive (or nonanalytical) studies, as the name suggests, merely try to describe the data on one or more characteristics of a group of individuals. These do not try to answer questions or establish relationships between variables. Examples of descriptive studies include case reports, case series, and cross-sectional surveys (please note that cross-sectional surveys may be analytical studies as well – this will be discussed in the next article in this series). For instance, a descriptive study might survey the dietary habits of pregnant women or present a case series of patients with an unusual reaction to a drug.

Analytical studies attempt to test a hypothesis and establish causal relationships between variables. In these studies, the researcher assesses the effect of an exposure (or intervention) on an outcome. As described earlier, analytical studies can be observational (if the exposure is naturally determined) or interventional (if the researcher actively administers the intervention).

Directionality of study designs

Based on the direction of inquiry, study designs may be classified as forward-direction or backward-direction. In forward-direction studies, the researcher starts with determining the exposure to a risk factor and then assesses whether the outcome occurs at a future time point. This design is known as a cohort study. For example, a researcher can follow a group of smokers and a group of nonsmokers to determine the incidence of lung cancer in each. In backward-direction studies, the researcher begins by determining whether the outcome is present (cases vs. noncases [also called controls]) and then traces the presence of prior exposure to a risk factor. These are known as case–control studies. For example, a researcher identifies a group of normal-weight babies and a group of low-birth weight babies and then asks the mothers about their dietary habits during the index pregnancy.

Prospective versus retrospective study designs

The terms “prospective” and “retrospective” refer to the timing of the research in relation to the development of the outcome. In retrospective studies, the outcome of interest has already occurred (or not occurred – e.g., in controls) in each individual by the time s/he is enrolled, and the data are collected either from records or by asking participants to recall exposures. There is no follow-up of participants. By contrast, in prospective studies, the outcome (and sometimes even the exposure or intervention) has not occurred when the study starts and participants are followed up over a period of time to determine the occurrence of outcomes. Typically, most cohort studies are prospective studies (though there may be retrospective cohorts), whereas case–control studies are retrospective studies. An interventional study has to be, by definition, a prospective study since the investigator determines the exposure for each study participant and then follows them to observe outcomes.

The terms “prospective” versus “retrospective” studies can be confusing. Let us think of an investigator who starts a case–control study. To him/her, the process of enrolling cases and controls over a period of several months appears prospective. Hence, the use of these terms is best avoided. Or, at the very least, one must be clear that the terms relate to work flow for each individual study participant, and not to the study as a whole.

Classification of study designs

Figure 1 depicts a simple classification of research study designs. The Centre for Evidence-based Medicine has put forward a useful three-point algorithm which can help determine the design of a research study from its methods section [1]:

[Figure 1: Classification of research study designs]

  • Does the study describe the characteristics of a sample or does it attempt to analyze (or draw inferences about) the relationship between two variables? – If no, then it is a descriptive study, and if yes, it is an analytical (inferential) study
  • If analytical, did the investigator determine the exposure? – If no, it is an observational study, and if yes, it is an experimental study
  • If observational, when was the outcome determined? – at the start of the study (case–control study), at the end of a period of follow-up (cohort study), or at the same time as the exposure (cross-sectional study).
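For illustration only, this three-point algorithm can be written as a small decision function. The sketch below is ours, in Python; the argument names and labels are not from the source.

```python
# A sketch of the three-point algorithm as a decision function.
# Argument and label names are illustrative, not from the source.
def classify_study_design(describes_only: bool,
                          investigator_determined_exposure: bool,
                          outcome_determined_at: str) -> str:
    """outcome_determined_at: 'start', 'follow_up', or 'simultaneous'."""
    if describes_only:
        return "descriptive study"
    if investigator_determined_exposure:
        return "experimental study"
    return {
        "start": "case-control study",
        "follow_up": "cohort study",
        "simultaneous": "cross-sectional study",
    }[outcome_determined_at]

print(classify_study_design(False, False, "follow_up"))  # -> cohort study
```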

In the next few pieces in the series, we will discuss various study designs in greater detail.

Financial support and sponsorship

Conflicts of interest

There are no conflicts of interest.

Research Method

Quasi-Experimental Research Design – Types, Methods

Quasi-Experimental Design

Quasi-experimental design is a research method that seeks to evaluate the causal relationships between variables, but without the full control over the independent variable(s) that is available in a true experimental design.

In a quasi-experimental design, the researcher uses an existing group of participants that is not randomly assigned to the experimental and control groups. Instead, the groups are selected based on pre-existing characteristics or conditions, such as age, gender, or the presence of a certain medical condition.

Types of Quasi-Experimental Design

There are several types of quasi-experimental designs that researchers use to study causal relationships between variables. Here are some of the most common types:

Non-Equivalent Control Group Design

This design involves selecting two groups of participants that are similar in every way except for the independent variable(s) that the researcher is testing. One group receives the treatment or intervention being studied, while the other group does not. The two groups are then compared to see if there are any significant differences in the outcomes.

Interrupted Time-Series Design

This design involves collecting data on the dependent variable(s) over a period of time, both before and after an intervention or event. The researcher can then determine whether there was a significant change in the dependent variable(s) following the intervention or event.

Pretest-Posttest Design

This design involves measuring the dependent variable(s) before and after an intervention or event, but without a control group. This design can be useful for determining whether the intervention or event had an effect, but it does not allow for control over other factors that may have influenced the outcomes.

Regression Discontinuity Design

This design involves selecting participants based on a specific cutoff point on a continuous variable, such as a test score. Participants on either side of the cutoff point are then compared to determine whether the intervention or event had an effect.

Natural Experiments

This design involves studying the effects of an intervention or event that occurs naturally, without the researcher’s intervention. For example, a researcher might study the effects of a new law or policy that affects certain groups of people. This design is useful when true experiments are not feasible or ethical.

Data Analysis Methods

Here are some data analysis methods that are commonly used in quasi-experimental designs:

Descriptive Statistics

This method involves summarizing the data collected during a study using measures such as mean, median, mode, range, and standard deviation. Descriptive statistics can help researchers identify trends or patterns in the data, and can also be useful for identifying outliers or anomalies.
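As an illustrative sketch (our choice of pandas; the scores are invented), all of these summary measures can be computed in a few lines:

```python
# Descriptive statistics for a small, invented set of test scores.
import pandas as pd

scores = pd.Series([72, 85, 90, 66, 85, 78, 95, 60, 85, 70])

print(scores.describe())                 # count, mean, std, min, quartiles, max
print("median:", scores.median())
print("mode:", scores.mode().tolist())   # may contain several values
print("range:", scores.max() - scores.min())
```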

Inferential Statistics

This method involves using statistical tests to determine whether the results of a study are statistically significant. Inferential statistics can help researchers make generalizations about a population based on the sample data collected during the study. Common statistical tests used in quasi-experimental designs include t-tests, ANOVA, and regression analysis.
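For instance, a one-way ANOVA comparing three groups might look like the following sketch, using scipy on simulated data (the groups and numbers are illustrative, not from any real study):

```python
# One-way ANOVA across three non-equivalent groups (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group1 = rng.normal(70, 8, 30)   # e.g., traditional teaching method
group2 = rng.normal(75, 8, 30)   # e.g., new method, site A
group3 = rng.normal(74, 8, 30)   # e.g., new method, site B

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```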

Propensity Score Matching

This method is used to reduce bias in quasi-experimental designs by matching participants in the intervention group with participants in the control group who have similar characteristics. This can help to reduce the impact of confounding variables that may affect the study’s results.
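A bare-bones sketch of the idea, using scikit-learn for the propensity model and numpy for nearest-neighbour matching (one common approach among several; the covariates, assignment rule, and names are all invented):

```python
# Propensity score matching: model treatment status from covariates,
# then match each treated unit to the nearest-scoring control.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 500
X = rng.normal(size=(n, 3))                      # observed covariates
treated = (X[:, 0] + rng.normal(size=n)) > 0.5   # non-random assignment

# Propensity score: estimated probability of being treated given X.
scores = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

treated_idx = np.where(treated)[0]
control_idx = np.where(~treated)[0]

# Nearest-neighbour match on the propensity score (with replacement).
diffs = np.abs(scores[treated_idx][:, None] - scores[control_idx][None, :])
matched_controls = control_idx[diffs.argmin(axis=1)]
```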

Difference-in-differences Analysis

This method is used to compare the difference in outcomes between two groups over time. Researchers can use this method to determine whether a particular intervention has had an impact on the target population over time.
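Difference-in-differences is commonly estimated as a regression with a treated-by-post interaction term. The sketch below uses statsmodels on simulated data; the coefficient on the interaction recovers the (assumed) intervention effect.

```python
# Difference-in-differences via OLS with a treated x post interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 400
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),   # 1 = group affected by the policy
    "post": rng.integers(0, 2, n),      # 1 = observed after the change
})
effect = 1.5   # true effect built into the simulation
df["outcome"] = (10 + 2 * df["treated"] + 1 * df["post"]
                 + effect * df["treated"] * df["post"] + rng.normal(0, 1, n))

# The coefficient on treated:post estimates the intervention effect.
model = smf.ols("outcome ~ treated * post", data=df).fit()
print(model.params["treated:post"])
```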

Interrupted Time Series Analysis

This method is used to examine the impact of an intervention or treatment over time by comparing data collected before and after the intervention or treatment. This method can help researchers determine whether an intervention had a significant impact on the target population.
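A common implementation is segmented regression. In the sketch below (statsmodels, simulated monthly data with an intervention at month 24), the 'post' term captures the immediate level change and 'time_since' captures any change in slope after the intervention.

```python
# Segmented regression for an interrupted time series (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(13)
months = np.arange(48)
post = (months >= 24).astype(int)
time_since = np.where(post, months - 24, 0)

df = pd.DataFrame({"time": months, "post": post, "time_since": time_since})
# Baseline trend, a level drop of 3 at the intervention, plus noise.
df["outcome"] = 50 + 0.2 * df["time"] - 3 * df["post"] + rng.normal(0, 1, 48)

model = smf.ols("outcome ~ time + post + time_since", data=df).fit()
print(model.params[["post", "time_since"]])
```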

Regression Discontinuity Analysis

This method is used to compare the outcomes of participants who fall on either side of a predetermined cutoff point. This method can help researchers determine whether an intervention had a significant impact on the target population.
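A minimal sharp-discontinuity sketch: centre the running variable at the cutoff, restrict the analysis to a bandwidth around it, and read the estimated effect off the treatment indicator. The library choice (statsmodels), bandwidth, and data are all illustrative assumptions.

```python
# Sharp regression discontinuity on simulated data: units scoring below
# the cutoff receive the intervention (e.g., a remedial program).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(17)
n = 1000
score = rng.uniform(0, 100, n)            # running variable
cutoff = 50
treated = (score < cutoff).astype(int)
outcome = 20 + 0.3 * score + 4 * treated + rng.normal(0, 2, n)

df = pd.DataFrame({"centered": score - cutoff, "treated": treated,
                   "outcome": outcome})

# Restrict to a bandwidth around the cutoff and allow separate slopes.
local = df[df["centered"].abs() < 10]
model = smf.ols("outcome ~ treated * centered", data=local).fit()
print(model.params["treated"])   # estimated effect at the cutoff
```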

Steps in Quasi-Experimental Design

Here are the general steps involved in conducting a quasi-experimental design:

  • Identify the research question: Determine the research question and the variables that will be investigated.
  • Choose the design: Choose the appropriate quasi-experimental design to address the research question. Examples include the pretest-posttest design, non-equivalent control group design, regression discontinuity design, and interrupted time series design.
  • Select the participants: Select the participants who will be included in the study. Participants should be selected based on specific criteria relevant to the research question.
  • Measure the variables: Measure the variables that are relevant to the research question. This may involve using surveys, questionnaires, tests, or other measures.
  • Implement the intervention or treatment: Implement the intervention or treatment to the participants in the intervention group. This may involve training, education, counseling, or other interventions.
  • Collect data: Collect data on the dependent variable(s) before and after the intervention. Data collection may also include collecting data on other variables that may impact the dependent variable(s).
  • Analyze the data: Analyze the data collected to determine whether the intervention had a significant impact on the dependent variable(s).
  • Draw conclusions: Draw conclusions about the relationship between the independent and dependent variables. If the results suggest a causal relationship, then appropriate recommendations may be made based on the findings.

Quasi-Experimental Design Examples

Here are some examples of real-time quasi-experimental designs:

  • Evaluating the impact of a new teaching method: In this study, a group of students are taught using a new teaching method, while another group is taught using the traditional method. The test scores of both groups are compared before and after the intervention to determine whether the new teaching method had a significant impact on student performance.
  • Assessing the effectiveness of a public health campaign: In this study, a public health campaign is launched to promote healthy eating habits among a targeted population. The behavior of the population is compared before and after the campaign to determine whether the intervention had a significant impact on the target behavior.
  • Examining the impact of a new medication: In this study, a group of patients is given a new medication, while another group is given a placebo. The outcomes of both groups are compared to determine whether the new medication had a significant impact on the targeted health condition.
  • Evaluating the effectiveness of a job training program: In this study, a group of unemployed individuals is enrolled in a job training program, while another group is not enrolled in any program. The employment rates of both groups are compared before and after the intervention to determine whether the training program had a significant impact on the employment rates of the participants.
  • Assessing the impact of a new policy: In this study, a new policy is implemented in a particular area, while another area does not have the new policy. The outcomes of both areas are compared before and after the intervention to determine whether the new policy had a significant impact on the targeted behavior or outcome.

Applications of Quasi-Experimental Design

Here are some applications of quasi-experimental design:

  • Educational research: Quasi-experimental designs are used to evaluate the effectiveness of educational interventions, such as new teaching methods, technology-based learning, or educational policies.
  • Health research: Quasi-experimental designs are used to evaluate the effectiveness of health interventions, such as new medications, public health campaigns, or health policies.
  • Social science research: Quasi-experimental designs are used to investigate the impact of social interventions, such as job training programs, welfare policies, or criminal justice programs.
  • Business research: Quasi-experimental designs are used to evaluate the impact of business interventions, such as marketing campaigns, new products, or pricing strategies.
  • Environmental research: Quasi-experimental designs are used to evaluate the impact of environmental interventions, such as conservation programs, pollution control policies, or renewable energy initiatives.

When to use Quasi-Experimental Design

Here are some situations where quasi-experimental designs may be appropriate:

  • When the research question involves investigating the effectiveness of an intervention, policy, or program: In situations where it is not feasible or ethical to randomly assign participants to intervention and control groups, quasi-experimental designs can be used to evaluate the impact of the intervention on the targeted outcome.
  • When the sample size is small: In situations where the sample size is small, it may be difficult to randomly assign participants to intervention and control groups. Quasi-experimental designs can be used to investigate the impact of an intervention without requiring a large sample size.
  • When the research question involves investigating a naturally occurring event: In some situations, researchers may be interested in investigating the impact of a naturally occurring event, such as a natural disaster or a major policy change. Quasi-experimental designs can be used to evaluate the impact of the event on the targeted outcome.
  • When the research question involves investigating a long-term intervention: In situations where the intervention or program is long-term, it may be difficult to randomly assign participants to intervention and control groups for the entire duration of the intervention. Quasi-experimental designs can be used to evaluate the impact of the intervention over time.
  • When the research question involves investigating the impact of a variable that cannot be manipulated: In some situations, it may not be possible or ethical to manipulate a variable of interest. Quasi-experimental designs can be used to investigate the relationship between the variable and the targeted outcome.

Purpose of Quasi-Experimental Design

The purpose of quasi-experimental design is to investigate the causal relationship between two or more variables when it is not feasible or ethical to conduct a randomized controlled trial (RCT). Quasi-experimental designs attempt to emulate the randomized controlled trial by approximating the comparison between an intervention group and a control group as closely as possible.

The key purpose of quasi-experimental design is to evaluate the impact of an intervention, policy, or program on a targeted outcome while controlling for potential confounding factors that may affect the outcome. Quasi-experimental designs aim to answer questions such as: Did the intervention cause the change in the outcome? Would the outcome have changed without the intervention? And was the intervention effective in achieving its intended goals?

Quasi-experimental designs are useful in situations where randomized controlled trials are not feasible or ethical. They provide researchers with an alternative method to evaluate the effectiveness of interventions, policies, and programs in real-life settings. Quasi-experimental designs can also help inform policy and practice by providing valuable insights into the causal relationships between variables.

Overall, the purpose of quasi-experimental design is to provide a rigorous, practical method for evaluating the impact of interventions, policies, and programs when randomization is not possible, while controlling for potential confounding factors.

Advantages of Quasi-Experimental Design

Quasi-experimental designs have several advantages over other research designs, such as:

  • Greater external validity: Quasi-experimental designs are more likely to have greater external validity than laboratory experiments because they are conducted in naturalistic settings. This means that the results are more likely to generalize to real-world situations.
  • Ethical considerations: Quasi-experimental designs often involve naturally occurring events, such as natural disasters or policy changes. This means that researchers do not need to manipulate variables, which can raise ethical concerns.
  • More practical: Quasi-experimental designs are often more practical than experimental designs because they are less expensive and easier to conduct. They can also be used to evaluate programs or policies that have already been implemented, which can save time and resources.
  • No random assignment: Quasi-experimental designs do not require random assignment, which can be difficult or impossible in some cases, such as when studying the effects of a natural disaster. Researchers can still draw cautious causal inferences, but they must use statistical techniques to control for potential confounding variables.
  • Greater generalizability: Quasi-experimental designs are often more generalizable than experimental designs because they include a wider range of participants and conditions. This can make the results more applicable to different populations and settings.

Limitations of Quasi-Experimental Design

There are several limitations associated with quasi-experimental designs, which include:

  • Lack of Randomization: Quasi-experimental designs do not involve randomization of participants into groups, which means that the groups being studied may differ in important ways that could affect the outcome of the study. This can lead to problems with internal validity and limit the ability to make causal inferences.
  • Selection Bias: Quasi-experimental designs may suffer from selection bias because participants are not randomly assigned to groups. Participants may self-select into groups or be assigned based on pre-existing characteristics, which may introduce bias into the study.
  • History and Maturation: Quasi-experimental designs are susceptible to history and maturation effects, where the passage of time or other events may influence the outcome of the study.
  • Lack of Control: Quasi-experimental designs may lack control over extraneous variables that could influence the outcome of the study. This can limit the ability to draw causal inferences from the study.
  • Limited Generalizability: Quasi-experimental designs may have limited generalizability because the results may only apply to the specific population and context being studied.


Comparative study of typical neural solvers in solving math word problems

Original Article · Open access · Published: 22 May 2024


Bin He (ORCID: 0000-0003-2088-8193), Xinguo Yu, Litian Huang, Hao Meng, Guanghua Liang & Shengnan Chen

Abstract

In recent years, there has been a significant increase in the design of neural network models for solving math word problems (MWPs). These neural solvers have been designed with various architectures and evaluated on diverse datasets, posing challenges in fair and effective performance evaluation. This paper presents a comparative study of representative neural solvers, aiming to elucidate their technical features and performance variations in solving different types of MWPs. Firstly, an in-depth technical analysis is conducted from the initial deep neural solver DNS to the state-of-the-art GPT-4. To enhance the technical analysis, a unified framework is introduced, which comprises highly reusable modules decoupled from existing MWP solvers. Subsequently, a testbed is established to conveniently reproduce existing solvers and develop new solvers by combining these reusable modules, and finely regrouped datasets are provided to facilitate the comparative evaluation of the designed solvers. Then, comprehensive testing is conducted and detailed results for eight representative MWP solvers on five finely regrouped datasets are reported. The comparative analysis yields several key findings: (1) Pre-trained language model-based solvers demonstrate significant accuracy advantages across nearly all datasets, although they suffer from limitations in math equation calculation. (2) Models integrated with tree decoders exhibit strong performance in generating complex math equations. (3) Identifying and appropriately representing implicit knowledge hidden in problem texts is crucial for improving the accuracy of math equation generation. Finally, the paper also discusses the major technical challenges and potential research directions in this field. The insights gained from this analysis offer valuable guidance for future research, model development, and performance optimization in the field of math word problem solving.


Introduction

Math Word Problem (MWP) solving has been a long-standing research problem in the field of artificial intelligence [ 1 ]. However, previous methods required hand-crafted features, making them less effective for general problem-solving. In a milestone contribution, Wang et al. [ 2 ] designed the first deep learning-based algorithm, DNS, to solve MWPs, eliminating the need for hand-crafted features. Since then, multiple neural solvers with various network cells and architectures have emerged [ 3 , 4 , 5 , 6 , 7 , 8 , 9 ], with pioneering experiments conducted on diverse datasets with varying sizes and characteristics [ 1 , 10 ]. However, the experimental results show that even MWP solvers built with similar architectures exhibit varying performance on datasets with different characteristics. Hence, a precise and impartial analysis of the existing MWP solvers has become essential to reveal the potential factors of network cells and architectures that affect the performance of neural solvers on MWPs with different characteristics.

Earlier MWP solvers leveraged manually designed rules or semantic parsing to map problem text into math equations, followed by an equation solver to obtain the final answer. These early efforts could only solve a limited number of problems defined in advance. Inspired by deep learning models for natural language processing [ 11 , 12 ], recent neural solvers use an Encoder-Decoder framework [ 13 ] to transform a sequence of problem sentences into another sequence of arithmetic expressions or equations. The Encoder captures the information presented by the problem text, which can be divided into two categories: sequence-based representation learning [ 5 , 14 , 15 ] and graph-based representation learning [ 6 , 16 , 17 ]. Sequence-based representation learning processes the problem text as a sequence of tokens using recurrent neural networks [ 18 , 19 ] or transformers [ 11 ], while graph-based representation learning constructs a graph from the problem text. Graph neural networks (e.g., graph transformer model [ 20 ], inductive graph learning model [ 21 ]) are then used to learn a representation for the entire graph. Mathematical expressions can be viewed as sequences of symbols or modeled as trees based on their syntactic structure, allowing Decoders to predict output expressions based on the encoding vectors produced by the encoder. By combining different types of encoders and decoders, diverse architectures of MWP solvers have been developed, including Seq2Seq-based solvers, Seq2Tree-based solvers, and Graph2Tree-based solvers.

Several reviews and surveys have been conducted to examine the progress of research in this field. For example, Mukherjee et al. [ 22 ] made a first attempt to analyze mathematical problem solving systems and approaches according to different disciplines. Zhang et al. [ 1 ] classified and analyzed different representation learning methods according to technical characteristics. Meadows et al. [ 23 ] and Lu et al. [ 24 ] conducted a literature review on the recent deep learning-based models for solving math word problems. Lan et al. [ 10 ] established a unified algorithm test platform and conducted comparative experiments on typical neural solvers. While these reviews provide valuable insights into the field of automatic math word problem solving, little comparative evaluation has been carried out to reveal the performance variations of neural solvers with different architectures in solving various types of MWPs. An initial attempt can be found in [ 10 ], which provides a collection of experimental results of the typical neural solvers on several datasets. However, there have been no other attempts to explore the performance variations of neural solvers with different architectures in solving different types of math word problems.

While significant efforts have been made, there remains a lack of comprehensive technical analysis to compare different network structures and their impacts on final performance. This paper presents a comparative study of typical neural solvers to unveil their technical features and performance variations in solving MWPs with diverse characteristics. We initially identify the architectures of typical neural solvers, rigorously analyzing the framework of each category, notably: Seq2Seq [ 2 , 4 ], Seq2Tree [ 5 , 25 ], Graph2Tree [ 6 , 17 ] and PLM-based models [ 26 , 27 , 28 , 29 , 30 ]. We propose a four-dimensional indicator to categorize the considered datasets for precise evaluation of neural solvers’ performance in solving various characteristics of MWPs. Typical neural solvers are disassembled into highly reusable components, enabling researchers to reconstruct them and develop new solvers by replacing components with proposed ones, which benefits both model evaluation and extension. To assess the considered solvers, we establish a testbed and conduct comprehensive experiments on five popular datasets using eight representative MWP solvers, followed by a comparative analysis of the results achieved. The contributions of our work can be summarized as follows:

We provide a comprehensive and systematic analysis of deep learning-based MWP solvers, ranging from the initial deep neural solver DNS to the latest GPT-4. This is achieved through an in-depth technical analysis of network structures and neural cell types, enabling a deeper understanding of the technological evolution of MWP solvers for the research community.

To enhance the technical analysis, we introduce a unified framework consisting of reusable encoding and decoding modules decoupled from existing MWP solvers. This framework allows for the straightforward reproduction and extension of typical MWP solvers by combining these reusable modules.

We establish a testbed and provide finely regrouped datasets to facilitate objective and fair evaluations of MWP solvers. Through this testbed, we conduct comprehensive testing and report detailed results for eight representative MWP solvers on five finely regrouped datasets, specifically highlighting the performance variations of solvers with different modules in solving different types of MWPs.

We present three key findings from our experiments and discuss the major technical challenges and potential research directions in this field.

The rest of the paper is organized as follows: Sect. “ Related work ” describes related work on math word problem solving. Section “ Architecture and technical feature analysis of neural solvers ” provides a detailed analysis of the framework of typical neural solvers. A characteristic analysis of the considered datasets is presented in Sect. “ Characteristics analysis of benchmark datasets ”, and experiments and a comparative analysis are conducted in Sect. “ Experiment ”. We conclude this paper in Sect. “ Conclusion ”.

Related work

In this section, we will explore various deep learning-based approaches for solving math word problems. We will also provide an introduction to previous surveys in this field.

Deep learning-based approaches for solving MWPs

Solving MWPs has been a longstanding research focus in the field of artificial intelligence since the 1960s, as illustrated in Fig.  1 . The evolution of MWP solvers can be categorized into different stages based on the underlying technologies utilized, including rule-based approaches [ 31 ], semantic parsing-based approaches [ 16 , 32 , 33 , 34 ], etc. More recently, neural networks inspired by deep learning models for natural language processing [ 11 , 12 ] have been designed to tackle MWPs. For instance, the Deep Neural Solver (DNS) [ 2 ] is the first deep learning algorithm capable of translating problem texts to equations without relying on manually-crafted features. This advantage has motivated extensive research on neural solvers using larger datasets, as evidenced by several studies in the literature.

[Figure 1: Approach evolution in solving MWPs]

A significant challenge in these studies is efficiently capturing the logical relationships between natural language texts and their corresponding equations [ 1 ], which is known as problem text representation and equation representation. Inspired by translation models [ 19 ], an MWP solver is typically designed as an Encoder-Decoder framework [ 1 ], as shown in Table 1 . The Encoder is responsible for learning the semantic representation and the logical relationships presented explicitly or implicitly in the problem text. Researchers have tried different sequence models, leading to several representative models such as DNS [ 2 ] and MathEN [ 4 ]. The Decoder, usually designed as a sequence or tree structural model, treats the math equation as a symbolic sequence consisting of numbers and operators for decoding. Several tree-structured models, such as Tree-Dec [ 25 ] and GTS [ 6 ], were designed and then widely accepted for math equation decoding to enhance math equation generation. Recently, pre-trained models such as BERT [ 35 ] and GPT [ 28 , 29 ] were incorporated into MWP solvers to effectively represent background knowledge. In the subsequent sections, we will provide a comprehensive review from these three perspectives.

Problem text representation

To avoid sophisticated feature engineering, deep learning technologies were applied for problem text representation. In this field, Wang et al. [ 2 ] have made significant contributions by designing a customized model called Deep Neural Solver (DNS) to automatically solve MWPs. Within the DNS, the problem text and mathematical expressions are represented as sequential data, making them amenable to processing by sequence models commonly used in Natural Language Processing (NLP). Consequently, the task of solving mathematical problems is modeled as a “translation” problem within a Sequence-to-Sequence (Seq2Seq) framework. Following this pioneering work, a number of Seq2Seq models [ 4 , 5 , 13 , 36 , 37 ] for MWPs have been developed. These Seq2Seq models treat the problem text as a sequence of word tokens and utilize Recurrent Neural Networks (RNNs) such as the Long Short-Term Memory (LSTM) network [ 3 ] and Gated Recurrent Unit (GRU) [ 47 ], as well as the Transformer [ 11 ], for encoding the word sequence.

To enhance the representation of the problem text, numerous optimization strategies and auxiliary techniques have been proposed. For instance, Wang et al. [ 4 ] utilized different deep neural networks for problem encoding and achieved higher accuracy compared to other Seq2Seq models. Shen et al. [ 33 ] employed a multi-head attention mechanism to capture both local and global features of the problem text. Li et al. [ 37 ] developed a group attention mechanism to extract diverse features pertaining to quantities and questions in MWPs. These efforts aim to better capture the contextual information in the problem text, thereby improving the efficiency of expression generation.

In addition to capturing the contextual information from the problem text, researchers have explored graph-based models inspired by the success of previous works [ 20 , 21 ] to capture non-sequential information, such as quantity unit relations, numerical magnitude relations, and syntactic dependency relations. These non-sequential relations are considered helpful in ensuring the logical correctness of expression generations. For instance, quantity unit relationships can help reduce illegal operations between values with different units, and numerical magnitude relationships can help reduce the occurrence of negative results from subtracting a larger number from a smaller number. Based on these assumptions, Zhang et al. propose Graph2Tree [ 6 ], which constructs a quantity cell graph and a quantity comparison graph to represent quantity unit relationships and numerical magnitude relationships, respectively. Similarly, Li et al. [ 17 ] introduce the constituency tree augmented text graph, which incorporates a constructed graph into a graph neural network [ 48 , 49 ] for encoding. The output of these graph models, combined with the output of the sequence model, is used for decoding. Additionally, knowledge-aware models [ 7 , 50 , 51 ] have been designed to improve problem representation.

Recently, Pre-trained Language Models (PLMs), and especially transformer-based language models, have been shown to contain commonsense and factual knowledge [ 52 , 53 ]. To enhance the representation of problem texts, PLMs were employed for problem text encoding, aiming to reason with outside knowledge provided by the PLMs. Yu et al. [ 40 ] utilized RoBERTa [ 54 ] to capture implicit knowledge representations in input problem texts. Li et al. [ 41 ] leveraged BERT [ 35 , 55 ] for both understanding semantic patterns and representing linguistic knowledge. Liang et al. [ 26 ] employed BERT and RoBERTa for contextual number representation. These models have yielded significant improvement in terms of answer accuracy. Recently, decoder-only PLMs, such as GPT [ 28 ], PaLM [ 44 , 45 ] and LLaMA [ 46 ], have exhibited strong reasoning abilities and potential in solving MWPs, especially when integrated with prompting [ 56 ] and chain-of-thought [ 57 ] techniques. For instance, the latest release, GPT4-CSV [ 30 ], achieved an almost 20% increase in answer accuracy on the MATH dataset compared to GPT3.5 [ 28 ]. However, despite these improvements, issues such as factual errors and reasoning errors [ 58 ] by LLMs may lead to wrong answers even with carefully crafted prompt sequences.

Math equation representation

The representation of math equations presents another challenge in the design of MWP solvers. Initially, math equations were commonly modeled as sequences of symbols and operators, known as equation templates [ 2 ]. This allowed for direct processing by sequence models such as LSTM, GRU, etc. However, these sequence models suffer from non-deterministic transduction [ 1 , 4 ], as a math word problem can have multiple correct equations. To address this issue, approaches such as MathEN [ 4 ] were proposed to normalize duplicated equations so that each problem text corresponds to a unique math equation. Chiang et al. [ 13 ] took this further by utilizing the Universal Expression Tree (UET) to represent math equations. However, these methods encode math equations using sequence models, ignoring the hierarchical structure of logical forms within math equations.

To capture the structural information, researchers have proposed tree-structured models (TreeDecoders) [ 5 , 17 , 25 ] for the iterative construction of equation trees. Liu et al. [ 25 ] developed a top-down hierarchical tree-structured decoder (Tree-Dec) inspired by Dong et al. [ 59 ]. The Tree-Dec [ 25 ] enhances a basic sequence-based LSTM decoder by incorporating tree-based information as input. This information consists of three components: parent feeding, sibling feeding, and previous token feeding, which are then processed by a global attention network. Xie et al. [ 5 ] introduced a goal-driven mechanism (GTS) for feeding tree-based information. Li et al. [ 17 ] applied a separate attention mechanism to the node representations corresponding to different node types. Additionally, Zhang et al. [ 27 ] proposed a multi-view reasoning approach that combines the top-down decomposition of a TreeDecoder with the bottom-up construction of reductive reasoning [ 9 ]. Due to their exceptional ability in math equation generation, TreeDecoders have been widely adopted by subsequent MWP solvers [ 7 , 38 , 39 , 43 , 60 ]. Furthermore, several extensions of TreeDecoders have been explored, such as the generation of diverse and interpretable solutions [ 7 , 38 , 39 , 60 ].

The previous survey work

Despite the extensive research conducted in the field of MWP solving, there is a lack of comprehensive reviews. Mukherjee et al. [ 22 ] conducted a functional review of various natural language mathematical problem solvers, from early systems like STUDENT [ 61 ] to the later-developed ROBUST [ 62 ]. The paper provides a systematic review of representative systems in domains such as math problems, physics problems, chemistry problems, and theorem proving. It highlights that these systems are generally useful for typical cases but have limitations in understanding and representing problems of diverse nature [ 22 ]. Additionally, there is a lack of unified benchmark datasets and clear evaluation strategies. However, because the review predates the current mainstream neural network-based methods, it does not cover them, which limits its assessment of the research field.

With the rise of machine learning-based MWP solving, Zhang et al. [ 1 ] conducted a review of these emerging works from the perspective of representation of problem texts and mathematical expressions. The paper categorizes the development of machine-answering techniques into three stages: rule-based matching, statistical learning and semantic parsing, and deep learning. The authors argue that the primary challenge in machine answering is the existence of a significant semantic gap between human-readable words and machine-understandable logic. They focus on reviewing tree-based methods [ 16 , 63 , 64 , 65 ] and deep learning-based methods [ 32 , 66 , 67 , 68 , 69 ]. The paper also reports the test results of these methods on certain datasets, aiming to provide readers with insights into the technical characteristics and classification of machine answering in the era of machine learning.

In recent literature, Meadows et al. [ 23 ] and Lu et al. [ 24 ] conducted comprehensive surveys on the emerging deep learning-based models developed for solving math word problems. These studies systematically classify and document the network architectures and training techniques utilized by these models. Furthermore, they provide a detailed analysis of the challenges faced in this field as well as the trends observed in the development of such models. Lan et al. [ 10 ] developed MWPToolkit, a unified framework and re-implementation of typical neural solvers [ 2 , 4 , 5 , 6 , 13 , 19 , 33 , 36 , 37 , 38 ]. MWPToolkit provides specified interfaces for running existing models and developing new models. However, there is a lack of technical analysis on the network structures of these neural solvers. Recently, pilot work has been conducted to compare the performance of MWP solvers based on deep learning models. Chen et al. [ 70 ] performed a comparative analysis of six representative MWP solvers to reveal their solving performance differences. Building upon this prior work, He et al. [ 71 ] further investigated the performance comparison of representation learning models in several considered MWP solvers.

This paper conducts an in-depth and comprehensive comparative analysis to reveal the technical features and performance variations of typical neural solvers when solving MWPs with different characteristics. The goal is to assist researchers in selecting more effective network units and structures for tasks with different features.

Architecture and technical feature analysis of neural solvers

The general architecture of neural solvers.

Math word problem solving is a mixed process of reasoning and calculating that can hardly be solved directly by neural networks designed for classification or regression tasks. Hence, most neural solvers take a two-step approach of expression generation and answer calculation: the former translates the input problem text into a calculable math expression, which is then passed to a mathematical solver to compute the final answer. Therefore, the key challenge of solving a math word problem is to generate the target math expression.

Earlier solvers, such as DNS [ 2 ], tackle this challenge by using a Seq2Seq model in which math expressions are abstracted into expression templates and each template is treated as a sequence of operators and symbols. Later, to improve the capability of new expression generation, math expressions were modeled as decomposable tree structures instead of fixed sequences. A milestone work of tree-structured decomposing is the goal-driven tree-structured (GTS) model proposed by Xie et al. [ 5 ], which is widely used in newly developed neural solvers. Under this tree-structured paradigm, math expression generation is further divided into three sub-steps, namely problem modeling, problem encoding and expression decoding, as shown in Fig.  2 .

[Figure 2: The general architecture of a neural solver for solving math word problems]

Generally, a neural solver can be summarized as an Encoder-Decoder architecture of

\[ ME = F_{decoding}\big(F_{encoding}(P)\big) \]

where the problem P consists of a word token sequence \(W=(w_1, w_2,...,w_n)\) and each \(w_i\) denotes the token of the i th word. \(F_{encoding}(\cdot)\) and \(F_{decoding}(\cdot)\) are networks to obtain the problem text representation and generate math equations accordingly. The goal of building a neural solver is to train an encoding network \(F_{encoding}(\cdot)\) for problem feature representation learning, and a decoding network \(F_{decoding}(\cdot)\) for predicting math expressions \(ME=(e_1,e_2,...,e_m)\) to achieve the final answer. We give a detailed analysis of the architecture of mainstream encoders and decoders below separately.
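This composition can be sketched schematically in PyTorch; `TinySolver` and all dimensions here are illustrative placeholders rather than any specific solver from the literature:

```python
import torch
import torch.nn as nn

# Schematic of ME = F_decoding(F_encoding(P)). All modules and sizes are
# illustrative placeholders, not any particular published solver.
class TinySolver(nn.Module):
    def __init__(self, vocab_in=100, vocab_out=20, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_in, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)  # F_encoding
        # Stand-in for F_decoding: scores expression tokens per step.
        # Real solvers use autoregressive sequence or tree decoders.
        self.decoder = nn.Linear(hidden, vocab_out)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        return self.decoder(states)

logits = TinySolver()(torch.tensor([[5, 17, 42, 8]]))  # one tokenized problem
print(logits.shape)  # torch.Size([1, 4, 20])
```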

Problem modeling. Problem modeling defines the pipeline of the neural networks. Specifically, it models the data structure of the input and output of the solvers. For input, the problem texts are usually modeled as word sequences followed by a recurrent neural network for feature learning. A major line of improvement converts sequential texts into graphs, so that graph neural networks can be used for feature learning.

The output of the solvers is the target math expression which can be modeled as specially designed sequences composed of operators and number tokens. An expression vocabulary is defined which contains operators (e.g., \(+,-,\times , \div \) ), constant quantities (e.g., \( 1, \pi \) ) and numbers presented by the problem text. Based on the built vocabulary, a math expression can be abstracted as an expression template in which digits are replaced by number tokens of \(n_i\) . In recent works, target expressions are represented as expression trees. A basic expression tree contains three nodes of the root, left child and right child. The child node can be a digit or an operator that owns at most two children. By employing this tree-structured decomposing, nearly all types of expressions, even those that did not exist in the training set, can also be constructed.
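A minimal sketch of this template abstraction, together with the inverse number mapping used at answer-calculation time (the regular expression and the `eval` call are simplifications; a real solver would handle repeated or overlapping digit strings more carefully):

```python
import re

def to_template(problem_text, expression):
    """Replace each number from the problem with a token n_i, mirroring
    the expression-template abstraction described above."""
    numbers = re.findall(r"\d+\.?\d*", problem_text)
    template = expression
    for i, num in enumerate(numbers, start=1):
        template = template.replace(num, f"n{i}")
    return numbers, template

numbers, template = to_template(
    "A shirt costs 5 dollars and a hat costs 2 dollars; Tom buys 3 hats.",
    "5 + 2 * 3",
)
print(template)   # n1 + n2 * n3

# Answer calculation: map number tokens back to digits and evaluate
expr = template
for i, num in enumerate(numbers, start=1):
    expr = expr.replace(f"n{i}", num)
print(eval(expr))  # 11
```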

Problem encoding. Problem encoding is a representation learning module to learn the features from the input problem text. According to the representation learning methods applied, problem encoding can be divided into sequence-based methods and graph-based methods.

Expression decoding. Expression decoding is to train a decoding network to convert features obtained in problem encoding into expression templates. As discussed in Problem Modeling , the expression templates can be number token sequences or trees. Hence, expression decoding can be accordingly divided into sequence-based decoding methods and tree-structured decoding methods.

Answer calculation. In the answer calculation stage, a number mapping operation replaces the number tokens \(n_i\) in the obtained expression template with the original digits; a mathematical solver then computes the final answer.

Currently, neural solvers are designed as an Encoder-Decoder framework to accomplish the tasks of problem encoding and expression decoding. The early encoder-decoder model refers to Seq2Seq [ 2 ]: the Encoder takes the input problem text as a sequence, and the output expression predicted by the Decoder is also a sequence [ 65 ]. Later, researchers pointed out that the output expression can be better described as a tree structure, e.g. an expression tree [ 72 ] or an equation tree [ 25 ], so the Seq2Tree model was proposed. GTS, a typical Seq2Tree-based model, was proposed by Xie et al. [ 5 ], in which the output expressions are transformed into pre-order trees and a goal-driven decomposition method is proposed to generate the expression tree based on the input sequence. Furthermore, several works revealed that a math word problem is not only a sequence, but also contains structured information about numeric quantities. To represent the quantity relationships, the graph structure is applied to model the quantities as nodes and relations as edges. By combining graph encoders with tree-structural decoders, several Graph2Tree-based models have recently been proposed [ 6 , 33 ].

According to the network components applied in problem encoding ( Encoder ) and expression decoding ( Decoder ), neural network-based MWP solvers can be divided into four major categories: Seq2Seq, Seq2Tree, Graph2Tree and PLM-based model as shown in Table 2 .

Seq2Seq is a sequence-to-sequence framework, where both the Encoder and Decoder are sequence-based networks. The Encoder takes the sequence of word tokens as input and outputs the feature vectors, usually an embedding vector and a hidden state vector. The feature vectors are sent to the Decoder to predict the expression templates. The embedding vector is usually used to predict the current character of operators or number tokens, and the hidden state vector records the contextual features of the current character. LSTM [ 3 ] and GRU [ 47 ] are two commonly used networks in building Encoders and Decoders [ 2 , 5 , 37 , 38 , 65 ]. For example, MathEN [ 65 ] leverages two LSTM networks as Encoder and Decoder , while DNS [ 2 ] employs an LSTM network and a GRU network as Encoder and Decoder respectively.

Seq2Tree is an improved framework based on the Seq2Seq architecture in which the sequence-based Decoder is replaced by a tree-structured network to generate expression trees. As discussed above, the tree-structured network is a compound of prediction networks and feature networks, as well as a decision mechanism. For instance, in GTS [ 5 ], a prediction network and two feature networks are employed to merge the previous state vectors and to calculate the current state vector. In another work [ 17 ], only one feature network is used to accomplish the task of feature merging and calculation.

Graph2Tree combines the advantages of a graph-based encoder and a tree-based decoder in the process of problem encoding and expression decoding. Compared to Seq2Tree, Graph2Tree applies graphs to represent the structural relations among word tokens and digits, enhancing feature learning during problem encoding. Various kinds of algorithms have been proposed to construct graphs [ 6 , 7 , 17 , 39 ] by modeling the structural relations at both the word token level and the sentence level.

PLM-based models leverage pre-trained language models to generate intermediate MWP representation and solution. Depending on the type of PLM [ 58 ], there are two specific implementations of PLM-based models. The first implementation, represented by encoder-only PLMs like BERT [ 26 , 27 ], utilizes the PLM as an encoder to obtain the latent representation of the math word problem. This representation is then fed into a decoder, such as a Tree-based decoder, to generate the final mathematical expression. The second implementation, represented by models like GPT [ 28 , 29 , 30 ], directly employs Transformer networks for mathematical reasoning, producing the desired results without an explicit separation between encoding and decoding stages. This approach streamlines the process and enhances the efficiency of solving math word problems.

As shown in Table  2 , DNS and MathEN are Seq2Seq models, while GTS is built as a Seq2Tree structure. The tree-structured decoder designed in GTS is also applied in Graph2Tree \(^1\) . Graph2Tree \(^1\) and Graph2Tree \(^2\) are two Graph2Tree models but differ in both graph encoding and tree decoding. In the stage of graph encoding, Graph2Tree \(^1\) uses a Quantity Cell Graph and a Quantity Comparison Graph to describe the quantity relationships, while Graph2Tree \(^2\) leverages a Syntactic Graph to present the word dependency and the phrase structure information. In the decoding stage, a pre-order expression tree is generated in Graph2Tree \(^1\) , while Graph2Tree \(^2\) employs a hierarchical expression tree to model the output expression.

  • Problem text encoding

In recent years, a trend in building MWP solvers [ 1 ] is to apply deep neural networks to capture the quantity relationships presented explicitly and implicitly by problem texts. The early MWP solvers [ 2 , 65 ] mainly use sequence-based models, such as LSTM [ 3 ], GRU [ 47 ], etc., to conduct problem representation learning, in which the problem text is regarded as an unstructured sequence. Recently, graph-based representation learning methods [ 5 , 6 , 17 ] have been widely employed to enhance both structured and unstructured information learning, attracting increasing attention from the research community. On the other hand, several benchmark datasets with diverse characteristics were released for performance evaluation of the proposed solvers [ 10 ]. To reveal the potential effectiveness of representation learning methods on MWPs with diverse characteristics, a comparative analysis of sequence-based and graph-based representation learning is conducted in this paper.

Sequence-based problem encoding

As a problem is mainly presented by natural language text, sequence-based recurrent neural network (RNN) models [ 3 , 47 ] are a natural choice for problem representation learning. For example, DNS [ 2 ] uses a typical Seq2Seq model for problem representation learning, where words are split into tokens and input into a GRU module to capture quantity relations. Several follow-up works were proposed by replacing GRU with BiLSTM or BiGRU to enhance the ability of quantity relation learning [ 7 , 14 , 15 ]. To improve the semantic embedding, pre-trained language models, such as GloVe [ 17 ], BERT [ 26 ], Chinese BERT [ 55 ] and GPT [ 28 , 73 ], etc., were used to better understand the input problem texts. Besides, to capture more features between problem sentences and the goal, attention modules are employed in several works to extract local and global information. For instance, Li et al. [ 37 ] introduced a group attention that combines different attention mechanisms and achieved substantially better accuracy than baseline methods.

In a sequence-based representation learning model, every word of the problem text P is first transformed into a context representation. Given an input problem text \(P=\{ w_{1},w_{2},...,w_{n} \}\) , each word token \(w_{i}\) is vectorized into a word embedding \(\mathbf{w}_{i}\) through word embedding techniques such as GloVe [ 17 ], BERT [ 26 ], etc. To capture the word dependency and learn the representation of each token, the sequence of word embeddings is input into an RNN whose cells can be LSTM [ 3 ], GRU [ 47 ], etc. Formally, each word embedding \(\mathbf{w}_{i}\) of the sequence \(E=\{ \mathbf{w}_{1},\mathbf{w}_{2},...,\mathbf{w}_{n} \}\) is input into the RNN one by one, and a sequence of hidden states is produced as the output.

For unidirectional encoding, the procedure of problem representation learning can be described as follows:

\[ h_i^p = RNN\big(h_{i-1}^p,\ \mathbf{w}_i\big) \qquad (3) \]

where \({RNN}(\cdot ,\cdot )\) denotes a recurrent neural network, \(h_{i-1}^p\) denotes the previous hidden state and \(\mathbf{w}_{i}\) denotes the current input. Repeating the above calculation from step 1 to n yields the final hidden state \(h_{n}^p\) , which is the result of the sequence-based representation learning. In practice, \({RNN}(\cdot ,\cdot )\) is usually specified as a two-layer LSTM or GRU network.
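The recurrence in Eq. (3) can be sketched directly with PyTorch's `GRUCell`; the dimensions are arbitrary:

```python
import torch
import torch.nn as nn

emb_dim, hidden_dim = 32, 64
cell = nn.GRUCell(emb_dim, hidden_dim)

embeddings = torch.randn(6, emb_dim)   # word embeddings w_1 .. w_n
h = torch.zeros(1, hidden_dim)         # initial hidden state h_0^p
for w in embeddings:                   # h_i^p = RNN(h_{i-1}^p, w_i)
    h = cell(w.unsqueeze(0), h)
print(h.shape)                         # final state h_n^p: torch.Size([1, 64])
```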

For bi-directional encoding, a BiLSTM or BiGRU is applied to obtain the forward vector \(\overrightarrow{h_i^p}\) and the backward vector \(\overleftarrow{h_i^p}\) separately. Finally, the output hidden state \(h_i^p\) is calculated by combining the two directional states, commonly by concatenation (some models instead sum the two directions):

\[ h_i^p = \big[\overrightarrow{h_i^p};\ \overleftarrow{h_i^p}\big] \qquad (4) \]

To capture different types of features in the hidden states \(h_i^p\) , attention mechanisms are employed to enhance the related features. For example, Li et al. [ 37 ] applied a multi-head attention network following a BiLSTM network. The output of the group attention \(h_a^p\) is produced by scaled dot-product attention:

\[ h_a^p = softmax\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \]

where Q , K and V denote the query matrix, key matrix and value matrix separately, which are all initialized as \(h_i^p\) , and \(d_k\) is the dimension of the key vectors.
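A simplified, single-head sketch of this scaled dot-product attention (the group attention of Li et al. [ 37 ] combines several such mechanisms over different token groups):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ V

# Q, K and V are all initialized from the encoder states h_i^p, as in the text
h = torch.randn(1, 4, 64)                   # [batch, tokens, hidden]
h_a = scaled_dot_product_attention(h, h, h)
print(h_a.shape)                            # torch.Size([1, 4, 64])
```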

The above process can be replaced by employing a pre-trained language model. As shown in Eq. (5), a pre-trained language model \(PLM(\cdot)\) is used to directly map the problem text, denoted as X, to a representation matrix H:

\[ H = PLM(X) \qquad (5) \]

Graph-based problem encoding

To improve structural information learning, graph-based encoders were applied to represent relationships among numbers, words and sentences, etc. The structural information includes token-level information and sentence-level information. The former is also considered local information, which is constructed from the number comparison relationship (e.g., bigger, smaller), the neighborhood relationship between numbers and the associated word tokens, etc. For example (as shown in Fig.  3 a), Zhang et al. [ 6 ] applied two graphs, including a quantity comparison graph and a quantity cell graph, to enrich the information between related quantities. The sentence-level information, in a sense, is the global information that connects local token-level information. A commonly used kind of sentence-level information is the syntactic structure information generated from dependency parsing. As shown in Fig.  3 b, to capture the sentence structure information, dependency parsing and constituency analysis [ 17 ] were applied to construct graphs. Furthermore, Wu et al. [ 50 ] proposed a mixed graph, called an edge-labeled graph, to establish the relationship between nodes at both the sentence level and the problem level. Once the problem text is represented as a graph, graph networks such as GraphSAGE [ 21 ] or GCN [ 74 ] can be used to learn the node embeddings. One of the advantages of using graph representation learning is that external knowledge can be easily imported into the graph to improve the accuracy of problem solving [ 50 ].

[Figure 3: Comparison of graph-based quantity relation representation. a Quantity comparison graph and quantity cell graph designed by Zhang et al. [ 6 ]; b Constituency tree augmented text graph applied by Li et al. [ 17 ]]

Different from the sequence-based representation learning methods, the graph-based representation learning methods take important structural information into consideration when encoding. Because different researchers construct graphs in different ways, unifying these methods is more complex than unifying the sequence-based representation learning methods. Through the summary and induction of several typical works [ 6 , 7 , 16 , 17 ], we divide the procedure of graph-based representation learning into three steps: node initialization, graph construction and graph encoding.

Graph Construction. The graph construction is a pre-process before graph encoding, which converts the problem P into a graph \(G=(V, E)\) aiming at preserving more structural information hidden in P . To this end, elements such as words and quantities are treated as nodes V , and syntactic relationships such as grammatical dependency and phrase structure are modeled as edges E .

To enrich the information during graph construction, several adjacency modeling approaches are proposed to construct graphs according to the relationships of words and numbers in P . For example, in reference [ 16 ], a Unit Dependency Graph (UDG) is constructed to represent the relationship between the numbers and the question being asked. In work [ 6 ], two graphs, including a quantity comparison graph and a quantity cell graph, are built to model the relationships between the descriptive words associated with a quantity. Syntactic constituency information is used to construct the quantity graph in [ 17 ]. Through the graph construction process, a set of graphs \(\mathbb {G} =\{ G_1,G_2,...,G_K \}\) is obtained from problem P for graph encoding.
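In the spirit of the quantity comparison graph, a sketch of one possible construction: a directed adjacency matrix over the quantities extracted from a problem (the actual construction in [ 6 ] encodes richer relations):

```python
import numpy as np

quantities = [5.0, 2.0, 3.0]   # numbers extracted from a problem text

# Quantity comparison graph: directed edge i -> j when q_i > q_j
n = len(quantities)
adj = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        if quantities[i] > quantities[j]:
            adj[i, j] = 1
print(adj)
# [[0 1 1]
#  [0 0 0]
#  [0 1 0]]
```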

Graph encoding. After initializing the nodes and constructing the graph, a graph neural network is applied to obtain the output vector. The procedure can be summarized as follows:

\[ h_k^g = G\!N\!N(E_k, V_k) \]

where \(G\!N\!N(\cdot ,\cdot )\) denotes a graph neural network, such as GCN [ 74 ] or GraphSAGE [ 21 ]. The pair \((E_k,V_k)\) represents the \(k_{th}\) graph \(G_k\) in \(\mathbb {G}\) , with \(V_k\) as the node set and \(E_k\) as the edge set. Both \(V_k\) and \(E_k\) are formed during the node initialization stage. \(h_k^g\) denotes the hidden state corresponding to the input graph \(G_k\) . When more than one graph ( \(k>1\) ) is utilized, the output values \(\{h_k^g\}_{k=1}^K\) are concatenated and projected to produce the final value H . Finally, the global graph representation \(h^g\) can be obtained:

\[ H = FC\big(\big[h_1^g;\, h_2^g;\, \ldots;\, h_K^g\big]\big), \qquad h^g = Pooling(H) \]

where \(FC(\cdot )\) is a fully connected network and \(Pooling(\cdot )\) denotes a pooling function.

  • Math expression decoding

To achieve the final answer, vectors after problem representation learning are decoded as mathematical expressions followed by a math solver to calculate the answer. Early neural solvers, such as DNS [ 2 ], employ a typical Seq2Seq model to predict mathematical expressions. Later, to improve the generation ability of new expressions, tree-based models [ 5 ] are proposed to capture the structure information hidden in expressions.

Expression Decoder decodes the feature vectors obtained by the problem Encoder into expression templates. The decoding process is a step-by-step prediction of number tokens and operators. Therefore, recurrent neural networks are naturally chosen for this task. The decoding process can be described as a conditional probability function as follows:

\[ P\big(y_t \mid y_1, \ldots, y_{t-1},\, x\big) = F_{prediction}\big(y_{t-1},\, h_t,\, x\big) \qquad (8) \]

where x denotes the vector of the input problem, \(y_t\) and \(h_t\) are the predicted character and the decoder hidden state at step t respectively, and \(F_{prediction}\) is a non-linear function. The key component of Eq. ( 8 ) is the computation of \(h_t\) to ensure the output expressions are mathematically correct. Hence, the default activation functions of general RNNs need to be redesigned. According to the redesigned activation functions, expression decoding can be divided into two main categories: sequence-based decoding and tree-based decoding.

Sequence model based expression decoding

In sequence-based models, expressions are usually abstracted as a sequence of equation templates with number tokens and operators [ 2 , 37 , 65 ]. For example, the expression \(x = 5+2*3\) is described as an equation template \(x=n_1+n_2*n_3\) , where \(n_i\) is the token of the i th number in problem P . In the stage of math expression generation, a decoder is designed to predict an equation template for each input problem and then expressions are generated by mapping the numbers in the input problem to the number tokens in the predicted equation template [ 2 ]. Hence, math expression generation is transformed into a sequence prediction task, and one of its core tasks is to design a decoder to predict the equation templates. Typical sequence models built for NLP tasks can be directly applied for building such decoders [ 47 , 72 ]. Compared to retrieval models [ 32 , 75 ], sequence-based models achieved significant improvement in solving problems requiring new equations that did not exist in the training set. However, these models are usually sensitive to the length of the expressions as they generate solution expressions sequentially from left to right.

In sequence-based expression decoding, the activation function is defined according to the basic rules of arithmetic operations. For example in infix expressions, if \(y_{t-1}\) is a number, then \(y_t\) should be a non-number character. Therefore, the redesigned activation function differs according to the infix and suffix expressions used.

In infix sequence models [ 2 ], predefined rules are used to decide the type of the \(t_{th}\) character according to the \((t-1)_{th}\) character. For example, the rule “If \(y_{t-1}\) is in \(\{+,-,\times , \div \}\) , then \(y_t\) will not be in \(\{+,-,\times , \div , ),= \}\) ” defines which characters may follow an operator. Similar rules are used to determine the characters allowed after “(, ), =” and numbers.
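Such rules can be enforced as a vocabulary mask during decoding; the sketch below implements only the single rule quoted above, with illustrative token names:

```python
OPS = {"+", "-", "*", "/"}

def allowed_next(prev_token, vocabulary):
    """Rule-based mask for infix decoding: after an operator, the next
    token may not be another operator, ')' or '=' (cf. the rule above)."""
    if prev_token in OPS:
        banned = OPS | {")", "="}
        return [t for t in vocabulary if t not in banned]
    return list(vocabulary)

vocab = ["n1", "n2", "+", "*", "(", ")", "="]
print(allowed_next("+", vocab))  # ['n1', 'n2', '(']
```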

In suffix sequence models [ 36 , 37 ], two numbers are first accessed by the RNN to determine the operator and generate a new quantity as the parent node. The representation of the parent node \(o_c\) can be calculated by a function like:

\[ o_c = \sigma\big(W_1 h_l + W_2 h_r + b\big) \]

where \(h_l, h_r\) are the quantity representations for the previously predicted nodes, \(\sigma\) is a non-linear activation, and \(W_1, W_2\) and b are trainable parameters.

Tree-structured model based expression decoding

To describe the structural relation among operators and digits, expression templates are represented as tree structures and tree-structured networks are proposed to learn the structural features of the expression trees. Compared to left-to-right sequential representation in sequence-based methods, relationships among operators and numbers are represented by tree structures, such as expression tree [ 72 ] or equation tree [ 63 ]. Strictly, the tree-structured network is not a novel network architecture but a compound of networks and a decision mechanism. For example, the previous state when predicting a left child node is the parent node state, but in a right child node prediction, both the parent node state and the left child node state are considered as the previous state [ 5 ]. Hence, a decision mechanism is designed to choose different embedding states when predicting a left and right child node. Besides, various neural cells (e.g., a prediction network and a feature network) are usually employed for current character prediction and current hidden state calculation [ 6 , 17 ].

Therefore, the tree-structured networks are decided by the structure of the expression trees and the tree-based decoding is a decomposing process of an expression tree. According to the decomposing strategy employed, tree-based decoding can be divided into two main categories: depth-first decomposing [ 5 , 6 ] and breadth-first decomposing [ 17 ].

Depth-first decomposing. As shown in Fig.  4 b, depth-first decomposing starts from the root node and implements a pre-order operation during prediction: if an operator is predicted, the decoder goes on to predict the left child until a number node is predicted, and then predicts the right child. To make full use of available information, the prediction of the right child takes the information of its left sibling node and its parent into consideration. Roy et al. [ 72 ] proposed the first approach that leverages expression trees to represent expressions. Xie et al. [ 5 ] proposed a goal-driven tree-structured neural network, which was adopted by a set of later methods [ 6 , 14 , 15 ], to generate an expression tree.

[Figure 4: An example of tree-structured decomposing. a Input expression template; b Depth-first decomposing; c Breadth-first decomposing]

Breadth-first decomposing. In breadth-first decomposing models, expressions are represented as hierarchically connected coarse equations. A coarse equation is an algebraic expression that contains both numbers and unknown variables. Compared to depth-first decomposing, an essential difference of breadth-first decomposing is that the non-leaf nodes are specified as variables. Therefore, the variable nodes are decomposable nodes that can be replaced by sub-trees. As shown in Fig.  4 c, an example equation is first represented as a 1st-level coarse equation \(s_1 \div n_3(2)=x\) containing a non-leaf node \(s_1\) and four leaf nodes. Then, the non-leaf node \(s_1\) is decomposed into a sub-tree as the 2nd-level coarse equation \(n_1(19)-n_2(11)\) . Once all coarse equations at the current level are obtained, the decoder proceeds to predict 3rd-level coarse equations if any exist; otherwise, the decomposing stops.

To start a tree generation process, the root node vector \(q_{root}\) is initialized according to the global problem representation. For each token y in the target vocabulary \(V^{dec}\) , the representation of the token \(\textrm{e}(y \mid P)\) , denoted as \(h_t\) in Eq. ( 8 ), is defined as follows:

\[ \textrm{e}(y \mid P) = \begin{cases} \textrm{e}_{(y,op)} & \text{if } y \in V_{op} \\ \textrm{e}_{(y,u)} & \text{if } y \in V_{u} \\ \textrm{e}_{(y,con)} & \text{if } y \in V_{con} \\ \bar{h}_{loc(y,P)}^{p} & \text{if } y \in n_{p} \end{cases} \]

where \(\textrm{e}_{(y, op)}\) , \(\textrm{e}_{(y,u)}\) and \(\textrm{e}_{(y, con)}\) denote the representations of operators, unknowns and constants separately, obtained from 3 independent embedding matrices \(M_{op}\) , \(M_{unk}\) and \(M_{con}\) . \(\bar{h}_{loc(y, P)}^{p}\) is the quantity representation from Eq. ( 3 ) or ( 4 ). \(V^{dec}\) is the target vocabulary, which consists of 4 parts: math operators \(V_{op}\) , unknowns \(V_u\) , constants \(V_{con}\) and the numbers \(n_p\) appearing in the problem.

In order to adapt to tree-structured expression generation, activation functions are redesigned according to the types of nodes in the expression tree. The nodes are categorized into two types: leaf nodes and non-leaf nodes. When a non-leaf node is predicted, further decomposing is needed to predict its child nodes; otherwise, the current decomposing stops and the decoder moves on to predict the right child nodes. What counts as a non-leaf node differs across representations of tree-structured expressions: in regular expression trees [ 5 , 6 ], the non-leaf nodes are operators while numbers are treated as leaf nodes, whereas in a heterogeneous expression tree, the non-leaf nodes are non-target variables that are represented by sub-expressions.

Based on the above discussion, the whole procedure of tree-based expression decoding can be summarized as follows [ 5 , 6 , 7 , 14 ]:

1) Tree initialization: Initialize the root tree node with the global embedding \(H_g\) and perform the first-level decoding:

\[ q_{root} = H_g \]

where the global embedding \(H_g\) is the original output of the problem Encoder .

2) Left sub-node generation: A sub-decoder is applied to derive the left sub-node. The new left child \(n_l\) is conditioned on the parent node \(n_p\) and the global embedding \(H_g\) . The token \(\hat{y}_l\) is predicted when generating the new left node:

If the generated \(\hat{y}_l \in V_{op}\) or \(\hat{y}_l \in V_{u}\), repeat step 2). If the generated \(\hat{y}_l \in V_{con}\) or \(\hat{y}_l \in n_p\), proceed to step 3).

3) Right-node generation: Different from the left sub-node generation, the right sub-node is conditioned on the left sub-node \(n_l\) , the global embedding \(H_g\) and a sub-tree embedding \(t_l\) . The right sub-node \(n_r\) and the corresponding token \(\hat{y}_r\) can be obtained as:

where the sub-tree embedding \(t_l\) is conditioned on the left sub-node token \(\hat{y}_l\) and the left sub-node \(n_l\). If \(\hat{y}_r \in V_{op}\) or \(\hat{y}_r \in V_{u}\), repeat step 2). If the generated \(\hat{y}_r \in V_{con}\) or \(\hat{y}_r \in n_p\), stop decomposing and backtrack to find an empty right sub-node position. If no empty right-node position can be found, the generation is complete; if one still exists, go back to step 2). A simplified sketch of this procedure is given below.
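The following simplified, runnable Python sketch mirrors steps 1)-3) with a stack of pending goals; the predict function is only a stand-in for the learned sub-decoders of [ 5 , 6 ], and all names here are illustrative:

```python
V_OP = {'+', '-', '*', '/'}

def decode_tree(predict, max_steps=50):
    """Generate an expression in pre-order, mirroring steps 1)-3).

    `predict(parent, left_sibling)` stands in for the neural sub-decoder:
    it returns the next token given the parent node and the embedding of an
    already-finished left sibling (None when generating a left child).
    """
    tokens = []
    # Each stack entry is (parent_token, finished_left_sibling_or_None).
    stack = [(None, None)]                    # step 1): initialize the root goal
    while stack and len(tokens) < max_steps:
        parent, sibling = stack.pop()
        y = predict(parent, sibling)          # steps 2)/3): predict a token
        tokens.append(y)
        if y in V_OP:
            # Non-leaf node: decompose further; the right child waits on the stack
            # until the whole left subtree is finished (backtracking).
            stack.append((y, 'left-subtree-embedding'))  # right-child goal
            stack.append((y, None))                      # left-child goal
        # Leaf node (number/constant/unknown): backtrack via the stack.
    return tokens

# Toy predictor that deterministically emits (19 - 11) / 2 in pre-order.
script = iter(['/', '-', 'n1(19)', 'n2(11)', 'n3(2)'])
print(decode_tree(lambda p, s: next(script)))
# ['/', '-', 'n1(19)', 'n2(11)', 'n3(2)']
```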

In other models [ 17 ], steps 2) and 3) are combined into a sub-tree generation module in which the token embedding \(s_t\) and the corresponding token \(\hat{y}_t\) at time t are calculated as follows:

where \(st_{parent}\) stands for sub-tree node embedding from the parent layer and \(st_{sibling}\) is the sentence embedding of the sibling.

Compared to earlier sequence-based decoders, which are usually retrieval models, tree-based decoders are generative models that can generate new expressions not present in the training set. This generation ability lies in the iterative process of tree-structured decomposing defined in Eq. (10), and equation accuracy was greatly improved by the use of tree-based decoders. Detailed results can be found in Sect. "Experiment".

Characteristics analysis of benchmark datasets

Widely used benchmark datasets.

Problem texts and equations are the two essential items for neural solver evaluation. The problem text of each example in a dataset is a short, natural-language text that presents a fact and raises a question, and the equation is the math expression(s) (e.g., an arithmetic expression, or one or more equations) that can be used to generate the final answer to the question raised by the problem text. A problem text can be stated in any language, but most of the widely used datasets were stated in English [ 32 , 76 , 77 ] until Wang et al. [ 2 ] released the Chinese dataset Math23K in 2017, which contains 23,161 problems with carefully labeled equations and answers. A brief introduction to the widely accepted benchmark datasets is given below, and the results of a statistical analysis conducted on the considered datasets are shown in Table 3.

Alg514 is a multiple-equation dataset created by Kushman et al. [ 32 ]. It contains 514 algebra word problems from Algebra.com. In the dataset, each template corresponds to at least 6 problems (the T6 setting), and there are only 28 templates in total.

Draw1K is a multiple-equation dataset created by Upadhyay et al. [ 78 ]. It contains 1000 algebra word problems also crawled and filtered from Algebra.com.

Dolphin18K is a multiple-equation dataset created by Huang et al. [ 77 ]. It contains 18,711 math word problems from Yahoo! Answers with 5,738 templates. It contains many more, and harder, problem types than the earlier datasets.

MAWPS-s is a single-equation dataset created by Koncel-Kedziorski et al. [ 76 ]. It contains 3320 arithmetic problems of different complexity compiled from different websites.

SVAMP is a single-equation dataset created by Patel et al. [ 79 ]. It contains 1000 problems with grade levels up to 4. Each problem is a one-unknown arithmetic word problem that can be solved by an expression requiring no more than two operators.

Math23K is a single-equation dataset created by Wang et al. [ 2 ]. It contains 23,162 Chinese math word problems crawled from the Internet. Each problem is labeled with an arithmetic expression and an answer.

HMWP is a multiple-equation dataset created by Qin et al. [ 38 ]. It contains 5491 Chinese math word problems extracted from a Chinese K12 math word problem bank.

Despite these available large-scale datasets, neural solver evaluation remains tricky owing to the various types and characteristics of math word problems. Because almost all neural solvers predict equation templates directly from the input problem text, the complexity and characteristics of both the equations and the input problem texts need further study to make the evaluation results more informative.

Characteristics analysis

To evaluate the neural solvers, three widely used benchmark datasets were selected: two English datasets, MAWPS-s and SVAMP, and a Chinese dataset, Math23K. All the selected datasets contain single-equation problems, as almost all solvers support the single-equation generation task. In contrast, not all solvers support the multi-equation generation task, which would lead to poor comparability.

As discussed in Sect. "Characteristics analysis of benchmark datasets", problem texts and expressions differ greatly in terms of scope and difficulty between datasets. In order to reveal the performance differences of neural solvers on datasets with different characteristics, the selected benchmark datasets are categorized into several sub-sets based on four characteristic indices, L, H, C and S, defined as follows (a minimal sketch of computing the first two is given after the definitions):

Expression Length ( L ): denotes the length complexity of the output expression. L can be used as an indicator of the expression generation capability of a neural solver. According to the number of operators involved in the output expression, L is defined as a three-level indicator containing \(L_1\), \(L_2\) and \(L_3\): \(L_1\) level: \(l < T_0\); \(L_2\) level: \(T_0 \le l \le T_1\); \(L_3\) level: others, where l represents the number of operators in the output expression and \(T_0\), \(T_1\) denote the thresholds of l at the different levels of length complexity.

Expression Tree Depth ( H ): denotes the height complexity of the output expression tree. H is another generation capability indicator, especially for tree-structured neural solvers. According to the depth of the expression tree, H is defined as a two-level indicator containing \(H_1\) and \(H_2\): \(H_1\) level: \(h < T_2\); \(H_2\) level: others, where h refers to the height of the expression tree and \(T_2\) is a threshold.

Implicit Condition ( C ): denotes whether implicit expressions needed to solve the problem are embedded in the problem text. \(C_{1}\) refers to problems with no implicit expression, while \(C_{2}\) refers to problems with one or more implicit expressions. C can be used as an indicator associated with the relevant information understanding of the solver.

Arithmetic Situation ( S ): denotes the situation type that a problem belongs to. Different arithmetic situations indicate different series of arithmetic operations. Based on Mayer's work, we divide math word problems into five typical types: Motion ( \(S_m\) ), Proportion ( \(S_p\) ), Unitary ( \(S_u\) ), InterestRate ( \(S_{ir}\) ), and Summation ( \(S_s\) ). S can be used as an indicator associated with the context understanding of the solver.
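As an illustration, the quantities l and h can be computed directly from an expression's parse tree. The sketch below uses Python's ast module; the threshold values \(T_0=3\), \(T_1=5\) and \(T_2=4\) are placeholders, not the values used in our experiments:

```python
import ast

T0, T1, T2 = 3, 5, 4  # placeholder thresholds

def operator_count(expr):
    """l: number of binary operators in the expression."""
    return sum(isinstance(n, ast.BinOp)
               for n in ast.walk(ast.parse(expr, mode='eval')))

def tree_depth(node):
    """h: height of the (parsed) expression tree."""
    children = list(ast.iter_child_nodes(node))
    return 1 + max(map(tree_depth, children), default=0)

def L_level(expr):
    l = operator_count(expr)
    return 'L1' if l < T0 else ('L2' if l <= T1 else 'L3')

def H_level(expr):
    h = tree_depth(ast.parse(expr, mode='eval').body)
    return 'H1' if h < T2 else 'H2'

print(L_level('(19 - 11) / 2'), H_level('(19 - 11) / 2'))  # L1 H1
```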

Each of the selected benchmark datasets is divided into three sub-sets: a train-set (80%), a valid-set (10%) and a test-set (10%). These sub-sets are further characterized according to the above four indices. Tables 4 and 5 show the percentage of problems in the different benchmark datasets at each index level for training and testing, respectively. Compared to Math23K, expressions in MAWPS-s and SVAMP are much simpler in terms of both expression length and expression tree depth. Hence, we set different thresholds \(T_i\) to generate the \(L_*\) and \(H_*\) subsets. Moreover, implicit-condition and problem-situation analyses are only implemented on the Math23K dataset.

Experimental setup

Selected typical neural solvers: To ensure the fairness of the performance evaluation, two representative solvers were selected from each framework as shown in Table  2 . The selected solvers are listed below:

DNS [ 2 ]: The first Seq2Seq model using a deep neural network to solve math word problems. The model combines an RNN model with a similarity-based retrieval model: if the maximum similarity score returned by the retrieval model is higher than a specified threshold, the retrieval result is used; otherwise, the Seq2Seq model is selected to solve the problem.

MathEN [ 4 ]: An ensemble model that combines three Seq2Seq models and uses the equation normalization method, which normalizes duplicated equation templates into expression trees.

GTS [ 5 ]: A tree-structured neural model based on the Seq2Tree framework to generate expression trees in a goal-driven manner.

SAU-Solver [ 38 ]: A semantically aligned universal tree structure solver based on the Seq2Tree framework, and it generates a universal expression tree explicitly by deciding which symbol to generate according to the generated symbols’ semantics.

Graph2Tree [ 6 , 17 ]: Graph2Tree \(^1\) and Graph2Tree \(^2\) are both deep learning architectures based on the Graph2Tree framework, combining the advantages of a graph-based encoder and a tree-based decoder. However, the two differ in their graph encoding and tree decoding.

Bert2Tree [ 26 ]: An MWP-specific pre-trained language model with 8 pre-training objectives designed to solve the number representation issue in MWPs.

GPT-4 [ 30 ]: A decoder-only large language model released by OpenAI in March 2023.

The above selected typical solvers are evaluated in solving characteristic problems on five benchmark datasets and the detailed results can be found in Sect. “ Performance on solving characteristic problems ”.

Component Decoupling According to the discussion in Sect. "Architecture and technical feature analysis of neural solvers", each solver consists of an encoder and a decoder, each of which can be decomposed into one or more basic neural cells (e.g., RNN or GNN cells). To identify the contribution of these various cells during problem solving, we decouple the considered solvers into individual components. The decoupled components can be integrated into different solvers and can be replaced by other similar components. The components decoupled from encoders are listed as follows.

LSTM Cell : A long short-term memory network derived from sequence-based encoders for non-structural problem text encoding.

GRU Cell : A gated recurrent unit derived from sequence-based encoders for non-structural problem text encoding.

BERT Cell : A pre-trained language model used to directly map the problem text into a representation matrix for generating the solution.

GCN Cell : A graph convolution network derived from graph-based encoders for structural problem text encoding.

biGraphSAGE Cell : A bidirectional graph node embedding module derived from graph-based encoders for structural problem text encoding.

The LSTM cell and GRU cell take a text sequence as input and output two text vectors: an embedding vector and a hidden state vector. The GCN cell and biGraphSAGE cell take an adjacency matrix as input and output two graph vectors. Similarly, the components decoupled from decoders are listed below.

DT Cell : A depth-first decomposing tree method derived from Graph2Tree \(^1\) for math equation decoding. The DT cell takes an embedding vector and a hidden vector as input and outputs a math equation.

BT Cell : A breadth-first decomposing tree method derived from Graph2Tree \(^2\) for math equation decoding. The BT cell takes three vectors as input, including one embedding vector and two hidden state vectors.

Hence, a super solver is developed to reproduce the selected typical solvers and to design new solvers by redefining the combination of the decoupled components, as sketched below. The performance of the newly developed solvers is shown and discussed in Sects. "Comparative analysis of math expression decoding models" and "Comparative analysis of problem encoding models", respectively.
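Conceptually, the super solver treats encoder and decoder cells as interchangeable modules behind a common interface, along the following lines (class names and signatures are illustrative, not our actual implementation):

```python
import torch.nn as nn

class SuperSolver(nn.Module):
    """Compose decoupled cells into a full encoding-decoding pipeline."""
    def __init__(self, encoder_cells, decoder_cell):
        super().__init__()
        self.encoder_cells = nn.ModuleList(encoder_cells)
        self.decoder_cell = decoder_cell

    def forward(self, problem_batch):
        # Each encoder cell (e.g., a GRU, GCN or biGraphSAGE cell) refines
        # the representation produced by the previous one.
        state = problem_batch
        for cell in self.encoder_cells:
            state = cell(state)
        # A DT or BT cell then decodes the representation into an equation.
        return self.decoder_cell(state)

# e.g., a Graph2Tree-style solver rebuilt from decoupled cells:
# solver = SuperSolver([GRUCell(...), GCNCell(...)], DTCell(...))
```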

Evaluation Metrics Math word problems used for neural solver evaluation are usually composed of problem texts, equations and answers. Neural solvers take the problem texts as input and output the expression templates which are further mapped to calculable equations [ 1 ]. These generated equations are then compared with the equations labeled in datasets for algorithm performance evaluation. Besides this equation-based evaluation, answer-based evaluation is also used in cases where multiple solutions exist. The answer-based evaluation compares the answers calculated from the generated equations with labeled answers. Several commonly used evaluation metrics are introduced below, including accuracy ( \(E_{acc}\) and \(A_{acc}\) ), time cost (# Time ) and minimum GPU memory capacity (# \(\mathop {G\!\!-\!\!Mem}\) ).

Accuracy. Accuracy includes answer accuracy and equation accuracy. Answer accuracy [ 2 , 5 , 7 ] is perhaps the most common evaluation method: it simply involves calculating the percentage of final answers produced by the model that are correct. This is a good measure of the model's overall performance, but it can be misleading if the dataset is unbalanced (e.g., if there are more easy problems than difficult ones). Equation accuracy [ 6 , 17 ] is another important measure, which refers to the accuracy of the solution that the model generates. It is typically calculated by comparing the output of the model to the correct solution to the problem and determining whether they match. Evaluating both equation accuracy and answer accuracy gives a more complete picture of the model's performance on MWP solving tasks.

The Equation Accuracy ( \(E_{acc}\) ) is computed by measuring the exact match between predicted equations and ground-truth equations:

\[ E_{acc} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left(\hat{E}_i = E_i\right) \]

where N is the number of test problems, \(\hat{E}_i\) and \(E_i\) denote the predicted and ground-truth equations of the i-th problem, and \(\mathbb{1}(\cdot)\) is the indicator function.

Similarly, the Answer Accuracy ( \(A_{acc}\) ) is defined as the percentage of problems whose predicted equations yield the labeled answer:

\[ A_{acc} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left(\hat{a}_i = a_i\right) \]

where \(\hat{a}_i\) is the answer computed from the i-th predicted equation and \(a_i\) is the labeled answer.

To remove extraneous parentheses during equation matching, equations are transformed into equation trees as described in [ 17 ]. Under \(E_{acc}\), outputs with correct answers but incorrect equations are treated as unsolved cases. The effect of tree-based matching is illustrated below.
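The effect can be demonstrated with Python's ast module (shown here instead of the equation-tree code of [ 17 ]): parsing both expressions into trees makes redundant parentheses vanish, so string-unequal but tree-equal equations still count as exact matches.

```python
import ast

def tree_equal(expr_a, expr_b):
    """Compare two arithmetic expressions by their parse trees."""
    return (ast.dump(ast.parse(expr_a, mode='eval'))
            == ast.dump(ast.parse(expr_b, mode='eval')))

print('(19 - 11) / 2' == '((19 - 11)) / 2')            # False: strings differ
print(tree_equal('(19 - 11) / 2', '((19 - 11)) / 2'))  # True: same tree
```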

Time Cost (# Time ): denotes the time required for model training. Specifically, in this article it refers to the time needed for the model to complete 80 iterations with a batch size of 64.

Minimum GPU Memory Capacity (# \(\mathop {G\!\!-\!\!Mem}\) ): represents the minimum GPU memory capacity required for training the model. This metric is crucial for assessing the hardware requirements of model training, particularly for researchers with limited resources.

Hyper-parameters To improve the comparability of the experimental results, the hyper-parameters of the selected solvers and the decoupled cells are kept consistent with the original models. For example, the default LSTM and GRU cells are initialized as two-layer networks with 512 hidden units to accommodate the pre-trained word vectors, which usually have a dimension size of 300. In the biGraphSAGE cell, we set the maximum number of node hops to \(K = 3\) and employ the pooling aggregator. For the optimizer, we use Adam with an initial learning rate of 0.001, and the learning rate is halved every 20 epochs. We set the number of epochs to 80, the batch size to 64, and the dropout rate to 0.5. Finally, we use beam search with a beam size of 5 in both the sequence-based cells and the tree-based cells; a sketch of this configuration is given below. To alleviate the impact of the randomness of neural network models, we conduct each experiment 5 times with different random seeds and report the average results. All these hyper-parameters have been carefully selected to balance computational efficiency with model performance.
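For reference, this training configuration corresponds to the following sketch in standard PyTorch (the model here is a placeholder standing in for one of the solvers above):

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 512)  # placeholder for one of the solvers above
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Halve the learning rate every 20 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

EPOCHS, BATCH_SIZE, DROPOUT, BEAM_SIZE = 80, 64, 0.5, 5

for epoch in range(EPOCHS):
    # ... one training pass over the data with batch size BATCH_SIZE ...
    scheduler.step()
```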

Experimental Environment Our models run on a server with an Intel i7 CPU and one NVIDIA GeForce RTX 3090 GPU. Code is implemented in Python, with PyTorch 1.4.0 used for matrix operations. We use stanfordcorenlp to perform dependency parsing and token generation for the Chinese datasets.

Performance comparison

Overall performance of considered solvers.

In this section, we initially present the learning curves of all considered models (excluding GPT-4) on two representative datasets (Math23k for Chinese and MAWPS-s for English).

It is evident that overfitting occurs on the MAWPS-s dataset, as shown in Fig. 5: after 10 iterations, the models exhibit oscillations in accuracy despite having low loss values. The limited size of the MAWPS-s dataset, which contains only around 1500 training examples, is likely insufficient for effective training of most deep neural networks. The situation improves on the Math23K dataset, where both the loss and the accuracy stabilize after approximately 30 iterations.

We have also examined the training time required by the different models. As shown in Table 6, without considering GPT-4 and the fine-tuning of BERT, all models complete training with a batch size of 64 within 4 min (# Time ) and reach convergence in fewer than 80 iterations. Similarly, we report the minimum GPU memory capacity (# \(\mathop {G\!\!-\!\!Mem}\) ) required for training. This is attractive for individual researchers, as it allows them to quickly train the desired models locally without incurring high costs. The next step is to evaluate the solving performance of the different models.

Fig. 5 Learning curves on different datasets. a Learning curves on MAWPS-s; b Learning curves on Math23K

We provide an overall comparison of the considered solvers on both single-equation and multi-equation tasks, evaluating \(E_{acc}\) and \(A_{acc}\) separately. Additionally, we report the average training time (# Time , in minutes per epoch) on Math23K. The detailed results are shown in Table 6.

As shown in Table 6, PLM-based models exhibit superior performance in terms of \(A_{acc}\) compared to other models. Specifically, without any additional prompts, GPT-4 achieves the best results on the MAWPS-s, SVAMP, and Draw1K datasets, with accuracies of 94.0%, 86.0%, and 42.1%, respectively. On the other hand, Bert2Tree performs well on the two Chinese datasets, Math23K and HMWP, with accuracies of 84.2% and 48.3%, respectively. This demonstrates the significant advantage of PLM-based models, especially large-scale language models such as GPT-4, in solving math word problems.

However, it is important to note that there is still room for improvement in the performance of all models, particularly in solving more complex math problems such as those in the Math23K, Draw1K, and HMWP datasets; there remains a considerable gap between current performance levels and practical requirements. Additionally, traditional lightweight models have their merits. For instance, models utilizing tree-based decoders achieve leading performance in terms of \(E_{acc}\), with 68.0%, 72.4%, 39.8%, and 39.6% on the SVAMP, Math23K, Draw1K, and HMWP datasets, respectively. This highlights the potential advantages of tree-based decoders in representing mathematical expressions. Furthermore, the resource requirements and response efficiency of large language models like GPT are also important considerations.

Among the lightweight models, Graph2Tree models demonstrate the best results on most selected datasets, particularly for multi-equation tasks on Draw1K and HMWP. This underscores the immense potential of the graph-to-tree framework in solving math word problems. However, we observed that Graph2Tree \(^2\) did not perform as well as Graph2Tree \(^1\) , underscoring the significance of careful cell selection in both problem encoding and expression decoding steps. Detailed analysis can be found in Sects. “ Comparative analysis of math expression decoding models ” and “ Comparative analysis of problem encoding models ”. Surprisingly, MathEN achieved the best performance on MAWPS-s and also outperformed other solvers in terms of \(E_{acc}\) on the Math23K dataset.

Based on the training time required per epoch on Math23K, we found that more complex models incur higher computational costs, which is consistent with our general understanding. Among them, SAU-Solver and Graph2Tree \(^2\) had the longest training times, ranging from 3 to 4 min, while the DNS model, which only involves sequence encoding and decoding, had the shortest training time. Graph2Tree \(^1\) and GTS reported similar time costs, indicating that the added GCN unit in Graph2Tree \(^1\) has a minor impact on the computation cost of the graph-to-tree framework.

Performance on solving characteristic problems

In the following comparison, we only consider single-equation tasks for performance evaluation. This is because multi-equation tasks can easily be converted into a single-equation task by adding a special token \(<bridge>\) that joins the equations into a single tree or equation [ 10 ], without requiring any model modifications; a minimal sketch is given below. Therefore, the performance of solvers on single-equation tasks is also useful for assessing their performance on multi-equation tasks.
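A minimal sketch of this conversion (the function name is illustrative): the equations of a multi-equation problem are joined by the special token, so a single-equation decoder can be reused unchanged.

```python
def to_single_target(equations):
    """Join multiple target equations into one sequence via <bridge>."""
    return ' <bridge> '.join(equations)

print(to_single_target(['x + y = 5', 'x - y = 1']))
# x + y = 5 <bridge> x - y = 1
```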

(1) Performance on solving problems indicated by expression length.

Table 7 presents the comparison results of \(E_{acc}\) in solving problems of varying equation lengths. The Mean Accuracy Difference (MAD), denoted \(d_{i-k}\), indicates the accuracy difference between problems of level \(L_{i}\) and level \(L_{k}\). Because annotated standard arithmetic expressions are difficult to obtain for GPT models, we use \(A_{acc}\) as a reference for them instead.

As depicted in Table 7, PLM-based models demonstrate superior performance compared to the other models. Bert2Tree, in particular, exhibits greater stability than GPT-4: its accuracy remains relatively consistent on \(L_1\) and \(L_2\) level problems, with a decrease of only 17.2% on the \(L_3\) task. In contrast, GPT-4 experiences a significant decrease of 54.9% from \(L_1\) to \(L_3\).

The Graph2Tree models performed best on SVAMP, with the two models achieving average \(E_{acc}\) values of 49.0% and 68.0%, respectively. Graph2Tree \(^1\) proved better at solving long-equation problems: on Math23K, it achieved state-of-the-art performance of 73.8% and 48.0% on \(L_2\) and \(L_3\) level problems, respectively. Similar results were obtained on both SVAMP and MAWPS-s, highlighting the potential of graph-based models in solving problems of varying length complexity.

For the Seq2Tree models, GTS and SAU-Solver achieved average improvements of 7.1% and 6.1%, respectively, over DNS on MAWPS-s. On Math23K, GTS achieved an average improvement of 7.4% over DNS, and SAU-Solver an improvement of 8.3%. These considerable improvements indicate the potential of Seq2Tree models for equation representation learning using tree-structured decoders.

Surprisingly, MathEN achieved the highest problem-solving accuracy on the \(L_{1}\) level task of MAWPS-s and the \(L_3\) level task of Math23K, and also demonstrated a lower MAD value. DNS, on the other hand, exhibited lower problem-solving accuracy than MathEN and higher MAD values, indicating that DNS is sensitive to expression length.

Among the four categories of solvers considered, PLM-based models demonstrated the best performance on \(L_1\), \(L_2\) and \(L_3\) level tasks across all datasets. Notably, Graph2Tree exhibited advantages over the other lightweight models specifically in handling \(L_2\) and \(L_3\) level tasks. Furthermore, it is worth highlighting that, among the lightweight models, MathEN and SAU-Solver obtained the best \(L_1\) results on MAWPS-s and Math23K, respectively. This could be because \(L_1\) level tasks typically involve very simple syntax and language structure, which can be efficiently modeled by sequence-based encoders, whereas \(L_2\) and \(L_3\) level tasks involve more complex syntax and language structures.

Another interesting finding is that, on MAWPS-s, all models except GPT-4 performed better at solving \(L_2\) level problems than the shorter \(L_1\) level problems. Further analysis showed that the average token length of \(L_1\) level problems on MAWPS-s was 26, compared with 31 and 21 on SVAMP and Math23K, respectively. It should be noted that each word is treated as a token for the English datasets MAWPS-s and SVAMP, while for the Chinese dataset Math23K, each token contains one or more words depending on the output of the applied tokenizer.

(2) Performance on solving problems indicated by expression tree height.

Table 8 shows the performance with respect to depth complexity. Overall, the accuracy of the models decreases as the expression tree depth increases. In particular, GPT-4 achieves accuracies of 97.0% and 74.5% on the \(H_1\) and \(H_2\) subsets of MAWPS-s, respectively. However, there is a significant performance drop of 30% from \(H_1\) to \(H_2\), and a similar reduction is observed on Math23K. This suggests that GPT-4 is highly sensitive to the depth of expressions.

For the Graph2Tree models, the average accuracy reduction \(d_{2-1}\) from \(H_1\) to \(H_2\) level problems is 15% on MAWPS-s and 10% on Math23K. \(d_{2-1}\) is an indicator of model robustness: the lower \(d_{2-1}\) is, the more robust the model. This suggests that capturing the structural information hidden in problem texts is challenging for both sequence-based and graph-based methods.

Compared to DNS, the Seq2Tree models show improvements of 7% to 10% on Math23K and MAWPS-s, respectively. SAU-Solver performs better than MathEN on MAWPS-s but worse on Math23K. The Graph2Tree models perform better on Math23K than the Seq2Tree models. Moreover, Graph2Tree \(^1\) performs equal to or better than all other methods on \(H_2\) level problems, which indicates its latent capability for learning problems with complex structures. Unlike Graph2Tree \(^1\) and the others, Graph2Tree \(^2\) is much less sensitive to expression depth in prediction. This suggests that sentence-level information might enhance the representation learning of complex expressions.

For the Seq2Seq models, MathEN performs better than DNS on all datasets, especially on the Math23K \(H_1\) level subset, where it achieves the best result (69.2%). However, MathEN's accuracy reductions of 15.6% and 12.9% from \(H_1\) to \(H_2\) level problems on MAWPS-s and Math23K, respectively, show that it is much more sensitive to expression depth than DNS.

(3) Performance on solving problems indicated by implicit condition.

Table 9 demonstrates the significant advantages of PLM-based models in solving implicit-condition problems. In particular, GPT-4 exhibits a 1% performance improvement on \(C_2\) compared to \(C_1\). Among the lightweight models, MathEN and Graph2Tree \(^1\) obtained outstanding performances of 67.1% and 66.4%, respectively. For solving implicit-relation problems, MathEN achieved 61.5%, which is 2.3% higher than the second-highest result, obtained by GTS. Meanwhile, the Seq2Tree models and the Graph2Tree \(^1\) method performed similarly (59.2%, 58.6% and 58.0%, respectively) on implicit-relation problems. In terms of robustness, MathEN has the lowest \(d_{2-1}\) among the considered models except for Graph2Tree \(^2\).

(4) Performance on solving problems indicated by the arithmetic situation.

As depicted in Table 10, PLM-based models achieve the best results on four of the problem types. However, for Summation problems ( \(S_s\) ), MathEN achieves an impressive accuracy of 72.2%, which is 30% higher than that of GPT-4. Among the lightweight models, MathEN exhibits outstanding performance: for example, it achieved 72.2% accuracy on \(S_s\) type problems, and GTS reached 71.3% on Motion problems ( \(S_m\) ). The Graph2Tree models, in contrast, generally performed poorly on situational problems, since such problems contain more complex quantity relations among various math objects. These quantity relations are usually indicated by high-level context, from which most sequence- and graph-based models struggle to extract the required information. Moreover, performance differs greatly across types of situational problems even for the same model, which indicates that differentiated models are required for solving different types of situational problems.

Conclusively, the experimental results reveal the following: (1) PLM-based models show a significant advantage over other models in almost all tasks; however, they also suffer a rapid decline in performance as the length and depth of expressions increase. (2) Tree-based expression decoders achieve significant improvements over sequence-based decoders, demonstrating the efficiency of generative models in learning the structural information hidden in mathematical expressions, compared to traditional retrieval models. (3) For encoders, graph-based models perform similarly to sequence-based models. There may be two reasons for this observation: first, current sequence- and graph-based models may have encountered a technical bottleneck; second, these models are trained and fine-tuned for general or specific natural language tasks that are not necessarily suitable for learning mathematical relations in the sophisticated situations of math word problems, which involve various kinds of common-sense and domain knowledge. (4) Equation normalization and ensemble models (such as MathEN) achieve outstanding performance compared to pure Seq2Seq models. Since a math word problem may have more than one valid equation, it is necessary to normalize duplicated equations when working with current sequence- and graph-based models.

Comparative analysis of math expression decoding models

To investigate the impact of the decoding models when integrated with different encoding models, we conduct a cross test of depth-first tree decoding (DT cell) [ 6 ] and breadth-first tree decoding (BT cell) [ 17 ] in this section. For each decoding module, encoding modules based on the GRU cell [ 2 ], the GCN cell [ 6 ], the biGraphSAGE cell [ 17 ] and BERT [ 26 ] are connected separately to compose a full encoding-decoding pipeline. The test is conducted on Math23K, and the corresponding results are shown in Table 11.

As shown in Table 11, the DT cell demonstrates significant accuracy advantages, for both expressions and answers, when combined with any encoding model. In particular, when using GRU, GRU+GCN, or BERT as the encoder, the DT cell outperforms the BT cell by more than 10%. However, with GRU+biGraphSAGE as the encoder, the DT cell shows smaller performance improvements of 6.7% and 6.5% in \(E_{acc}\) and \(A_{acc}\), respectively, compared to the other encoder combinations. One possible reason is that the GRU+biGraphSAGE encoder incorporates heterogeneous graph information from the problem text, which is more relevant to breadth-first decomposing.

Comparative analysis of problem encoding models

Experimental results obtained in Sect. “ Performance on solving characteristic problems ” show the effectiveness of tree-structured models in math expression decoding. However, the performance of models in solving different characteristic problems varies as different neural cells are applied during encoding. In this section, a further experiment is conducted to evaluate the performance of different encoding units in solving different characteristic problems.

In the following experiments, we split the encoding modules of the baseline methods into composable neural cells for problem encoding: LSTM and GRU cells are obtained from sequence-based encoders, and GCN and biGraphSAGE cells from graph-based encoders. The obtained GRU and LSTM cells are designed with 2 layers; in our experiment, both 2- and 4-layer cells are tested to evaluate the effect of cell depth on problem encoding. The above cells are applied individually or jointly for problem encoding, followed by the tree-based decoding module [ 5 ] to generate math expressions. To combine the outputs of the GCN module and the biGraphSAGE module, a multi-head attention network is used, which takes the final hidden vectors of the GCN and biGraphSAGE as input and outputs a new combined hidden vector; a sketch of this fusion step is given below. The \(E_{acc}\) and \(A_{acc}\) results are presented in Table 12.
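A sketch of this fusion step in PyTorch (the dimensions and the choice of query/key roles are illustrative; the actual models use their own sizes):

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len, batch = 512, 8, 40, 2
fuse = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads)

# Final hidden vectors of the two graph encoders, shape (seq_len, batch, d_model).
h_gcn = torch.randn(seq_len, batch, d_model)   # GCN output
h_sage = torch.randn(seq_len, batch, d_model)  # biGraphSAGE output

# The GCN states attend over the biGraphSAGE states (query = GCN,
# key/value = biGraphSAGE) to produce the combined hidden vector.
h_combined, _ = fuse(h_gcn, h_sage, h_sage)
print(h_combined.shape)  # torch.Size([40, 2, 512])
```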

In Table 12, \(G_{qcell}\) and \(G_{qcom}\) denote the quantity cell graph and the quantity comparison graph built in [ 6 ], respectively, which serve as the input of the GCN network. \(G_{wcon}\) refers to the constituency graph defined in [ 17 ] and is processed by the biGraphSAGE network.

Obviously, the BERT-DT combination outperforms other combinations in almost all test items. Here, we focus on discussing the performance of lightweight model combinations.

Firstly, when selecting the number of layers for the sequence encoder, there is no significant difference in performance between 4-layer and 2-layer networks. The 2-layer GRU cell obtained the best result among all sequence-based cells, performing better than the 4-layer cells, and similar results were obtained with the LSTM cells. Therefore, we believe that increasing the depth of sequence-based neural networks may not be an efficient way to improve problem encoding.

Secondly, when incorporating graph information, the combination of \(G_{qcom}\) and \(G_{qcell}\) obtained the best performance. For GCN-based modules, the GCN cell with \(G_{qcom}\) information obtained outstanding results on all levels of length-complexity tasks and on the \(H_1\) level depth-complexity task, while the GCN cell performed best on the \(H_2\) level task when combining the graph information of \(G_{qcell}\) and \(G_{qcom}\). A possible reason is that \(G_{qcell}\) is homologous to the basic text sequence, while \(G_{qcom}\) contains additional comparison information that plays an important role in guiding math expression generation. In addition, the biGraphSAGE cell with \(G_{wcon}\) achieved lower performance than the GCN cells, partly due to the sparsity of the constituency matrix used.

Furthermore, considering the fusion of multiple features, it can be observed from Table 12 that the mixed module combining GCN ( \(G_{qcell}\) ) and biGraphSAGE ( \(G_{wcon}\) ) achieved better performance than biGraphSAGE ( \(G_{wcon}\) ) alone but worse than GCN ( \(G_{qcell}\) ) alone. The performance improves slightly after \(G_{qcom}\) is added. However, the overall performance of the mixed modules is worse than that of the GCN-only modules. This leads us to conclude that choosing an appropriate encoder is a key decision in hybrid information encoding.

This paper provides a comprehensive survey and performance analysis of DL-based solvers for Math Word Problems (MWPs). These solvers are categorized into four distinct groups based on their network architecture and neural cell types: Seq2Seq-based models, Seq2Tree-based models, Graph2Tree-based models, and PLM-based models.

During the training phase, we observed that most models exhibit overfitting issues on small datasets. Figure 5 illustrates that the Math23K training set, which consists of 18k instances, meets the training requirements of most deep learning-based models. Conversely, when trained on the MAWPS-s dataset, which contains approximately 1.5k instances, almost all models show noticeable signs of overfitting. This finding serves as a valuable reference point for future dataset construction.

In terms of overall performance, pre-trained language models outperformed the other models on both single-equation and multi-equation tasks. As depicted in Table 6, GPT-4 achieves the best results on the MAWPS-s, SVAMP, and Draw1K datasets, and Bert2Tree performs well on the two Chinese datasets, Math23K and HMWP. This demonstrates the significant advantage of PLM-based models, especially large-scale language models such as GPT-4, in solving math word problems. However, there are variations in the performance of different pre-trained language models on Chinese and English datasets. Consistent findings were also reported in previous research [ 26 , 27 ]. For instance, [ 26 ] highlights that adopting the Bert2Tree model yields a 2.1% improvement in answer accuracy over the Graph2Tree model on the MAWPS-s dataset, but a 7% improvement on the Math23K dataset. This outcome can be attributed to two factors: (1) The Chinese pre-trained model employed in this study, namely Chinese BERT with whole word masking [ 55 ], differs from the BERT-base model used for English; it is therefore reasonable to infer that task-specific training or fine-tuning of pre-trained language models is essential to fully leverage their advantages. (2) Pre-trained language models exhibit greater proficiency in handling MWPs with intricate semantics: as evidenced by Table 3, the average length of problem texts in the Math23K and HMWP datasets is 1.5-2 times that of the other datasets, suggesting the presence of more complex syntactic and semantic information. Pre-trained language models allow for improved extraction and utilization of the information necessary for effective problem solving.

Meanwhile, Tables  7 and 8 show that neural solvers are sensitive to the complexity of equations (e.g., equation length, equation tree height), as well as the length of the original problem text. However, we also found that (1) the MathEN model based on the Seq2Seq framework achieved the best results on some datasets (MAWPS-s), indicating that there is room for further optimization of the Graph2Tree framework. Further work is needed to discover the main factors influencing the performance of Graph2Tree on MAWPS-s and to improve it accordingly. (2) For all solvers on MAWPS-s, the increase in expression length did not result in a decline in solving accuracy, but rather showed varying degrees of improvement, which is completely opposite to what we observed on the other two datasets. Further research is needed to explain this phenomenon.

In terms of decoding performance, models integrated with tree decoders exhibit strong performance in generating math equations. Meanwhile, the DT cell performed much better than the BT cell on most datasets, making it widely used. However, we believe that the BT cell still has its own advantages, as its decoding process is more in line with human arithmetic reasoning, where a task is decomposed into multiple sub-tasks, each corresponding to certain mathematical-operation semantics. Therefore, the output of this model can be better applied in intelligent education scenarios, such as step-by-step intelligent tutoring. This raises new questions for researchers on how to design models with human-like arithmetic reasoning ability and make them run efficiently.

In terms of encoding performance, implicit information representation of problem texts plays a crucial role in enhancing the performance of models. Experimental results have shown that combining structure and non-structure information can effectively enhance solver performance. However, we found that not all structure information is equally effective, and some may be more useful in improving solving performance than others. Therefore, it is necessary to design more effective mechanisms or algorithms to determine which information should be added and how the added information can be fused with current information for maximum utility.

Moreover, the emergence of large language models such as GPT has propelled MWP-solving technology to a new stage. These models can gradually improve the accuracy of MWP solvers and their remarkable reasoning abilities enable them to generate step-by-step solutions based on prompts, which is truly impressive. However, these large language models also face challenges such as large parameter sizes and usage restrictions.

Limitations

The limitations of this study are primarily attributable to the emergence of novel models and of models integrated with knowledge bases, which are challenging to re-implement. Consequently, performance comparisons for such models rely on the results reported in specific papers such as [ 7 , 27 ]. Additionally, due to hardware constraints, we did not fine-tune the pre-trained language models; therefore, the performance of various fine-tuned models is not reported.

Furthermore, for PLM-based models like GPT-4 [ 30 ], advanced prompts or interaction strategies were not employed in our experiments, which may result in lower accuracy. Moreover, it is worth noting that PLM-based models have the advantage of generating descriptive solution processes, but their performance evaluation in this aspect has not been conducted in this study.

In this paper, we have aimed to provide a comprehensive and analytical comparative examination of the state-of-the-art neural solvers for math word problem solving. Our objective was to serve as a reference for researchers in the design of future models by offering insights into the structure of neural solvers, their performance, and the pros and cons of the involved neural cells.

We first identify the architectures of typical neural solvers, rigorously analyzing the framework of each category, particularly the four typical categories: Seq2Seq, Seq2Tree, Graph2Tree and PLM-based models. A four-dimensional indicator is proposed to categorize the considered datasets to precisely evaluate the performance of neural solvers in solving different characteristics of MWPs. Typical neural solvers are decomposed into highly reusable components. To evaluate the considered solvers, we have established a testbed and conducted comprehensive experiments on five popular datasets using eight representative MWP solvers, followed by a comparative analysis on the achieved results.

After conducting an in-depth analysis, we found the following: (1) PLM-based models consistently demonstrate significant accuracy advantages across almost all datasets, yet there remains room for improvement to meet practical demands. (2) Models integrated with tree decoders exhibit strong performance in generating math equations. The length of expressions and the depth of expression trees are important factors affecting solver performance: the longer the expression and the deeper the expression tree, the lower the solver performance. (3) Implicit information representation of problem texts plays a crucial role in enhancing the performance of models. While the use of multi-modal feature representation has shown promising performance improvements, it is crucial to ensure information complementarity among the modalities.

Based on our findings, we have the following suggestions for future work. Firstly, there is still room to improve the performance of solvers, including problem representation learning, multi-solution generation, etc.

Secondly, to better support the potential real-world applications in education, the output of solvers should be more comprehensive. Solvers are expected to generate decomposable and interpretable solutions, rather than just simple expressions or answers. The emergence of large language models has provided ideas for addressing this issue, but it remains a challenge to ensure the validity and interpretability of the outputs for teaching and tutoring applications.

Finally, to evaluate the neural solvers more comprehensively, it is necessary to develop more diverse metrics and evaluation methods in future research. These metrics and methods should capture the performance of solvers in problem understanding, automatic addition of implicit knowledge, solution reasoning, interpretability of results, and other relevant aspects.

Data Availability

The data used in this study is available from the corresponding author upon reasonable request.

Abbreviations

MWP: Math Word Problems

PLM: Pre-trained Language Model

Tree-structured Decoder

DL: Deep Learning

DT: Depth-first decomposing Tree

BT: Breadth-first decomposing Tree

MAD: Mean Accuracy Difference

UET: Universal Expression Tree

UDG: Unit Dependency Graph

Explanation

P: Problem text

Word token of P

Encoder: Encoding network

Decoder: Decoding network

\(h_i\): The hidden vector state i

Q, K, V: The query matrix, key matrix and value matrix, respectively

G(V, E): A graph with vertex set V and edge set E

W, b: Trainable parameters and bias

L: Expression length

H: Expression tree depth

C: Implicit condition

S: Arithmetic situation

\(E_{acc}\): Equation accuracy

Math expressions

\(A_{acc}\): Answer accuracy

1. Zhang D, Wang L, Zhang L, Dai BT, Shen HT (2019) The gap of semantic parsing: a survey on automatic math word problem solvers. IEEE Trans Pattern Anal Mach Intell 42(9):2287–2305

2. Wang Y, Liu X, Shi S (2017) Deep neural solver for math word problems. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 845–854. Association for Computational Linguistics, Copenhagen, Denmark

3. Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J (2017) LSTM: a search space odyssey. IEEE Trans Neural Netw Learn Syst 28(10):2222–2232

4. Wang L, Wang Y, Cai D, Zhang D, Liu X (2018) Translating a math word problem to an expression tree. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1064–1069. Association for Computational Linguistics, Brussels, Belgium

5. Xie Z, Sun S (2019) A goal-driven tree-structured neural model for math word problems. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 5299–5305. International Joint Conferences on Artificial Intelligence Organization, Macao, China

6. Zhang J, Wang L, Lee RKW, Bin Y, Wang Y, Shao J, Lim EP (2020) Graph-to-tree learning for solving math word problems. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 3928–3937. Association for Computational Linguistics, Seattle, USA

7. Wu Q, Zhang Q, Wei Z (2021) An edge-enhanced hierarchical graph-to-tree network for math word problem solving. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 1473–1482. Association for Computational Linguistics, Punta Cana, Dominican Republic

8. Yang Z, Qin J, Chen J, Lin L, Liang X (2022) LogicSolver: towards interpretable math word problem solving with logical prompt-enhanced learning. In: Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 1–13. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates

9. Jie Z, Li J, Lu W (2022) Learning to reason deductively: math word problem solving as complex relation extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pp. 5944–5955. Association for Computational Linguistics, Dublin, Ireland

10. Lan Y, Wang L, Zhang Q, Lan Y, Dai BT, Wang Y, Zhang D, Lim EP (2022) MWPToolkit: an open-source framework for deep learning-based math word problem solvers. Proceedings of the AAAI Conference on Artificial Intelligence 36:13188–13190

11. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pp. 6000–6010. Curran Associates Inc., Red Hook, NY, USA

12. Lin JCW, Shao Y, Djenouri Y, Yun U (2021) ASRNN: a recurrent neural network with an attention model for sequence labeling. Knowl-Based Syst 212:106548

13. Chiang TR, Chen YN (2019) Semantically-aligned equation generation for solving and reasoning math word problems. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2656–2668. Association for Computational Linguistics, Minneapolis, Minnesota

14. Hong Y, Li Q, Ciao D, Huang S, Zhu SC (2021) Learning by fixing: solving math word problems with weak supervision. In: Proceedings of the AAAI Conference on Artificial Intelligence, 35:4959–4967

15. Hong Y, Li Q, Gong R, Ciao D, Huang S, Zhu SC (2021) SMART: a situation model for algebra story problems via attributed grammar. In: Proceedings of the 2021 AAAI Conference on Artificial Intelligence, pp. 13009–13017. Vancouver, Canada

16. Roy S, Roth D (2017) Unit dependency graph and its application to arithmetic word problem solving. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3082–3088. San Francisco, USA

17. Li S, Wu L, Feng S, Xu F, Xu F, Zhong S (2020) Graph-to-tree neural networks for learning structured input-output translation with applications to semantic parsing and math word problem. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 2841–2852. Association for Computational Linguistics, Punta Cana, Dominican Republic

18. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

19. Bahdanau D, Cho KH, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations. San Diego, California

20. Cai D, Lam W (2020) Graph transformer for graph-to-sequence learning. Proceedings of the AAAI Conference on Artificial Intelligence 34:7464–7471

21. Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1025–1035. Curran Associates Inc., Red Hook, NY, USA

22. Mukherjee A, Garain U (2008) A review of methods for automatic understanding of natural language mathematical problems. Artif Intell Rev 29(2):93–122

23. Meadows J, Freitas A (2022) A survey in mathematical language processing. arXiv:2205.15231 [cs]

24. Lu P, Qiu L, Yu W, Welleck S, Chang KW (2023) A survey of deep learning for mathematical reasoning. arXiv:2212.10535 [cs]

25. Liu Q, Guan W, Li S, Kawahara D (2019) Tree-structured decoding for solving math word problems. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2370–2379. Association for Computational Linguistics, Hong Kong, China

26. Liang Z, Zhang J, Wang L, Qin W, Lan Y, Shao J, Zhang X (2022) MWP-BERT: numeracy-augmented pre-training for math word problem solving. In: Findings of the Association for Computational Linguistics: NAACL 2022, pp. 997–1009. Association for Computational Linguistics, Seattle, United States

27. Zhang W, Shen Y, Ma Y, Cheng X, Tan Z, Nong Q, Lu W (2022) Multi-view reasoning: consistent contrastive learning for math word problem. In: Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 1103–1116. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates

28. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. In: Advances in Neural Information Processing Systems, 33:1877–1901. Curran Associates, Inc.

29. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. Tech. rep., OpenAI

30. Zhou A, Wang K, Lu Z, Shi W, Luo S, Qin Z, Lu S, Jia A, Song L, Zhan M, Li H (2023) Solving challenging math word problems using GPT-4 code interpreter with code-based self-verification. https://doi.org/10.48550/arXiv.2308.07921. arXiv:2308.07921 [cs]

31. Fletcher CR (1985) Understanding and solving arithmetic word problems: a computer simulation. Behav Res Methods Instrum Comput 17(5):565–571

32. Kushman N, Artzi Y, Zettlemoyer L, Barzilay R (2014) Learning to automatically solve algebra word problems. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 271–281. Association for Computational Linguistics, Baltimore, Maryland

33. Shen Y, Jin C (2020) Solving math word problems with multi-encoders and multi-decoders. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 2924–2934. International Committee on Computational Linguistics, Barcelona, Spain

34. Liang CC, Hsu KY, Huang CT, Li CM, Miao SY, Su KY (2016) A tag-based statistical English math word problem solver with understanding, reasoning and explanation. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 4254–4255. San Diego, USA

35. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota

36. Wang L, Zhang D, Zhang J, Xu X, Gao L, Dai BT, Shen HT (2019) Template-based math word problem solvers with recursive neural networks. In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, pp. 7144–7151. AAAI Press, Hawaii, USA

37. Li J, Wang L, Zhang J, Wang Y, Dai BT, Zhang D (2019) Modeling intra-relation in math word problems with different functional multi-head attentions. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6162–6167. Association for Computational Linguistics

38. Qin J, Lin L, Liang X, Zhang R, Lin L (2020) Semantically-aligned universal tree-structured solver for math word problems. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3780–3789. Association for Computational Linguistics, Punta Cana, Dominican Republic

39. Wu Q, Zhang Q, Wei Z, Huang X (2021) Math word problem solving with explicit numerical values. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pp. 5859–5869. Association for Computational Linguistics, Bangkok, Thailand

40. Yu W, Wen Y, Zheng F, Xiao N (2021) Improving math word problems with pre-trained knowledge and hierarchical reasoning. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3384–3394. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic

41. Li Z, Zhang W, Yan C, Zhou Q, Li C, Liu H, Cao Y (2022) Seeking patterns, not just memorizing procedures: contrastive learning for solving math word problems. In: Findings of the Association for Computational Linguistics: ACL 2022, pp. 2486–2496. Association for Computational Linguistics, Dublin, Ireland

42. Shen J, Yin Y, Li L, Shang L, Jiang X, Zhang M, Liu Q (2021) Generate & rank: a multi-task framework for math word problems. In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 2269–2279. Association for Computational Linguistics, Punta Cana, Dominican Republic

43. Shen Y, Liu Q, Mao Z, Cheng F, Kurohashi S (2022) Textual enhanced contrastive learning for solving math word problems. In: Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 4297–4307. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates

44. Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A, Barham P, Chung HW, Sutton C, Gehrmann S, Schuh P, Shi K, Tsvyashchenko S, Maynez J, Rao A, Barnes P, Tay Y, Shazeer N, Prabhakaran V, Reif E, Du N, Hutchinson B, Pope R, Bradbury J, Austin J, Isard M, Gur-Ari G, Yin P, Duke T, Levskaya A, Ghemawat S, Dev S, Michalewski H, Garcia X, Misra V, Robinson K, Fedus L, Zhou D, Ippolito D, Luan D, Lim H, Zoph B, Spiridonov A, Sepassi R, Dohan D, Agrawal S, Omernick M, Dai AM, Pillai TS, Pellat M, Lewkowycz A, Moreira E, Child R, Polozov O, Lee K, Zhou Z, Wang X, Saeta B, Diaz M, Firat O, Catasta M, Wei J, Meier-Hellstern K, Eck D, Dean J, Petrov S, Fiedel N (2022) PaLM: scaling language modeling with pathways. arXiv:2204.02311 [cs]

45. Lewkowycz A, Andreassen A, Dohan D, Dyer E, Michalewski H, Ramasesh V, Slone A, Anil C, Schlag I, Gutman-Solo T, Wu Y, Neyshabur B, Gur-Ari G, Misra V (2022) Solving quantitative reasoning problems with language models. In: Advances in Neural Information Processing Systems, vol. 35, pp. 3843–3857. Curran Associates, Inc.

46. Touvron H, Lavril T, Izacard G, Martinet X, Lachaux MA, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, Rodriguez A, Joulin A, Grave E, Lample G (2023) LLaMA: open and efficient foundation language models. https://doi.org/10.48550/arXiv.2302.13971. arXiv:2302.13971 [cs]

47. Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. In: NIPS 2014 Workshop on Deep Learning, December 2014. Montreal, Canada

48. Ghazvini A, Abdullah SNHS, Kamru Hasan M, Bin Kasim DZA (2020) Crime spatiotemporal prediction with fused objective function in time delay neural network. IEEE Access 8:115167–115183

49. Djenouri Y, Srivastava G, Lin JCW (2021) Fast and accurate convolution neural network for detecting manufacturing data. IEEE Trans Ind Inform 17(4):2947–2955

50. Wu Q, Zhang Q, Fu J, Huang X (2020) A knowledge-aware sequence-to-tree network for math word problem solving. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7137–7146. Association for Computational Linguistics, Punta Cana, Dominican Republic

51. Gupta A, Kumar S, Kumar PS (2023) Solving age-word problems using domain ontology and BERT. In: Proceedings of the 6th Joint International Conference on Data Science & Management of Data, pp. 95–103. ACM, New York, NY, USA

52. Petroni F, Rocktäschel T, Riedel S, Lewis P, Bakhtin A, Wu Y, Miller A (2019) Language models as knowledge bases? In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2463–2473. Association for Computational Linguistics, Hong Kong, China

53. Jiang Z, Xu FF, Araki J, Neubig G (2020) How can we know what language models know? Trans Assoc Comput Linguist 8:423–438. MIT Press, Cambridge, MA

54. Liu Z, Lin W, Shi Y, Zhao J (2021) A robustly optimized BERT pre-training approach with post-training. In: Proceedings of the 20th Chinese National Conference on Computational Linguistics, pp. 1218–1227. Chinese Information Processing Society of China, Huhhot, China

55. Cui Y, Che W, Liu T, Qin B, Yang Z, Wang S, Hu G (2019) Pre-training with whole word masking for Chinese BERT. IEEE/ACM Trans Audio Speech Lang Process 29:3504–3514

56. Chen J, Pan X, Yu D, Song K, Wang X, Yu D, Chen J (2023) Skills-in-context prompting: unlocking compositionality in large language models. https://doi.org/10.48550/arXiv.2308.00304. arXiv:2308.00304 [cs]

57. Wei J, Wang X, Schuurmans D, Bosma M, ichter b, Xia F, Chi E, Le QV, Zhou D (2022) Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837. Curran Associates, Inc.

58. Huang X, Ruan W, Huang W, Jin G, Dong Y, Wu C, Bensalem S, Mu R, Qi Y, Zhao X, Cai K, Zhang Y, Wu S, Xu P, Wu D, Freitas A, Mustafa MA (2023) A survey of safety and trustworthiness of large language models through the lens of verification and validation. arXiv:2305.11391 [cs]

59. Dong L, Lapata M (2016) Language to logical form with neural attention. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 33–43. Association for Computational Linguistics, Berlin, Germany

Zhang J, Lee RKW, Lim EP, Qin W, Wang L, Shao J, Sun Q (2020) Teacher-student networks with multiple decoders for solving math word problem. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 4011–4017. International Joint Conferences on Artificial Intelligence Organization, Yokohama, Japan

Bobrow DG (1964) Natural language input for a computer problem solving system. Tech. rep., Massachusetts Institute of Technology, USA

Bakman Y (2007) Robust understanding of word problems with extraneous information. arXiv General Mathematics. https://api.semanticscholar.org/CorpusID:117981901

Koncel-Kedziorski R, Hajishirzi H, Sabharwal A, Etzioni O, Ang SD (2015) Parsing algebraic word problems into equations. Trans Assoc Comput Linguistics 3:585–597 ( Place: Cambridge, MA )

Roy S, Upadhyay S, Roth D (2016) Equation parsing: Mapping sentences to grounded equations. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1088–1097. Association for Computational Linguistics, Austin, Texas

Wang L, Zhang D, Gao L, Song J, Guo L, Shen HT (2018) MathDQN: Solving arithmetic word problems via deep reinforcement learning. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pp. 5545–5552. AAAI Press, New Orleans, USA

Hosseini MJ, Hajishirzi H, Etzioni O, Kushman N (2014) Learning to solve arithmetic word problems with verb categorization. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 523–533. Association for Computational Linguistics, Doha, Qatar

Shi S, Wang Y, Lin CY, Liu X, Rui Y (2015) Automatically solving number word problems by semantic parsing and reasoning. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1132–1142. Lisbon, Portugal

Liang CC, Hsu KY, Huang CT, Li CM, Miao SY, Su KY (2016) A tag-based English math word problem solver with understanding, reasoning and explanation. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pp. 67–71. Association for Computational Linguistics, San Diego, California

Upadhyay S, Chang MW, Chang KW, Yih Wt (2016) Learning from explicit and implicit supervision jointly for algebra word problems. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 297–306. Association for Computational Linguistics, Austin, Texas

Chen S, Zhou M, He B, Wang P, Wang Z (2022) A comparative analysis of math word problem solving on characterized datasets. In: In Proceedings of the 2022 International Conference on Intelligent Education and Intelligent Research. IEEE, Wuhan, China

He B, Chen S, Miao Z, Liang G, Pan K, Huang L (2022) Comparative analysis of problem representation learning in math word problem solving. In: In Proceedings of the 2022 International Conference on Intelligent Education and Intelligent Research. IEEE, Wuhan, China

Roy S, Roth D (2015) Solving general arithmetic word problems. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1743–1752. Association for Computational Linguistics, Lisbon, Portugal

Yenduri G, M R, G CS, Y S, Srivastava G, Maddikunta PKR, G DR, Jhaveri RH, B P, Wang W, Vasilakos AV, Gadekallu TR (2023) Generative pre-trained transformer: A comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. arXiv:2305.10435

Zhang H, Lu G, Zhan M, Zhang B (2021) Semi-supervised classification of graph convolutional networks with laplacian rank constraints. Neural Process Lett 54(4):2645–2656

Zhou L, Dai S, Chen L (2015) Learn to solve algebra word problems using quadratic programming. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 817–822. Association for Computational Linguistics, Lisbon, Portugal

Koncel-Kedziorski R, Roy S, Amini A, Kushman N, Hajishirzi H (2016) MAWPS: A math word problem repository. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1152–1157. Association for Computational Linguistics, San Diego, California

Huang D, Shi S, Lin CY, Yin J, Ma WY (2016) How well do computers solve math word problems? Large-scale dataset construction and evaluation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 887–896. Association for Computational Linguistics, Berlin, Germany

Upadhyay S, Chang MW (2017) Annotating derivations: A new evaluation strategy and dataset for algebra word problems. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 494–504. Association for Computational Linguistics, Valencia, Spain

Patel A, Bhattamishra S, Goyal N (2021) Are NLP models really able to solve simple math word problems? In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2080–2094. Association for Computational Linguistics, Bangkok, Thailand

Download references

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62007014) and the Humanities and Social Sciences Youth Fund of the Ministry of Education (No. 20YJC880024).

Author information

Authors and affiliations

Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan, China

Bin He, Xinguo Yu, Litian Huang, Hao Meng, Guanghua Liang & Shengnan Chen

Corresponding author

Correspondence to Xinguo Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

He, B., Yu, X., Huang, L. et al. Comparative study of typical neural solvers in solving math word problems. Complex Intell. Syst. (2024). https://doi.org/10.1007/s40747-024-01454-8

Received: 27 March 2023

Accepted: 17 April 2024

Published: 22 May 2024

DOI: https://doi.org/10.1007/s40747-024-01454-8

Keywords

  • Comparative analysis
  • Deep learning model
  • Math word problem solving
