What is Correlation Analysis? A Definition and Explanation

Emily James

Correlation analysis is a topic that few people might remember from statistics lessons in school, but that most insights professionals will know as a staple of data analytics. However, correlations are frequently misunderstood and misused, even in the insights industry, for a number of reasons. So here is a helpful guide to the basics of correlation analysis, with a few links along the way.

Definition of Correlation Analysis

Correlation analysis is a statistical method used to discover whether there is a relationship between two variables or datasets, and how strong that relationship may be.

In market research, this means that correlation analysis is used to analyse quantitative data gathered from research methods such as surveys and polls, to identify whether there are any significant connections, patterns, or trends between two variables.

Essentially, correlation analysis is used for spotting patterns within datasets. A positive correlation means that both variables increase together, while a negative correlation means that as one variable increases, the other decreases.

Correlation Coefficients

There are three common ways of measuring statistical correlation, named after Spearman, Kendall, and Pearson. In this guide, the result of each coefficient is written as ‘r’. Spearman’s Rank and Pearson’s Coefficient are the two most widely used, and the choice between them depends on the type of data researchers have to hand:

Spearman’s Rank Correlation Coefficient

This coefficient is used to see whether there is a significant relationship between two datasets. It operates under the assumption that the data is ordinal, which here means that the numbers do not indicate quantity but rather the position of the subject’s standing (e.g. 1st, 2nd, 3rd, etc.).

This coefficient requires a table that displays the raw data, its ranks, and the difference between the two ranks for each pair. The squared differences are summed and entered into the formula, while a scatter graph of the data will indicate whether there is a positive correlation, negative correlation, or no correlation at all between the two variables. The constraint that this coefficient works under is -1 ≤ r ≤ +1, where a result of 0 means that there is no monotonic relationship between the rankings. For more information on Spearman’s Rank Correlation Coefficient, there is a great document explaining the process here.
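To make the ranking table concrete, here is a minimal Python sketch of the standard Spearman formula for data without tied ranks, r = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each pair and n is the number of pairs. The function names and the sample figures are invented for illustration and do not come from the original guide.

```python
def ranks(values):
    """Return the rank of each value (1 = smallest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rank(x, y):
    """Spearman's rank coefficient via 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    This shortcut formula is only valid when neither variable has tied ranks.
    """
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical survey data: satisfaction score and spend for five respondents.
satisfaction = [3, 5, 6, 8, 9]
spend = [20, 10, 40, 30, 50]

print(spearman_rank(satisfaction, spend))
```

In this made-up example the rank differences are -1, 1, -1, 1 and 0, so the squared differences sum to 4 and the coefficient works out to 0.8, a strong positive relationship between the rankings.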

Pearson Product-Moment Coefficient

This is the most widely used correlation analysis formula. It measures the strength of the ‘linear’ relationship between the raw data from both variables, rather than their ranks. It is a dimensionless coefficient, meaning that the result does not depend on the units in which the data is measured, which is one reason why it is the first formula researchers try.

However, if the relationship between the data is not linear, this coefficient will not accurately represent the relationship between the two variables, and Spearman’s Rank should be used instead. Pearson’s coefficient requires the relevant data to be entered into a table similar to that of Spearman’s Rank but without the ranks, and the result takes the same numerical form as all correlation coefficients: -1 ≤ r ≤ +1.
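As a rough illustration, the sketch below computes Pearson’s coefficient directly from raw values using the textbook definition: the sum of the products of each pair’s deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (the numerator of the formula).
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(ss_x * ss_y)


# Hypothetical data: monthly adverts run and units sold (in thousands).
adverts = [1, 2, 3, 4, 5]
units_sold = [2, 1, 4, 3, 5]

print(pearson_r(adverts, units_sold))
```

Because the coefficient is dimensionless, multiplying either list by a positive constant (for example, converting thousands of units into single units) leaves the result unchanged.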

When to Use

The two methods outlined above are to be used according to whether assumptions can be made about the distribution of the data gathered. The two terms to watch out for are:

  • Parametric: (Pearson’s Coefficient) Where the data must be handled in relation to the parameters of populations or probability distributions. Typically used with quantitative data already set out within said parameters.
  • Nonparametric: (Spearman’s Rank) Where no assumptions can be made about the probability distribution. Typically used with ranked (ordinal) data, but can be used with quantitative data if Pearson’s Coefficient proves inadequate.

In cases when both are applicable, statisticians recommend using parametric methods such as Pearson’s Coefficient, because they tend to be more precise. But that doesn’t mean non-parametric methods should be discounted, particularly if there isn’t enough data or the parametric assumptions do not hold.

Interpreting Results

Typically, the best way to gain a generalised but more immediate interpretation of the results of a set of data, is to visualise it on a scatter graph such as these:

[Figure: scatter graph showing a positive correlation]

Positive Correlation

Any score from +0.5 to +1 indicates a strong positive correlation, which means that both variables increase at the same time. The line of best fit, or trend line, is placed to best represent the data on the graph. In this case, it follows the data points upwards to indicate the positive correlation.

[Figure: scatter graph showing a negative correlation]

Negative Correlation

Any score from -0.5 to -1 indicates a strong negative correlation, which means that as one variable increases, the other decreases. The line of best fit can be seen here to indicate the negative correlation. In these cases it slopes downwards from left to right.

[Figure: scatter graph showing no correlation]

No Correlation

Very simply, a score of 0 indicates that there is no linear correlation, or relationship, between the two variables.

The larger the sample size, the more accurate the result. This holds true no matter which formula is used: the more data that is entered into the formula, the more reliable the end result will be.

Outliers or anomalies must be accounted for in both correlation coefficients. Using a scatter graph is the easiest way of identifying any anomalies that may have occurred, and running the correlation analysis twice (with and without anomalies) is a great way to assess the strength of the influence of the anomalies on the analysis. If anomalies are present, Spearman’s Rank coefficient may be used instead of Pearson’s Coefficient, as this formula is extremely robust against anomalies due to the ranking system used.
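The “run it twice” check described above can be sketched in a few lines of Python. This is only an illustration on invented data: Spearman’s Rank is computed here as Pearson’s coefficient applied to the ranks, the ranking helper assumes there are no tied values, and all names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's coefficient from raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Rank each value (1 = smallest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_r(x, y):
    """Spearman's Rank, computed as Pearson's coefficient on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Eight well-behaved points plus one anomaly (the last pair).
x_clean = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [2, 3, 4, 5, 6, 7, 8, 9]
x_all = x_clean + [20]
y_all = y_clean + [1]

print("Pearson without anomaly:", pearson_r(x_clean, y_clean))
print("Pearson with anomaly:   ", pearson_r(x_all, y_all))
print("Spearman with anomaly:  ", spearman_r(x_all, y_all))
```

In this made-up dataset a single anomaly is enough to turn a perfect positive Pearson result slightly negative, while Spearman’s Rank still reports a positive relationship because the anomaly can only move a limited number of rank positions.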

Correlation ≠ Causation

While a significant relationship may be identified by correlation analysis techniques, correlation does not imply causation. The cause cannot be determined by the analysis, nor should this conclusion be attempted. The significant relationship implies that there is more to understand and that there are extraneous or underlying factors that should be explored further in order to search for a cause. While it is possible that a causal relationship exists, it would be remiss of any researcher to use the correlation results as proof of this existence.

The cause of any relationship discovered through correlation analysis is for the researcher to investigate through other means, such as further statistical analysis like the coefficient of determination. However, there is a great amount of value that correlation analysis can provide; for example, the degree to which the variables depend on each other can be estimated, which can help firms estimate the cost and sales of a product or service.

In essence, correlation-based statistical analyses allow researchers to identify which variables are dependent on each other, the results of which can generate actionable insights as they are, or provide starting points for further investigations and deeper insights.

About Emily James

As a professional copywriter, Emily brings our global vision to life through a broad range of industry-leading content.

Correlation in Psychology: Meaning, Types, Examples & Coefficient

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, Ph.D., is a qualified psychology teacher with over 18 years experience of working in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

Correlation means association – more precisely, it measures the extent to which two variables are related. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation.
  • A positive correlation is a relationship between two variables in which both variables move in the same direction. Therefore, one variable increases as the other variable increases, or one variable decreases while the other decreases. An example of a positive correlation would be height and weight. Taller people tend to be heavier.
  • A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other. An example of a negative correlation would be the height above sea level and temperature. As you climb the mountain (increase in height), it gets colder (decrease in temperature).
  • A zero correlation exists when there is no relationship between two variables. For example, there is no relationship between the amount of tea drunk and the level of intelligence.

Scatter Plots

A correlation can be expressed visually. This is done by drawing a scatter plot (also known as a scattergram, scatter graph, scatter chart, or scatter diagram).

A scatter plot is a graphical display that shows the relationships or associations between two numerical variables (or co-variables), which are represented as points (or dots) for each pair of scores.

A scatter plot indicates the strength and direction of the correlation between the co-variables.

Types of Correlations: Positive, Negative, and Zero

When you draw a scatter plot, it doesn’t matter which variable goes on the x-axis and which goes on the y-axis.

Remember, in correlations, we always deal with paired scores, so the values of the two variables taken together will be used to make the diagram.

Decide which variable goes on each axis and then simply put a cross at the point where the two values coincide.

Uses of Correlations

Prediction

  • If there is a relationship between two variables, we can make predictions about one from another.

Validity

  • Concurrent validity (correlation between a new measure and an established measure).

Reliability

  • Test-retest reliability (are measures consistent?).
  • Inter-rater reliability (are observers consistent?).

Theory verification

  • Predictive validity.

Correlation Coefficients

Instead of drawing a scatter plot, a correlation can be expressed numerically as a coefficient, ranging from -1 to +1. When working with continuous variables, the correlation coefficient to use is Pearson’s r.

Correlation Coefficient Interpretation

The correlation coefficient (r) indicates the extent to which the pairs of numbers for these two variables lie on a straight line. Values over zero indicate a positive correlation, while values under zero indicate a negative correlation.

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up.

There is no rule for determining what correlation size is considered strong, moderate, or weak. The interpretation of the coefficient depends on the topic of study.

When studying things that are difficult to measure, we should expect the correlation coefficients to be lower. In these kinds of studies, we rarely see correlations above 0.6. For this kind of data, we generally consider correlations above 0.4 to be relatively strong; correlations between 0.2 and 0.4 are moderate, and those below 0.2 are considered weak.

When we are studying things that are more easily countable, such as socioeconomic status, we expect higher correlations. For example, with demographic data, we generally consider correlations above 0.75 to be relatively strong; correlations between 0.45 and 0.75 are moderate, and those below 0.45 are considered weak.
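Because the cut-offs depend on the kind of data, the same coefficient can earn different labels. The short Python sketch below encodes the two sets of rule-of-thumb thresholds quoted above; the dictionary keys and function name are made up for this example, and applying the thresholds to the absolute value of r (so that negative correlations are labelled the same way) is an assumption rather than something the text states.

```python
# Rule-of-thumb cut-offs taken from the text: (weak below, strong above).
THRESHOLDS = {
    "hard_to_measure": (0.2, 0.4),
    "easily_countable": (0.45, 0.75),
}


def describe_strength(r, kind_of_data):
    """Label a coefficient as weak, moderate, or relatively strong."""
    weak_below, strong_above = THRESHOLDS[kind_of_data]
    size = abs(r)  # assumption: the sign only gives direction, not strength
    if size > strong_above:
        return "relatively strong"
    if size >= weak_below:
        return "moderate"
    return "weak"


print(describe_strength(0.5, "hard_to_measure"))
print(describe_strength(0.5, "easily_countable"))
```

Here a coefficient of 0.5 counts as relatively strong for hard-to-measure data but only moderate for easily countable data.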

Correlation vs. Causation

Causation means that one variable (often called the predictor variable or independent variable) causes the other (often called the outcome variable or dependent variable).

Experiments can be conducted to establish causation. An experiment isolates and manipulates the independent variable to observe its effect on the dependent variable and controls the environment in order that extraneous variables may be eliminated.

A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.

While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable, is actually causing the systematic movement in our variables of interest.

Correlation does not prove causation, as a third variable may be involved. For example, being a patient in a hospital is correlated with dying, but this does not mean that one event causes the other, as a third variable might be involved (such as diet and level of exercise).

“Correlation is not causation” means that just because two variables are related it does not necessarily mean that one causes the other.

A correlation identifies variables and looks for a relationship between them. An experiment tests the effect that an independent variable has upon a dependent variable but a correlation looks for a relationship between two variables.

This means that the experiment can predict cause and effect (causation) but a correlation can only predict a relationship, as another extraneous variable may be involved that is not known about.

Strengths

1. Correlation allows the researcher to investigate naturally occurring variables that may be unethical or impractical to test experimentally. For example, it would be unethical to conduct an experiment on whether smoking causes lung cancer.

2. Correlation allows the researcher to clearly and easily see if there is a relationship between variables. This can then be displayed in a graphical form.

Limitations

1. Correlation is not and cannot be taken to imply causation. Even if there is a very strong association between two variables, we cannot assume that one causes the other.

For example, suppose we found a positive correlation between watching violence on T.V. and violent behavior in adolescence.

It could be that the cause of both these is a third (extraneous) variable – for example, growing up in a violent home – and that both the watching of T.V. and the violent behavior are the outcome of this.

2. Correlation does not allow us to go beyond the given data. For example, suppose it was found that there was an association between time spent on homework (1/2 hour to 3 hours) and the number of G.C.S.E. passes (1 to 6).

It would not be legitimate to infer from this that spending 6 hours on homework would likely generate 12 G.C.S.E. passes.

How do you know if a study is correlational?

A study is considered correlational if it examines the relationship between two or more variables without manipulating them. In other words, the study does not involve the manipulation of an independent variable to see how it affects a dependent variable.

One way to identify a correlational study is to look for language that suggests a relationship between variables rather than cause and effect.

For example, the study may use phrases like “associated with,” “related to,” or “predicts” when describing the variables being studied.

Another way to identify a correlational study is to look for information about how the variables were measured. Correlational studies typically involve measuring variables using self-report surveys, questionnaires, or other measures of naturally occurring behavior.

Finally, a correlational study may include statistical analyses such as correlation coefficients or regression analyses to examine the strength and direction of the relationship between variables.

Why is a correlational study used?

Correlational studies are particularly useful when it is not possible or ethical to manipulate one of the variables.

For example, it would not be ethical to manipulate someone’s age or gender. However, researchers may still want to understand how these variables relate to outcomes such as health or behavior.

Additionally, correlational studies can be used to generate hypotheses and guide further research.

If a correlational study finds a significant relationship between two variables, this can suggest a possible causal relationship that can be further explored in future research.

What is the goal of correlational research?

The ultimate goal of correlational research is to increase our understanding of how different variables are related and to identify patterns in those relationships.

This information can then be used to generate hypotheses and guide further research aimed at establishing causality.

JMP | Statistical Discovery.™ From SAS.

Statistics Knowledge Portal

A free online introduction to statistics

Correlation

What is correlation?

Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It’s a common tool for describing simple relationships without making a statement about cause and effect.

How is correlation measured?

The sample correlation coefficient, r, quantifies the strength of the relationship. Correlations are also tested for statistical significance.

What are some limitations of correlation analysis?

Correlation can’t look at the presence or effect of other variables outside of the two being explored. Importantly, correlation doesn’t tell us about cause and effect. Correlation also cannot accurately describe curvilinear relationships.

Correlations describe data moving together

Correlations are useful for describing simple relationships among data. For example, imagine that you are looking at a dataset of campsites in a mountain park. You want to know whether there is a relationship between the elevation of the campsite (how high up the mountain it is), and the average high temperature in the summer.

For each individual campsite, you have two measures: elevation and temperature. When you compare these two variables across your sample with a correlation, you can find a linear relationship: as elevation increases, the temperature drops. They are negatively correlated.

What do correlation numbers mean?

We describe correlations with a unit-free measure called the correlation coefficient, which ranges from -1 to +1 and is denoted by r. Statistical significance is indicated with a p-value. Therefore, correlations are typically written with two key numbers: r = and p =.

  • The closer r is to zero, the weaker the linear relationship.
  • Positive r values indicate a positive correlation, where the values of both variables tend to increase together.
  • Negative r values indicate a negative correlation, where the values of one variable tend to increase when the values of the other variable decrease.
  • The p-value gives us evidence that we can meaningfully conclude that the population correlation coefficient is likely different from zero, based on what we observe from the sample.
  • "Unit-free measure" means that correlations exist on their own scale: in our example, the number given for r is not on the same scale as either elevation or temperature. This is different from other summary statistics. For instance, the mean of the elevation measurements is on the same scale as its variable.

What is a p-value?

A p-value is a measure of probability used for hypothesis testing.

It indicates the likelihood of obtaining data at least as extreme as what we are seeing if there is no effect present, in other words, if the null hypothesis is true. For our campsite data, this would be the hypothesis that there is no linear relationship between elevation and temperature. When a p-value is used to describe a result as statistically significant, this means that it falls below a pre-defined cutoff (e.g., p < .05 or p < .01), at which point we reject the null hypothesis in favor of an alternative hypothesis (for our campsite data, that there is a relationship between elevation and temperature).
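One way to see what a p-value means is an exact permutation test, sketched below in Python. This is a stand-in technique chosen for illustration, not necessarily how any particular statistics package computes its p-values: it shuffles the temperatures across campsites in every possible order and counts how often the shuffled data gives a correlation at least as strong as the one observed. The six campsites and their values are invented.

```python
import itertools
import math


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(ss_x * ss_y)


def exact_permutation_p_value(x, y):
    """Two-sided p-value: share of all orderings of y with |r| >= observed |r|."""
    observed = abs(pearson_r(x, y))
    at_least_as_extreme = 0
    total = 0
    for shuffled in itertools.permutations(y):
        total += 1
        # Small tolerance so rounding error does not exclude exact matches.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical campsites: elevation in metres, average summer high in Celsius.
elevation_m = [500, 800, 1100, 1400, 1700, 2000]
high_temp_c = [27.0, 25.5, 23.0, 21.5, 19.0, 17.5]

r = pearson_r(elevation_m, high_temp_c)
p = exact_permutation_p_value(elevation_m, high_temp_c)
print(f"r = {r:.3f}, p = {p:.4f}")
```

If elevation and temperature were truly unrelated, every ordering of the temperatures would be equally likely, so a small share of orderings matching the observed strength is evidence against the null hypothesis. Enumerating every ordering is only practical for very small samples like this one.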

Once we’ve obtained a significant correlation, we can also look at its strength. A perfect positive correlation has a value of 1, and a perfect negative correlation has a value of -1. But in the real world, we would never expect to see a perfect correlation unless one variable is actually a proxy measure for the other. In fact, seeing a perfect correlation number can alert you to an error in your data! For example, if you accidentally recorded distance from sea level for each campsite instead of temperature, this would correlate perfectly with elevation.

Another useful piece of information is the N, or number of observations. As with most statistical tests, knowing the size of the sample helps us judge the strength of our sample and how well it represents the population. For example, if we only measured elevation and temperature for five campsites, but the park has two thousand campsites, we’d want to add more campsites to our sample.

Visualizing correlations with scatterplots

Back to our example from above: as campsite elevation increases, temperature drops. We can look at this directly with a scatterplot. Imagine that we’ve plotted our campsite data:

  • Each point in the plot represents one campsite, which we can place on an x- and y-axis by its elevation and summertime high temperature.
  • The correlation coefficient (r) also describes our scatterplot. It tells us, in numerical terms, how close the points mapped in the scatterplot come to a linear relationship. Stronger relationships, meaning r values further from zero, are those where the points are very close to the line which we’ve fit to the data.

What about more complex relationships?

Scatterplots are also useful for determining whether there is anything in our data that might disrupt an accurate correlation, such as unusual patterns like a curvilinear relationship or an extreme outlier.

Correlations can’t accurately capture curvilinear relationships. In a curvilinear relationship, variables are correlated in a given direction until a certain point, where the relationship changes.

For example, imagine that we looked at our campsite elevations and how highly campers rate each campsite, on average. Perhaps at first, elevation and campsite rating are positively correlated, because higher campsites get better views of the park. But at a certain point, higher elevations become negatively correlated with campsite ratings, because campers feel cold at night!
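A quick sketch in Python shows how badly a single correlation can summarize a relationship like this. The elevations and ratings below are invented so that ratings rise and then fall symmetrically; the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(ss_x * ss_y)


# Hypothetical campsites: elevation (hundreds of metres) and average rating.
elevation = [1, 2, 3, 4, 5, 6, 7]
rating = [2.0, 3.5, 4.5, 5.0, 4.5, 3.5, 2.0]

r_all = pearson_r(elevation, rating)
r_lower = pearson_r(elevation[:4], rating[:4])  # the four lowest campsites
r_upper = pearson_r(elevation[3:], rating[3:])  # the four highest campsites

print("all campsites:", round(r_all, 3))
print("lower half:   ", round(r_lower, 3))
print("upper half:   ", round(r_upper, 3))
```

Across all seven campsites the coefficient is essentially zero, even though ratings clearly depend on elevation: the lower campsites alone show a strong positive correlation and the higher ones a strong negative correlation, and the two cancel out.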

We can get even more insight by adding shaded density ellipses to our scatterplot. A density ellipse illustrates the densest region of the points in a scatterplot, which in turn helps us see the strength and direction of the correlation.

Density ellipses can be various sizes. One common choice for examining correlation is a 95% density ellipse, which captures approximately the densest 95% of the observations. If two variables are moving together, like our campsites’ elevation and temperature, we would expect to see this density ellipse mirror the shape of the line. And we can see that in a curvilinear relationship, the density ellipse looks round: a correlation won’t give us a meaningful description of this relationship.

What is correlation analysis?

Last updated: 11 May 2023

Reviewed by: Miroslav Damyanov

Correlation analysis is a staple of data analytics. It’s a commonly used method to measure the relationship between two variables. It helps researchers understand the extent to which changes to the value in one variable are associated with changes to the value in the other. 

Correlations are often misused and misunderstood, especially in the insight industry. Below is a helpful guide to help you understand the basics and mechanics of correlation analysis. 

Definition of correlation analysis

Correlation analysis, also known as bivariate correlation analysis, is a statistical test primarily used to identify and explore linear relationships between two variables and then determine the strength and direction of that relationship. It’s mainly used to spot patterns within datasets.

It’s worth noting that correlation doesn't equate to causation. In essence, one cannot infer a cause-and-effect relationship between the two types of data with correlation analysis. However, you can determine the relationship's size, degree, and direction. 

Strength of the correlation

The degree of association in correlation analysis is measured by a correlation coefficient. The Pearson correlation, which is denoted by r , is the most commonly used coefficient. The correlation coefficient quantifies the degree of linear association between two variables and can take values between -1 and +1.

No correlation: This is when the value of r is zero.

Low degree: A small correlation is when r lies between 0 and ±0.29.

Moderate degree: If the value of the correlation coefficient is between ±0.30 and ±0.49, then there’s a medium correlation.

High degree: When the correlation coefficient takes a value between ±0.50 and ±1, it indicates a strong correlation.

Perfect: A perfect correlation occurs when the value of r is exactly ±1, indicating that as one variable increases, the other variable always increases (if positive) or decreases (if negative) along a straight line.
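The bands above can be turned into a small lookup. This is a minimal sketch: the function name `correlation_strength` is hypothetical, and the cut-offs are the rule-of-thumb values from this list, not universal constants.

```python
def correlation_strength(r: float) -> str:
    """Label the size of r using the rule-of-thumb bands listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign only gives direction
    if size == 0:
        return "none"
    if size < 0.30:
        return "low"
    if size < 0.50:
        return "moderate"
    if size < 1.0:
        return "high"
    return "perfect"


for r in (0.0, -0.12, 0.35, -0.74, 1.0):
    print(f"r = {r:+.2f} -> {correlation_strength(r)}")
```

Because the label depends only on the absolute value, -0.74 and +0.74 fall in the same band.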

  • Direction of the correlation

You can also identify the direction of the linear relationship between two variables by the correlation coefficient's sign. 

Positive correlation

A coefficient above 0 indicates a positive correlation, meaning both variables tend to increase together. Scores from +0.5 to +1 indicate a strong positive correlation.

Negative correlation

A coefficient below 0 indicates a negative correlation, meaning that as one variable increases, the other tends to decrease. Scores from -0.5 to -1 indicate a strong negative correlation.

No correlation

If the correlation coefficient is 0, it means there’s no linear relationship between the two variables being analyzed. It's worth noting that increasing the sample size can lead to a more precise estimate of the correlation.

Significance of the correlation 

Once we learn about the strength and direction of the correlation, it’s critical to evaluate whether the observed correlation is likely to have occurred by chance or whether it’s a real relationship between the two variables. Therefore, we need to test the correlation for significance. The most common method for determining the significance of a correlation coefficient is by conducting a hypothesis test. 

The hypothesis test (t-test) helps us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero." We decide this based on the sample correlation coefficient ( r ) and the sample size (n). 

As with other hypothesis tests, the significance level is set first, generally at 5%. If the t-test yields a p-value below 5%, we can conclude that the correlation coefficient is significantly different from zero. Furthermore, we simply say that the correlation coefficient is "significant." Otherwise, we wouldn’t have enough evidence to conclude that there’s a true linear relationship between the two variables.

In general, the larger the correlation coefficient ( r ) and sample size (n), the more likely it is that the correlation is statistically significant. However, it's important to remember that a significant correlation doesn’t necessarily imply causation between the two variables. 
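As a rough illustration of the test described above, the statistic is t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below uses only the standard library; the sample values (r = 0.70, n = 10) are made up, and the critical value is read from a standard t table instead of being computed.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical sample: r = 0.70 observed on n = 10 paired observations.
t = correlation_t_statistic(0.70, 10)
critical_t = 2.306  # two-sided 5% critical value for 8 degrees of freedom (t table)

print(f"t = {t:.2f}, significant at the 5% level: {abs(t) > critical_t}")
```

The same r with a smaller n gives a smaller t, which is why sample size matters for significance.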

  • What factors affect a correlation analysis?

Below are the factors you must consider when planning a correlation analysis:

Performing a correlation analysis is only appropriate if there’s evidence of a linear relationship between the quantitative variables. You can use a scatter plot to assess linearity. If the points don’t roughly follow a straight line, a correlation analysis isn’t recommended.

Draw a scatter plot, since it helps you spot outliers, heteroscedasticity, and non-linear relationships at a glance.

Avoid analyzing correlations when the data are repeated measures of the same variable from the same individual, taken at the same or different time points.

The required sample size should be determined a priori.

  • Uses of correlation analysis

Correlation analysis is primarily used to quantify the degree to which two variables relate. Researchers calculate the correlation coefficient, which tells them how strongly changes in one variable are associated with changes in the other. It gives researchers a measure of the linear relationship between two variables.

Correlation analysis is used by marketers to evaluate the efficiency of a marketing campaign by monitoring and analyzing customers' reactions to various marketing tactics. As such, they can better understand and serve their customers. 

Another use of correlation analysis is among data scientists and experts tasked with data monitoring. They can use correlation analysis for root cause analysis and to minimize Time To Detection (TTD) and Time To Remediation (TTR).

Anomalies or unusual events that happen simultaneously or at the same rate can help pinpoint the likely cause of an issue. The sooner teams understand and fix the issue using correlation analysis, the lower the cost users incur from experiencing it.

  • What is the business value of correlation analysis?

Correlation analysis offers several kinds of business value, including identifying potential inputs for more complex analyses and testing for future changes while holding other factors constant.

Additionally, businesses can use correlation analysis to understand the relationship between two variables. This type of analysis is easy to interpret and comprehend, as it focuses on how one variable varies in relation to another.

One of the primary business values of correlation analysis is its ability to identify hidden issues within a company. For example, if there’s a positive correlation between customers looking at reviews for a particular product and whether or not they purchase it, this could indicate a place where testing can provide more information. 

By testing whether increasing the number of people who look at positive product reviews leads to an increase in purchases, businesses can develop hypotheses to improve their products and services.

Correlation analysis can also help businesses diagnose problems with multiple regression models. For instance, if a multivariate or multiple regression model isn’t producing the expected results or if independent variables are not truly independent, correlation analysis can help discover these issues.

In digital environments, correlations can be especially helpful in fueling different hypotheses that can then be rapidly tested. This is because the testing can be low risk and not require a significant investment of time or money. 

With the abundance of data available to businesses, they must be careful in selecting the variables they’ll analyze. By doing so, they can uncover previously hidden relationships between variables and gain insights that can help them make data-driven decisions. 

  • Correlation ≠ causation

As previously stated, correlation doesn't strictly imply causation, even when you identify a significant relationship by correlation analysis techniques. You can’t determine the cause by the analysis.

The significant relationship implies that there’s much more to comprehend. Additionally, it implies that there are underlying and extraneous factors that you must further explore to look for a cause. Despite the possibility of a causal relationship existing, it would be irresponsible for researchers to utilize the correlation results as proof of such existence. 

  • Example of correlation analysis

A real-life example of correlation analysis is health improvement vs. medical dose reductions. Medical researchers can use a correlation study in clinical trials to better comprehend how a newly-developed drug impacts patients. 

If patients’ health tends to improve as they take the drug regularly, there’s a positive correlation. If their health tends to deteriorate, the correlation is negative. If their health shows no consistent change, there’s no correlation between the two variables (health and the drug).

What is the difference between correlation and correlation analysis?

Correlation shows us the direction and strength of a relationship between two variables. It’s expressed numerically by the correlation coefficient. Correlation analysis, on the other hand, is a statistical test that reveals the relationship between two variables/datasets.

What are correlation and regression?

Regression and correlation are the most popular methods used to examine the linear relationship between two quantitative variables. Correlation measures how strong the relationship is between a pair of variables, while regression is used to describe the relationship as an equation. 

What is the purpose of correlation?

Correlation analysis can help you to identify possible inputs for a more refined analysis. You can also use it to test for future changes while holding other things constant. The whole purpose of using correlations in research is to determine which variables are connected.

Correlation coefficient review

What is a correlation coefficient?

  • It always has a value between −1 and 1.
  • Strong positive linear relationships have values of r closer to 1.
  • Strong negative linear relationships have values of r closer to −1.
  • Weaker relationships have values of r closer to 0.

Correlation – Connecting the Dots, the Role of Correlation in Data Analysis

  • September 23, 2023

Correlation is a fundamental concept in statistics and data science. It quantifies the degree to which two variables are related. But what does this mean, and how can we use it to our advantage in real-world scenarios? Let’s dive deep into understanding correlation, how to measure it, and its practical implications.

In this blog post we will learn:

  • What is Correlation?
  • Importance of Correlation in Data Science?
  • How to Measure Correlation?
      • 3.1. Types of Correlation
      • 3.2. Pearson Correlation Coefficient
      • 3.3. Formula
      • 3.4. Explanation
      • 3.5. Interpretation
  • Calculate Correlation Using Python
      • 4.1. Visualize Correlations
      • 4.2. Test for Significance in Correlation
      • 4.3. Handle Multiple Correlations
      • 4.4. Visualizing the Correlation Matrix with a Heatmap
      • 4.5. How to Account for Non-Linear Correlations?
  • Difference Between Correlation and Causation?

1. What is Correlation?

Correlation refers to a statistical measure that represents the strength and direction of a linear relationship between two variables. If you’ve ever wondered if one event or variable has a relationship with another, you’re thinking about correlation. For instance, does the number of hours you study correlate with your exam scores?

2. Importance of Correlation in Data Science?

Understanding correlations can help data scientists:

  • Discover relationships between variables.
  • Determine important variables for predictive modeling.
  • Uncover underlying patterns in data.
  • Make better business decisions by understanding key drivers.

3. How to Measure Correlation?

The most common measure of correlation is the Pearson correlation coefficient, often denoted as ‘r’. Its values range between -1 and 1. Here’s what these values indicate:

  • 1 or -1: Perfect correlation; 1 is positive, -1 is negative.
  • 0: No correlation.
  • Between 0 and ±1: Varying degrees of correlation, with strength increasing as it approaches ±1.

3.1. Types of Correlation

Positive Correlation:

  • Value: $r$ is between 0 and +1.
  • Meaning: When one variable increases, the other also tends to increase, and when one decreases, the other also tends to decrease.
  • Graphically, a positive correlation will generally display a line of best fit that slopes upwards.

Negative Correlation:

  • Value: $r$ is between 0 and -1.
  • Meaning: When one variable increases, the other tends to decrease, and vice versa.
  • Graphically, a negative correlation will typically show a line of best fit that slopes downwards.

No Correlation (Zero Correlation):

  • Value: $r$ is approximately 0.
  • Meaning: Changes in one variable do not predict any particular linear change in the other variable. There is no linear tendency for them to move together.
  • Graphically, data with no correlation will appear scattered with no discernible linear pattern or trend.

3.2. Pearson Correlation Coefficient

The Pearson correlation coefficient, often denoted as $r$, quantifies the linear relationship between two variables. Let’s delve into its formula and understand its significance.

3.3. Formula:

Given two variables, $X$ and $Y$, with data points $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$ respectively, the Pearson correlation coefficient, $r$, is formulated as:

$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} $

Where:

  • $\bar{x}$ represents the mean of the $x$ values.
  • $\bar{y}$ represents the mean of the $y$ values.

3.4. Explanation:

  • The numerator, $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$, sums up the product of the deviations of each data point from their respective averages. This evaluates if the deviations of one variable coincide with the deviations of the other.
  • The denominator ensures normalization of the coefficient, ensuring that $r$ remains between -1 and 1. The terms $\sum_{i=1}^{n} (x_i - \bar{x})^2$ and $\sum_{i=1}^{n} (y_i - \bar{y})^2$ sum the squared deviations of each data point from their means for $X$ and $Y$ respectively.
  • $r = 1$: Indicates a perfect positive linear relationship between $X$ and $Y$.
  • $r = -1$: Signifies a perfect negative linear relationship between $X$ and $Y$.
  • $r = 0$: Suggests no evident linear trend between the variables.
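To make the formula concrete, here is a minimal pure-Python sketch that follows it term by term. The helper name `pearson_r` and the toy data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed directly from the formula above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations of x from its mean
    dy = [y - mean_y for y in ys]  # deviations of y from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # points on an increasing line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a decreasing line
```

The guard on the denominator matters in practice: a constant column has zero variance, so the coefficient is undefined for it.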

3.5. Interpretation:

Envision plotting the data points of $X$ and $Y$ on a scatter plot. The Pearson correlation provides insight into how closely these points cluster around a straight line.

  • An $r$ value near 1 implies that as $X$ increases, $Y$ also tends to rise, resulting in an upward trending line.
  • An $r$ value nearing -1 indicates that as $X$ increases, $Y$ generally decreases, yielding a downward trending line.
  • A value approaching 0 indicates no discernible linear trend between the variables.

However, a crucial note is that correlation doesn’t signify causation. A strong correlation doesn’t necessarily indicate that one variable caused the other.

4. Calculate Correlation Using Python

Let’s assume you’re a teacher who wants to understand if there’s a relationship between the hours a student studies and their exam scores.

Scenario: You have data on 5 students: hours studied and their corresponding exam scores.
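A minimal sketch of this scenario with NumPy follows. The five pairs of values are made up for illustration and are not data from a real class.

```python
import numpy as np

# Illustrative values for 5 students (assumed, not real measurements).
hours_studied = np.array([1, 2, 3, 4, 5])
exam_scores = np.array([52, 58, 65, 70, 80])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
r = np.corrcoef(hours_studied, exam_scores)[0, 1]
print(f"Pearson r = {r:.3f}")
```

For these values r comes out close to +1.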

This output suggests a strong positive correlation between study hours and exam scores.

4.1. Visualize Correlations

A scatter plot is a common way to visualize the relationship between two variables.

4.2. Test for Significance in Correlation

A significance test helps determine whether the observed correlation is statistically significant, meaning that a correlation this large would be unlikely to arise by chance alone if there were no true relationship.

If the p-value is below a threshold (commonly 0.05), the correlation is considered statistically significant.
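One way to obtain the p-value is `scipy.stats.pearsonr`, which returns the coefficient together with a two-sided p-value. The sketch below reuses made-up values for five students; with so few points the result is only illustrative.

```python
from scipy import stats

# Illustrative values for 5 students (assumed, not real measurements).
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 70, 80]

r, p_value = stats.pearsonr(hours_studied, exam_scores)
alpha = 0.05  # conventional significance threshold

print(f"r = {r:.3f}, p-value = {p_value:.4f}")
print("significant" if p_value < alpha else "not significant")
```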

4.3. Handle Multiple Correlations

In real-world datasets, you might want to check correlations between multiple variables. This can be done using a correlation matrix.
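A minimal sketch with pandas, where `DataFrame.corr()` computes pairwise Pearson correlations by default. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical dataset with three numeric columns.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "hours_slept": [8, 7, 7, 6, 5],
    "exam_score": [52, 58, 65, 70, 80],
})

corr_matrix = df.corr()  # Pearson by default; one row and column per variable
print(corr_matrix.round(2))
```

The matrix is symmetric with 1s on the diagonal, since every variable correlates perfectly with itself.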

4.4. Visualizing the Correlation Matrix with a Heatmap

Visualizing multiple correlations using a heatmap is a common and insightful way to quickly grasp relationships between multiple variables in a dataset. We’ll use Python libraries such as pandas and seaborn to display these correlations.

For larger datasets, visualizing this matrix as a heatmap can be insightful.
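A sketch of such a heatmap with `seaborn.heatmap` is shown below. The dataset is hypothetical, and the non-interactive `Agg` backend is chosen only so the snippet runs without a display; in a notebook you would normally drop that line and call `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical dataset with three numeric columns.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "hours_slept": [8, 7, 7, 6, 5],
    "exam_score": [52, 58, 65, 70, 80],
})
corr_matrix = df.corr()

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    corr_matrix,
    annot=True,        # write each correlation value in its cell
    cmap="coolwarm",   # diverging palette: one colour for negative, another for positive
    linewidths=0.5,    # thin lines between cells
    vmin=-1, vmax=1,   # fix the colour scale to the full range of r
    ax=ax,
)
ax.set_title("Correlation matrix")
fig.tight_layout()
```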

  • annot=True ensures that the correlation values appear on the heatmap.
  • cmap specifies the color palette. In this case, we’ve chosen ‘coolwarm’, but there are various palettes available in seaborn.
  • linewidths determines the width of the lines that will divide each cell.
  • vmin and vmax are used to anchor the colormap, ensuring that the center is set at a meaningful value.

4.5. How to Account for Non-Linear Correlations?

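Pearson’s correlation coefficient captures linear relationships. But what if the relationship is curved or nonlinear? Enter Spearman’s rank correlation. It’s based on ranked values rather than raw data, so it captures monotonic relationships (consistently increasing or decreasing) even when they aren’t linear.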
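Spearman’s coefficient is Pearson’s coefficient applied to the ranks of the data. The sketch below uses only the standard library; the helper names and the cubic toy data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # always increasing, but curved

pearson_value = pearson_r(x, y)
spearman_value = spearman_rho(x, y)
print(f"Pearson r    = {pearson_value:.3f}")
print(f"Spearman rho = {spearman_value:.3f}")
```

Here Spearman’s coefficient is 1 because the ordering of `y` matches the ordering of `x` exactly, while Pearson’s r is lower because the points do not lie on a straight line. Neither measure detects a non-monotonic pattern such as a U-shape.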

5. Difference Between Correlation and Causation?

It’s vital to note that correlation does not imply causation. Just because two variables are correlated doesn’t mean one caused the other. Using our example, while hours studied and exam scores are correlated, it doesn’t mean studying longer always causes better scores. Other factors might play a role.

6. Conclusion

Correlation is a powerful tool in data science, offering insights into relationships between variables. But it’s crucial to use it judiciously and remember that correlation doesn’t equate to causation. Python, with its rich library ecosystem, provides many tools and methods to efficiently calculate, visualize, and interpret correlations.

The key is to understand the data, choose the appropriate correlation measure, and always be aware of the underlying assumptions.

Correlational Research | When & How to Use

Published on July 7, 2021 by Pritha Bhandari. Revised on June 22, 2023.

A correlational research design investigates relationships between variables without the researcher controlling or manipulating any of them.

A correlation reflects the strength and/or direction of the relationship between two (or more) variables. The direction of a correlation can be either positive or negative.

Table of contents

  • Correlational vs. experimental research
  • When to use correlational research
  • How to collect correlational data
  • How to analyze correlational data
  • Correlation and causation
  • Other interesting articles
  • Frequently asked questions about correlational research

Correlational vs. experimental research

Correlational and experimental research both use quantitative methods to investigate relationships between variables. But there are important differences in data collection methods and the types of conclusions you can draw.

When to use correlational research

Correlational research is ideal for gathering data quickly from natural settings. That helps you generalize your findings to real-life situations in an externally valid way.

There are a few situations where correlational research is an appropriate choice.

To investigate non-causal relationships

You want to find out if there is an association between two variables, but you don’t expect to find a causal relationship between them.

Correlational research can provide insights into complex real-world relationships, helping researchers develop theories and make predictions.

To explore causal relationships between variables

You think there is a causal relationship between two variables, but it is impractical, unethical, or too costly to conduct experimental research that manipulates one of the variables.

Correlational research can provide initial indications or additional support for theories about causal relationships.

To test new measurement tools

You have developed a new instrument for measuring your variable, and you need to test its reliability or validity.

Correlational research can be used to assess whether a tool consistently or accurately captures the concept it aims to measure.

How to collect correlational data

There are many different methods you can use in correlational research. In the social and behavioral sciences, the most common data collection methods for this type of research include surveys, observations, and secondary data.

It’s important to carefully choose and plan your methods to ensure the reliability and validity of your results. You should carefully select a representative sample so that your data reflects the population you’re interested in without research bias.

Surveys

In survey research, you can use questionnaires to measure your variables of interest. You can conduct surveys online, by mail, by phone, or in person.

Surveys are a quick, flexible way to collect standardized data from many participants, but it’s important to ensure that your questions are worded in an unbiased way and capture relevant insights.

Naturalistic observation

Naturalistic observation is a type of field research where you gather data about a behavior or phenomenon in its natural environment.

This method often involves recording, counting, describing, and categorizing actions and events. Naturalistic observation can include both qualitative and quantitative elements, but to assess correlation, you collect data that can be analyzed quantitatively (e.g., frequencies, durations, scales, and amounts).

Naturalistic observation lets you easily generalize your results to real world contexts, and you can study experiences that aren’t replicable in lab settings. But data analysis can be time-consuming and unpredictable, and researcher bias may skew the interpretations.

Secondary data

Instead of collecting original data, you can also use data that has already been collected for a different purpose, such as official records, polls, or previous studies.

Using secondary data is inexpensive and fast, because data collection is complete. However, the data may be unreliable, incomplete or not entirely relevant, and you have no control over the reliability or validity of the data collection procedures.

How to analyze correlational data

After collecting data, you can statistically analyze the relationship between variables using correlation or regression analyses, or both. You can also visualize the relationships between variables with a scatterplot.

Different types of correlation coefficients and regression analyses are appropriate for your data based on their levels of measurement and distributions.

Correlation analysis

Using a correlation analysis, you can summarize the relationship between variables into a correlation coefficient: a single number that describes the strength and direction of the relationship between variables. With this number, you’ll quantify the degree of the relationship between variables.

The Pearson product-moment correlation coefficient, also known as Pearson’s r, is commonly used for assessing a linear relationship between two quantitative variables.

Correlation coefficients are usually found for two variables at a time, but you can use a multiple correlation coefficient for three or more variables.

Regression analysis

With a regression analysis , you can predict how much a change in one variable will be associated with a change in the other variable. The result is a regression equation that describes the line on a graph of your variables.

You can use this equation to predict the value of one variable based on the given value(s) of the other variable(s). It’s best to perform a regression analysis after testing for a correlation between your variables.
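As a rough sketch of simple linear regression, the least-squares slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The function name `fit_line` and the paired values below are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x must vary to fit a line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired measurements of two quantitative variables.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # predicted score for a new value x = 6

print(f"score = {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted score at 6 hours: {predicted:.1f}")
```

Predicting at x = 6 extrapolates just beyond the observed range, so the result should be read with caution.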

Correlation and causation

It’s important to remember that correlation does not imply causation. Just because you find a correlation between two things doesn’t mean you can conclude one of them causes the other, for a few reasons.

Directionality problem

If two variables are correlated, it could be because one of them is a cause and the other is an effect. But the correlational research design doesn’t allow you to infer which is which. To err on the side of caution, researchers don’t conclude causality from correlational studies.

Third variable problem

A confounding variable is a third variable that influences other variables to make them seem causally related even though they are not. Instead, there are separate causal links between the confounder and each variable.

In correlational research, there’s limited or no researcher control over extraneous variables. Even if you statistically control for some potential confounders, there may still be other hidden variables that disguise the relationship between your study variables.
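One common way to statistically control for a single measured confounder is a first-order partial correlation, which removes the part of the x–y correlation that both variables share with z. The sketch below uses NumPy on constructed data in which a hypothetical third variable `z` drives both `x` and `y`. It illustrates the idea only and does not account for unmeasured confounders.

```python
import numpy as np

# Constructed example: z is a confounder that drives both x and y.
z = np.arange(1, 9, dtype=float)
x = z + np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
y = z + np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


r_xy, r_xz, r_yz = corr(x, y), corr(x, z), corr(y, z)

# First-order partial correlation of x and y, controlling for z.
partial_xy_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(f"raw r(x, y)         = {r_xy:.2f}")
print(f"partial r(x, y | z) = {partial_xy_z:.2f}")
```

The raw correlation between `x` and `y` is strong, but it shrinks toward zero once `z` is held constant, which is the pattern a third variable problem produces.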

Although a correlational study can’t demonstrate causation on its own, it can help you develop a causal hypothesis that’s tested in controlled experiments.

Frequently asked questions about correlational research

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no relationship between the variables.

A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research.

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design, you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design, you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity.

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions. The Pearson product-moment correlation coefficient (Pearson’s r) is commonly used to assess a linear relationship between two quantitative variables.

Bhandari, P. (2023, June 22). Correlational Research | When & How to Use. Scribbr. Retrieved February 15, 2024, from https://www.scribbr.com/methodology/correlational-research/

What Is a Correlation?

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

James Lacy, MLS, is a fact-checker and researcher.

  • What Is a Correlation Coefficient?
  • Scatter Plots and Correlation
  • Strong vs. Weak Correlations
  • Correlation Does Not Equal Causation
  • Illusory Correlations
  • Frequently Asked Questions

A correlation means that there is a relationship between two or more variables. This does not imply, however, that there is necessarily a cause or effect relationship between them. Instead, it simply means that there is some type of relationship, meaning the variables tend to change together.

A correlation coefficient is a number that expresses the strength of the relationship between the two variables.

At a Glance

Correlation can help researchers understand if there is an association between two variables of interest. Such relationships can be positive, meaning they move in the same direction together, or negative, meaning that as one goes up, the other goes down. Correlations can be visualized using scatter plots to show how measurements of a variable change along an x- and y-axis.

It is important to remember that while correlations can help show a relationship, correlation does not indicate causation.

What Is a Correlation Coefficient?

A correlation coefficient, often expressed as r, indicates a measure of the direction and strength of a relationship between two variables. When the r value is closer to +1 or -1, it indicates that there is a stronger linear relationship between the two variables.

Correlational studies are quite common in psychology, particularly because some things are impossible to recreate or research in a lab setting.

Instead of performing an experiment, researchers may collect data to look at possible relationships between variables. From the data they collect and its analysis, researchers then make inferences and predictions about the nature of the relationships between variables.

Helpful Hint

A correlation is a statistical measurement of the relationship between two variables. Remember this handy rule: The closer the correlation is to 0, the weaker it is. The closer it is to +/-1, the stronger it is.

Types of Correlation

Correlation coefficients range from -1 to +1.

Positive Correlation

A correlation of +1 indicates a perfect positive correlation, meaning that both variables move in the same direction together. In other words, +1 is the strongest positive correlation you can find.

Negative Correlation

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down.

Zero Correlation

A zero correlation suggests that the correlation statistic does not indicate a relationship between the two variables. This does not mean that there is no relationship at all; it simply means that there is not a linear relationship. A zero correlation is often indicated using the abbreviation r = 0.
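To make the point about non-linear relationships concrete, here is a minimal sketch in plain Python. The helper name `pearson_r` and the data are illustrative assumptions, not taken from any study. The y values are completely determined by x, yet the linear correlation comes out at zero because the relationship is a curve rather than a line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative data: y depends entirely on x, but through a U-shaped curve.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")  # the linear correlation is zero despite the exact relationship
```

A scatter plot of these points would show the pattern immediately, which is one reason to look at the plot as well as the coefficient.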

Scatter plots (also called scatter charts, scattergrams, and scatter diagrams) are used to plot variables on a chart to observe the associations or relationships between them. The horizontal axis represents one variable, and the vertical axis represents the other.


Each point on the plot is a different measurement. From those measurements, a trend line can be calculated. The correlation coefficient is not the slope of that line; it reflects how closely the points cluster around it. When the correlation is weak (r is close to zero), the line is hard to distinguish. When the correlation is strong (r is close to +1 or -1), the line will be more apparent.

Correlations can be confusing, and many people equate positive with strong and negative with weak. A relationship between two variables can be negative, but that doesn't mean that the relationship isn't strong.

  • A weak positive correlation indicates that, although both variables tend to go up together, the relationship is not very strong.
  • A strong negative correlation, on the other hand, indicates a strong connection between the two variables, but one that tends to go up as the other goes down.

For example, a correlation of -0.97 is a strong negative correlation, whereas a correlation of 0.10 indicates a weak positive correlation. A correlation of +0.10 is weaker than -0.74, and a correlation of -0.98 is stronger than +0.79.
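The comparison above boils down to one rule: strength is the absolute value of the coefficient, and the sign only gives the direction. A short sketch using the same coefficients mentioned in the text (the function name `describe` is just an illustrative choice):

```python
def describe(r):
    """Split a coefficient into its direction (sign) and strength (magnitude)."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return direction, abs(r)

coefficients = [-0.97, 0.10, -0.74, 0.79, -0.98]

# Rank from strongest to weakest by magnitude, ignoring the sign.
ranked = sorted(coefficients, key=abs, reverse=True)

for r in ranked:
    direction, strength = describe(r)
    print(f"r = {r:+.2f}: direction {direction}, strength {strength:.2f}")
```

Sorted this way, -0.98 and -0.97 come first and +0.10 comes last, which matches the comparisons in the paragraph above.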

Correlation does not equal causation. Just because two variables have a relationship does not mean that changes in one variable cause changes in the other.

Correlations tell us that there is a relationship between variables, but this does not necessarily mean that one variable causes the other to change.

An oft-cited example is the correlation between ice cream consumption and homicide rates. Studies have found a correlation between increased ice cream sales and spikes in homicides. However, eating ice cream does not cause you to commit murder. Instead, there is a third variable: heat. Both variables increase during summertime.
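The third-variable idea can be sketched with a few made-up numbers. Everything below is invented purely for illustration (the values and the helper `pearson_r` are assumptions, not real statistics): both series are built to rise with temperature, and as a result they also correlate strongly with each other even though neither influences the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly figures; both outcomes track temperature.
temperature = [10, 15, 20, 25, 30, 35]
ice_cream_sales = [31, 44, 61, 74, 91, 104]
homicides = [9, 16, 21, 24, 29, 36]

r_sales_homicides = pearson_r(ice_cream_sales, homicides)
r_temp_sales = pearson_r(temperature, ice_cream_sales)
r_temp_homicides = pearson_r(temperature, homicides)

print(f"ice cream vs homicides:   {r_sales_homicides:.2f}")
print(f"temperature vs ice cream: {r_temp_sales:.2f}")
print(f"temperature vs homicides: {r_temp_homicides:.2f}")
```

All three coefficients are high here, but only the two involving temperature reflect how the data were constructed. The coefficient alone cannot tell you which of these stories is the right one.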

An illusory correlation is the perception of a relationship between two variables when only a minor relationship—or none at all—actually exists. An illusory correlation does not always mean inferring causation; it can also mean inferring a relationship between two variables when one does not exist.

For example, people sometimes assume that, because two events occurred together at one point in the past, one event must be the cause of the other. These illusory correlations can occur both in scientific investigations and in real-world situations.

Stereotypes are a good example of illusory correlations. Research has shown that people tend to assume that certain groups and traits occur together and frequently overestimate the strength of the association between the two variables.

For example, suppose someone holds the mistaken belief that all people from small towns are extremely kind. When they meet a very kind person, their immediate assumption might be that the person is from a small town, despite the fact that kindness is not related to city population.

What This Means For You

Psychology research frequently uses correlations, but it's essential to understand that correlation is not the same as causation. Confusing correlation with causation assumes a cause-effect relationship that might not exist. While correlation can help you see that there is a relationship (and tell you how strong that relationship is), only experimental research can reveal a causal connection.

You can calculate the correlation coefficient in a few different ways, with the same result. The general formula is $r_{XY} = \mathrm{COV}_{XY} / (S_X S_Y)$, which is the covariance between the two variables divided by the product of their standard deviations.

In Excel, go to the cell in which you want the correlation coefficient to appear and enter =CORREL(A2:A7,B2:B7), where A2:A7 and B2:B7 are the variable lists to compare. Press Enter.

Finding the linear correlation coefficient requires a long, difficult calculation, so most people use a calculator or software such as Excel or a statistics program.
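If you would rather see the calculation spelled out than rely on a spreadsheet, the formula (covariance divided by the product of the standard deviations) can be sketched with Python's standard library. The function name `correlation_coefficient` and the six data pairs are illustrative assumptions.

```python
import statistics

def correlation_coefficient(xs, ys):
    """r = covariance(X, Y) / (stdev(X) * stdev(Y)), using sample statistics."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sample covariance: average product of deviations, with n - 1 in the denominator.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return cov_xy / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical measurements for two variables.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 5, 7]

r = correlation_coefficient(xs, ys)
print(f"r = {r:.4f}")
```

Entering the same two columns into a spreadsheet and using its correlation function should give a matching value, since both compute the same Pearson coefficient.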

Correlations range from -1.00 to +1.00. The correlation coefficient (expressed as r ) shows the direction and strength of a relationship between two variables. The closer the r value is to +1 or -1, the stronger the linear relationship between the two variables is.

Correlations indicate a relationship between two variables, but one doesn't necessarily cause the other to change.


By Kendra Cherry, MSEd

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Correlation Analysis


Quick definition: Correlation analysis, also known as bivariate analysis, is primarily concerned with finding out whether a relationship exists between variables and then determining the magnitude and direction of that relationship.

Key takeaways:

  • Correlation does not equal causation. Correlation analysis identifies and evaluates a relationship between two variables, but a positive correlation does not automatically mean one variable affects the other.
  • The main benefits of correlation analysis are that it helps companies determine which variables they want to investigate further, and it allows for rapid hypothesis testing.
  • The main type of correlation analysis uses Pearson’s r formula to identify the degree of the linear relationship between two variables.
  • Because of the amount of data available, companies must be thoughtful when deciding which variables to analyze.

The following information was provided during an interview with John Bates, director of product management for predictive marketing solutions and for Adobe Analytics Premium in Adobe Experience Cloud.

  • What is correlation analysis?
  • What are the main types of correlation analysis?
  • What is the business value of correlation analysis?
  • How does correlation analysis help uncover company issues?
  • What problems do companies run into when conducting correlation analysis?
  • What is the challenge of working with similar data sets?
  • Why is missing data a problem?
  • What is the challenge of weak association?
  • What is Pearson’s r formula?

What is correlation analysis?

Correlational studies are our attempts to find the extent to which two variables are related. No variables are manipulated as part of an experiment — the analyst is measuring naturally occurring events, behaviors, or characteristics.

It’s important to remember that correlation doesn't equal causation. You can’t draw any conclusions regarding the causal effect of one type of data on the other, but you can determine the size, degree, and direction of the relationship.

What are the main types of correlation analysis?

The most common types of correlation analysis fall into three main families. Pearson’s correlation coefficient is used for linearly related variables, like age and height or temperature and ice cream sales.

It requires certain assumptions about the variables: for instance, it assumes the variables are linearly connected and are normally distributed.

Spearman’s rank-order correlation, on the other hand, doesn’t carry any assumptions regarding the distribution of the data.

It's most appropriate when correlation analysis is being applied to variables that contain some kind of natural order, like the relationship between starting salary and various degrees (high school, bachelor’s, master’s, etc.), or age and income.
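As a rough sketch of how a rank-order correlation works: replace each value with its rank, then compute an ordinary correlation on the ranks. The degree coding (1 = high school up to 4 = doctorate), the salary figures, and the helper names below are all hypothetical, chosen only to mirror the degree-and-salary example. Tied values share the average of their ranks, which is one common convention.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Spearman's rank-order correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: degree level (ordinal code) and starting salary in thousands.
degree_level = [1, 1, 2, 2, 3, 3, 4]
starting_salary = [30, 34, 42, 40, 55, 60, 75]

rho = spearman_rho(degree_level, starting_salary)
print(f"Spearman's rho = {rho:.3f}")
```

Because only the ordering matters, the result would not change if the salaries were rescaled or the degree codes were relabeled with any other increasing numbers.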

The third main type of correlation analysis is Kendall’s tau correlation, and it’s used in ranked pairings.

The purpose of Kendall’s tau correlation is to determine the strength of dependence between two variables. A coefficient value of zero indicates no association between the rankings of X and Y, which is what you would expect if the two variables were independent of each other.
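One way to picture Kendall's tau is as a count over every pair of observations: pairs ranked in the same order on both variables are concordant, and pairs ranked in opposite order are discordant. The sketch below implements the simplest variant (often called tau-a), and treating tied pairs as neither concordant nor discordant is an assumption of this sketch. The two rankings are made up.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order this pair the same way
        elif direction < 0:
            discordant += 1   # the variables order this pair in opposite ways
        # direction == 0 means a tie; it counts toward neither total
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# Hypothetical rankings of five items by two different judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

tau = kendall_tau_a(judge_a, judge_b)
print(f"Kendall's tau = {tau:.2f}")
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so tau is (8 - 2) / 10 = 0.6.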

What is the business value of correlation analysis?

Correlation analysis is useful for identifying possible inputs for a more sophisticated analysis, or for testing for future changes while holding other things constant. You may also want to just understand the relationship between two variables.

The great thing about correlation analysis is that it's fairly easy to interpret and understand, because you're only focused on how one set of data varies in relation to another.

A primary driver of business value is that it can be used to reveal hidden issues within the company.

How does correlation analysis help uncover company issues?

Correlation analysis can also be used to diagnose problems with multiple regression models. You may have issues with a multivariate or multiple regression model where it's not producing useful results, or where the supposedly independent variables are not truly independent of one another.

Those issues can be discovered by doing correlation analysis between the different independent variables.
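A minimal sketch of that kind of check: compute the correlation for every pair of candidate inputs and flag the pairs that move almost in lockstep. The variable names, the numbers, and the 0.8 cut-off are all assumptions made for illustration. There is no universal threshold.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical candidate inputs for a regression model.
predictors = {
    "visits": [100, 120, 140, 160, 180, 200],
    "page_views": [310, 355, 430, 470, 550, 600],
    "discount_pct": [5, 0, 10, 0, 5, 10],
}

THRESHOLD = 0.8  # assumed cut-off for "too closely related"

flagged = []
for name_a, name_b in combinations(predictors, 2):
    r = pearson_r(predictors[name_a], predictors[name_b])
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
    if abs(r) > THRESHOLD:
        flagged.append((name_a, name_b))

print("Pairs that are not truly independent:", flagged)
```

In this made-up data set, visits and page views are flagged, which suggests that only one of the two should be carried into the model as a representative of both.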

Correlation analysis is also a quick way to identify potential company issues. If there is a correlation between two variables, correlation analysis provides an opportunity for rapid hypothesis testing, especially if the test is low risk and won’t require a significant investment of time and money.

For example, you might find that there’s a positive correlation between customers looking at reviews for a particular product and whether or not they purchase it.

You can't say for certain that the product reviews caused the purchase, but it indicates a place where testing can provide more information.

If you can get 10% more people to look at product reviews, especially positive ones, can you increase the number of purchases? Correlations can help to fuel different hypotheses that can then be rapidly tested, especially in digital environments.

What problems do companies run into when conducting correlation analysis?

The main problem that companies run into with correlation analysis is that many people often quickly assume that the analysis indicates causation. Only proper testing can determine whether or not you’re looking at independent and dependent variables.

One of the modern challenges of correlation analysis is that, with so much data available, many different variables or sets of data may show similar, strong correlations with the same target set of data.

There can be some paralysis when deciding which variable to evaluate more closely later using multivariate analysis. It isn’t always immediately clear which correlating relationship will be the most beneficial to pursue.

It is important to choose one that may be representative of others that are not truly independent.

For example, when looking at orders or purchases, there might be similar correlations between that variable and visits to a website or store, page views, and number of visitors.

Primarily, there are three main challenges many companies face when conducting correlation analysis.

What is the challenge of working with similar data sets?

One of the challenges is ensuring that your teams understand you can have multiple sets of data that correlate in a similar way because they're similar in nature.

These data sets might get collected at the same time or with the same frequency, or they may have some sort of inherent relationship. It’s important to keep that relationship in mind when looking at different variables with similar correlation outcomes.

Why is missing data a problem?

Companies can also run into problems with missing data. Let’s say you’re looking at the correlation between stock prices and sales in a specific time period.

If you suddenly have missing data for a portion of that time, or if the variables don’t line up, it can really throw off the correlation analysis itself because some tools will treat the missing data as zeros, even though a missing value and a zero are not the same thing.

To mitigate potential problems, make sure you choose a time period or set of observations with the right distribution, check that the technique's assumptions align with the underlying data, and apply the proper technique.

And when there's missing data, exclude it. If you’re looking at time-based data, try to find an observation period with consistently collected data.
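The difference between excluding missing values and treating them as zeros is easy to see in a small sketch. The stock and sales figures are invented, and `None` stands in for a missing observation. Both the data and the helper names are assumptions for illustration only.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly data; None marks a week with no recorded value.
stock_price = [10.0, 11.0, None, 13.0, 14.0, 15.0]
sales = [100, 110, 120, None, 140, 150]

# Option 1: exclude any week where either value is missing.
complete = [(p, s) for p, s in zip(stock_price, sales)
            if p is not None and s is not None]
r_excluded = pearson_r([p for p, _ in complete], [s for _, s in complete])

# Option 2 (the mistake): pretend the missing values are zeros.
r_zero_filled = pearson_r(
    [0.0 if p is None else p for p in stock_price],
    [0 if s is None else s for s in sales],
)

print(f"missing weeks excluded: r = {r_excluded:.2f}")
print(f"missing weeks as zeros: r = {r_zero_filled:.2f}")
```

With the incomplete weeks excluded, the remaining data line up perfectly. With zeros substituted, the two artificial zeros dominate the calculation and the relationship all but disappears.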

What is the challenge of weak association?

Another big problem that can occur is when a company assumes that because a correlation is statistically significant, it means there must be a strong association. But this is not always the case. The relationship can be statistically significant and still have a weak association.

Correlation analysis is simply testing the null hypothesis that there is no relationship. By rejecting the null hypothesis, you accept the alternative hypothesis that declares there is a relationship, but there is no information about the strength of the relationship or its importance.

Be careful about how you interpret association or correlation, because the correlation coefficient and statistical significance are two separate concepts.
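To see how significance and strength can come apart, consider the usual test statistic for a correlation, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below plugs in a weak correlation from a large, hypothetical sample. Comparing t against roughly 1.96 is a large-sample approximation to the two-sided 5% cut-off, used here as an assumption rather than an exact critical value.

```python
import math

def t_statistic(r, n):
    """Test statistic for the null hypothesis of no linear relationship."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = 0.10   # a weak association
n = 1000   # hypothetical number of observations

t = t_statistic(r, n)
shared_variance = r ** 2

print(f"t = {t:.2f} with {n - 2} degrees of freedom")
print(f"shared variance (r squared) = {shared_variance:.1%}")
```

The statistic comfortably exceeds the approximate cut-off, so the correlation would be reported as statistically significant, yet the two variables share only about 1% of their variance. With enough data, even a very weak association clears the significance bar.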

What is Pearson’s r formula?

The Pearson’s r formula is the most used statistic to measure the degree of a relationship between linearly related variables. Once you run the formula, you will get a correlation report about the two tested variables.

The output is often expressed as something called the Pearson product-moment correlation coefficient, also known as r. An r value of positive one (+1) indicates a perfect positive correlation, while an r value of negative one (-1) indicates a perfect negative correlation. An r value of zero indicates no linear correlation.

There are a couple other parts of Pearson’s r formula and the correlation report. As explained before, r is another term for the coefficient that appears in your report. This coefficient usually appears alongside the degrees of freedom (df).

The degrees of freedom equal the number of data points you have, minus two. So, the output would report that r, within the context of the degrees of freedom, equals some correlation coefficient.

The other thing that's often reported alongside the coefficient is the p value, which indicates the statistical significance of the correlation. Another part of the correlation report is r-squared, which is called the coefficient of determination.

The coefficient of determination is, with respect to the correlation, the proportion of the variance that is shared by both variables. It gives a measure of the amount of variation that can be explained by the model or the correlation.

This value is usually written as a decimal or percentage, like r-squared equals 0.36 (or 36%).

For the purposes of the following example, we will only focus on r, and the variables X and Y. If you want to determine the correlation between page views (X) and revenue (Y), you list all the X and Y values for a specific timeframe, and then plug those numbers into the formula in the correct places.

If the value of r is between zero and one, that indicates that as page views go up, revenue tends to go up as well. Similarly, a value between zero and negative one would indicate that as page views go up, revenue tends to go down.

However, Pearson’s r formula can only tell you if there is a correlation between two variables, not whether one of the variables directly affects the other.
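Here is a rough sketch of what plugging the numbers in looks like for the page views and revenue example, including the other parts of the report described above (degrees of freedom and r-squared). The daily figures and the function name `correlation_report` are hypothetical, and the p value is left out because it needs a statistics package.

```python
import math

def correlation_report(xs, ys):
    """Return r, the degrees of freedom (n - 2), and r-squared for two lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / math.sqrt(var_x * var_y)
    return {"r": r, "df": n - 2, "r_squared": r ** 2}

# Hypothetical daily figures: page views (X) and revenue in thousands (Y).
page_views = [120, 150, 170, 200, 230, 260, 300, 310]
revenue = [1.1, 1.4, 1.3, 1.9, 2.0, 2.4, 2.6, 3.0]

report = correlation_report(page_views, revenue)
print(f"r({report['df']}) = {report['r']:.2f}, r-squared = {report['r_squared']:.2f}")
```

A positive r here says that days with more page views tended to be days with more revenue. It does not say that the page views produced the revenue.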


Correlation: Meaning, Significance, Types and Degree of Correlation


The statistical approaches covered so far (such as measures of central tendency and dispersion) are limited to the analysis of a single variable. A distribution that involves only one variable is known as a Univariate Distribution. However, there are instances in real-world situations where distributions have two variables, such as data related to income and expenditure, prices and demand, height and weight, etc. A distribution with two variables is referred to as a Bivariate Distribution. It is often necessary to uncover relationships between two or more statistical series. Correlation is a statistical technique for determining the relationship between two variables.

Table of Contents

  • What is Correlation?
  • Correlation and Causation
  • Significance of Correlation
  • Types of Correlation
  • Degree of Correlation

What is Correlation?

A statistical tool that helps in the study of the relationship between two variables is known as Correlation. It also helps in understanding the economic behaviour of the variables.

According to L.R. Connor, “If two or more quantities vary in sympathy so that movements in one tend to be accompanied by corresponding movements in others, then they are said to be correlated.” In the words of Croxton and Cowden, “When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation.” According to A.M. Tuttle, “Correlation is an analysis of covariation between two or more variables.”

Two variables are said to be correlated if a change in one is accompanied by a corresponding change in the other. For example, a change in the price of a commodity is accompanied by a change in the quantity demanded, an increase in employment levels goes with an increase in output, and when income increases, consumption increases as well. The degree of correlation between such statistical series is the main subject of analysis in these circumstances.

Correlation and Causation

The degree of correlation between two or more variables can be determined using correlation. However, it does not consider the cause-and-effect relationship between variables. If two variables are correlated, it could be for any of the following reasons:

1. Third-Party Influence:

The influence of a third party can result in a high degree of correlation between the two variables. This analysis does not take into account third-party influence. For example, the correlation between the yield per acre of grain and jute can be of a high degree because both are linked to the amount of rainfall. However, in reality, both these variables do not have any effect on each other.

2. Mutual Dependence (Cause and Effect):

It may be challenging to determine which is the cause and which is the effect when two variables show a high degree of correlation, because they may be influencing one another. For example, a change in the price of a commodity changes the demand for it. Here, the price is the cause, and demand is the effect. However, there is a possibility that the price of the commodity will rise due to increased demand (from population growth or other factors). In that case, increased demand is the cause, and the price is the effect.

3. Pure Chance: 

It is possible that the correlation between the two variables was obtained by random chance or coincidence alone. Such a correlation is known as a spurious correlation. Therefore, it is crucial to determine whether there is a plausible relationship between the variables under analysis. For example, one may see a strong correlation between the income of people in a society and their clothes size, even though there is no relationship between the two variables.

So, it can be said that correlation provides only a quantitative measure and does not indicate a cause-and-effect relationship between the variables. For that reason, it must be ensured that variables are correctly selected for the correlation analysis.

Significance of Correlation

  • It helps determine the degree of correlation between the two variables in a single figure.
  • It makes understanding of economic behaviour easier and identifies critical variables that are significant.
  • When two variables are correlated, the value of one variable can be estimated using the value of the other. This is performed with the regression coefficients.
  • In the business world, correlation helps in making decisions. Correlation helps in making predictions, which reduces uncertainty, because predictions based on a well-established correlation are more likely to be reliable and close to reality.

Types of Correlation

Correlation can be classified based on various categories:  

Based on the direction of change in the value of two variables, correlation can be classified as:

1. Positive Correlation:

When two variables move in the same direction, i.e., when one increases the other also increases and vice versa, the relation is called a Positive Correlation. Examples include the relationship between price and supply, income and expenditure, height and weight, etc.

Positive Correlation

 2. Negative Correlation:

When two variables move in opposite directions; i.e., when one increases the other decreases, and vice-versa, then such a relation is called a Negative Correlation. For example, the relationship between the price and demand, temperature and sale of woollen garments, etc.

Negative Correlation

Based on the ratio of variations between the variables, correlation can be classified as:

1. Linear Correlation:

When there is a constant change in the amount of one variable due to a change in another variable, it is known as Linear Correlation. This term is used when two variables change in the same ratio. If two variables that change in a fixed proportion are plotted on graph paper, the relationship between them is represented by a straight line, which is why it is called a linear relationship.

Linear Correlation

In the above graph, for every change in the variable X by 5 units there is a change of 10 units in variable Y. The ratio of change of variables X and Y in the above schedule is 1:2 and it remains the same, thus there is a linear relationship between the variables.

2. Non-Linear (Curvilinear) Correlation:

When there is no constant change in the amount of one variable due to a change in another variable, it is known as a Non-Linear Correlation. This term is used when two variables do not change in the same ratio. This shows that it does not form a straight-line relationship. For example , the production of grains would not necessarily increase even if the use of fertilizers is doubled.

Non-Linear Correlation

In the above schedule, there is no fixed ratio between the changes in the two variables. Even though both change in the same direction (both are increasing), they change in different proportions. Because the ratio of change between X and Y does not stay the same, there is a non-linear relationship between the variables.
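The ratio test described for linear and non-linear correlation can be sketched in a few lines. The two schedules below are hypothetical stand-ins for the ones in the figures: in the first, every 5-unit change in X comes with a 10-unit change in Y, and in the second, Y keeps increasing but by a different amount each time.

```python
def change_ratios(xs, ys):
    """Ratio of the change in Y to the change in X between consecutive observations."""
    pairs = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])]

def is_linear(xs, ys):
    """A relationship is linear when the ratio of change never varies."""
    return len(set(change_ratios(xs, ys))) == 1

# Hypothetical schedules (not the original figures).
x = [5, 10, 15, 20, 25]
y_linear = [10, 20, 30, 40, 50]      # X and Y change in a constant 1:2 ratio
y_non_linear = [10, 12, 18, 30, 50]  # same direction, different proportions

print(change_ratios(x, y_linear), is_linear(x, y_linear))
print(change_ratios(x, y_non_linear), is_linear(x, y_non_linear))
```

The exact equality check works here because the example uses whole numbers. With measured data, a small tolerance would be a more sensible comparison.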

Based on the number of variables involved, correlation can be classified as:

1. Simple Correlation:

Simple correlation implies the study between the two variables only. For example, the relationship between price and demand, and the relationship between price and money supply.

2. Partial Correlation:

Partial correlation implies the study between the two variables keeping other variables constant. For example, the production of wheat depends upon various factors like rainfall, quality of manure, seeds, etc. But, if one studies the relationship between wheat and the quality of seeds, keeping rainfall and manure constant, then it is a partial correlation.
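For readers who want to see how "keeping other variables constant" is done numerically, the standard first-order partial correlation formula removes the influence of one third variable using the three pairwise coefficients. The sketch below holds only rainfall constant (not manure as well), and the three input coefficients are hypothetical values chosen for illustration.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the linear influence of Z removed.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations:
r_wheat_seeds = 0.7   # wheat output vs seed quality
r_wheat_rain = 0.6    # wheat output vs rainfall
r_seeds_rain = 0.5    # seed quality vs rainfall

r_partial = partial_correlation(r_wheat_seeds, r_wheat_rain, r_seeds_rain)
print(f"wheat vs seed quality, rainfall held constant: {r_partial:.3f}")
```

With these inputs the partial correlation is lower than the simple correlation of 0.7, because part of the apparent link between wheat output and seed quality was shared with rainfall.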

3. Multiple Correlation:

Multiple correlation implies the study of three or more variables simultaneously. The entire set of independent and dependent variables is studied simultaneously. For example, the relationship between wheat output and both the quality of seeds and rainfall.

Degree of Correlation

The degree of correlation is measured through the coefficient of correlation. The degree of correlation for the given variables can be expressed in the following ways:

1. Perfect Correlation:

If two variables change in equal proportion (whether increasing or decreasing), they are said to be perfectly correlated. This can be of two types:

  • Positive Correlation: When the proportional change in two variables is in the same direction, it is said to be positively correlated. In this case, the Coefficient of Correlation is shown as +1 .
  • Negative Correlation: When the proportional change in two variables is in the opposite direction, it is said to be negatively correlated. In this case, the Coefficient of Correlation is shown as -1 .

2. Zero Correlation:

If there is no relation between two series or variables, they are said to have zero or no correlation. It means that a change in one variable is not associated with any change in the other. In such cases, the Coefficient of Correlation will be 0.

3. Limited Degree of Correlation:

A limited degree of correlation lies between perfect correlation and the absence of correlation. In practice, most relationships show only a limited degree of correlation.

  • The coefficient of correlation, in this case, lies between +1 and -1.
  • Correlation is limited and negative when there are unequal changes in the opposite direction.
  • Correlation is limited and positive when there are unequal changes in the same direction.
  • The degree of correlation can be low (when the coefficient of correlation lies between 0 and 0.25), moderate (when the coefficient of correlation lies between 0.25 and 0.75), or high (when the coefficient of correlation lies between 0.75 and 1).

Within these limits, the value of correlation can be interpreted as:

Degree of Correlation
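The bands listed above can be sketched as a small lookup function. This is an illustrative sketch only: the function name is hypothetical, and because the text does not say which band the boundary values 0.25 and 0.75 belong to, the code assumes they fall in the higher band.

```python
def degree_of_correlation(r):
    """Label a correlation coefficient using the bands described above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)
    if size == 0.0:
        return "zero"
    if size == 1.0:
        degree = "perfect"
    elif size < 0.25:
        degree = "low"
    elif size < 0.75:   # assumption: 0.25 itself counts as moderate
        degree = "moderate"
    else:               # assumption: 0.75 itself counts as high
        degree = "high"
    direction = "positive" if r > 0 else "negative"
    return f"{degree} {direction}"

for value in (1.0, 0.54, -0.9, 0.1, 0.0):
    print(value, "->", degree_of_correlation(value))
```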

Correlation: What It Means in Finance and the Formula for Calculating It

Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master's in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder as well as holding FINRA Series 7, 55 & 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.

Correlation, in the finance and investment industries, is a statistic that measures the degree to which two securities move in relation to each other. Correlations are used in advanced portfolio management , computed as the correlation coefficient , which has a value that must fall between -1.0 and +1.0.

Key Takeaways

  • Correlation is a statistic that measures the degree to which two variables move in relation to each other.
  • In finance, the correlation can measure the movement of a stock with that of a benchmark index, such as the S&P 500.
  • Correlation is closely tied to diversification, the concept that certain types of risk can be mitigated by investing in assets that are not correlated.
  • Correlation measures association, but doesn't show if x causes y or vice versa—or if the association is caused by a third factor.
  • Correlation may be easiest to identify using a scatterplot, especially if the variables have a non-linear yet still strong correlation.

Investopedia / Sydney Saporito

Correlation shows the strength of a relationship between two variables and is expressed numerically by the correlation coefficient. The correlation coefficient's values range between -1.0 and 1.0.

A perfect positive correlation means that the correlation coefficient is exactly 1. This implies that as one security moves, either up or down, the other security moves in lockstep, in the same direction. A perfect negative correlation means that two assets move in opposite directions, while a zero correlation implies no linear relationship at all.

For example, large-cap mutual funds generally have a high positive correlation to the Standard and Poor's (S&P) 500 Index, with a coefficient of nearly 1. Small-cap stocks also tend to have a positive correlation to the S&P 500, but a lower one, at approximately 0.8.

However, put option prices and their underlying stock prices will tend to have a negative correlation. A put option gives the owner the right but not the obligation to sell a specific amount of an  underlying security  at a pre-determined price within a specified time frame.

Put option contracts become more profitable when the underlying stock price decreases. In other words, as the stock price increases, the put option prices go down, which is a direct and high-magnitude negative correlation.

How to Calculate Correlation

There are several methods of calculating correlation. The most common method, the Pearson product-moment correlation, is discussed further in this article. The Pearson product-moment correlation measures the linear relationship between two variables. It can be used for any data set that has a finite covariance matrix. Here are the steps to calculate correlation.

  1. Gather data for your "x-variable" and "y-variable."
  2. Find the mean of the x-variable and the mean of the y-variable.
  3. Subtract the mean of the x-variable from each value of the x-variable. Repeat this step for the y-variable.
  4. Multiply each x-variable difference by the corresponding y-variable difference, and add the products.
  5. Square each x-variable difference and add the results. Repeat for the y-variable differences, then multiply the two sums.
  6. Determine the square root of the value obtained in Step 5.
  7. Divide the value in Step 4 by the value obtained in Step 6.
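The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production routine: the function name and the sample data are hypothetical, and the code assumes neither variable is constant (otherwise the final division would be by zero).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations around each mean."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two values")
    # Means of each variable
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Differences from the mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of the products of paired differences
    sum_products = sum(a * b for a, b in zip(dx, dy))
    # Sums of squared differences, multiplied together, then square-rooted
    sum_sq_x = sum(a * a for a in dx)
    sum_sq_y = sum(b * b for b in dy)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

# Hypothetical data in which y is always twice x
hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]
r = pearson_r(hours, score)
print(r)
```

Because the second series is an exact multiple of the first, the coefficient comes out as a perfect positive correlation.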

To avoid the complex manual calculation, consider using the CORREL function in Excel.

Formula for Correlation

Using the Pearson product-moment correlation method, the following formula can be used to find the correlation coefficient, r:

$$
\begin{aligned}
&r = \frac{ n \times \sum (X, Y) - \sum (X) \times \sum (Y) }{ \sqrt{ \left( n \times \sum (X^2) - \left( \sum (X) \right)^2 \right) \times \left( n \times \sum (Y^2) - \left( \sum (Y) \right)^2 \right) } } \\
&\textbf{where:} \\
&r = \text{Correlation coefficient} \\
&n = \text{Number of observations} \\
&\textstyle\sum (X, Y) = \text{Sum of the products of each paired X and Y value}
\end{aligned}
$$

Example of Correlation

Investment managers, traders, and analysts find it very important to calculate correlation because the risk reduction benefits of diversification rely on this statistic. Financial spreadsheets and software can calculate the value of correlation quickly.

As a hypothetical example, assume that an analyst needs to calculate the correlation for the following two data sets:

X: (41, 19, 23, 40, 55, 57, 33)

Y: (94, 60, 74, 71, 82, 76, 61)

There are three steps involved in finding the correlation. The first is to add up all the X values to find SUM(X), add up all the Y values to find SUM(Y), and multiply each X value by its corresponding Y value and sum the products to find SUM(X,Y):

SUM(X) = (41 + 19 + 23 + 40 + 55 + 57 + 33) = 268

SUM(Y) = (94 + 60 + 74 + 71 + 82 + 76 + 61) = 518

SUM(X,Y) = (41 x 94) + (19 x 60) + (23 x 74) + ... (33 x 61) = 20,391

The next step is to take each X value, square it, and sum up all these values to find SUM(X^2). The same must be done for the Y values:

SUM(X^2) = (41^2) + (19^2) + (23^2) + ... (33^2) = 11,534

SUM(Y^2) = (94^2) + (60^2) + (74^2) + ... (61^2) = 39,174

Noting that there are seven observations, n, the following formula can be used to find the correlation coefficient, r:

$$
\begin{aligned}
&r = \frac{ n \times \sum (X, Y) - \sum (X) \times \sum (Y) }{ \sqrt{ \left( n \times \sum (X^2) - \left( \sum (X) \right)^2 \right) \times \left( n \times \sum (Y^2) - \left( \sum (Y) \right)^2 \right) } } \\
&\textbf{where:} \\
&r = \text{Correlation coefficient} \\
&n = \text{Number of observations}
\end{aligned}
$$

In this example, the correlation would be:

r = (7 x 20,391 - (268 x 518)) / SquareRoot((7 x 11,534 - 268^2) x (7 x 39,174 - 518^2)) = 3,913 / 7,248.4 = 0.54
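The arithmetic in this example can be checked with a short Python sketch that uses the same X and Y data and the same sums-based formula. The variable names are illustrative only.

```python
import math

x = [41, 19, 23, 40, 55, 57, 33]
y = [94, 60, 74, 71, 82, 76, 61]
n = len(x)

sum_x = sum(x)                                # SUM(X)
sum_y = sum(y)                                # SUM(Y)
sum_xy = sum(a * b for a, b in zip(x, y))     # SUM(X,Y)
sum_x2 = sum(a * a for a in x)                # SUM(X^2)
sum_y2 = sum(b * b for b in y)                # SUM(Y^2)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(numerator, round(denominator, 1), round(r, 2))
```

The intermediate sums match the figures quoted in the text, and the coefficient rounds to 0.54.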

Correlation and Portfolio Diversification

In investing, correlation is most important in relation to a diversified portfolio. Investors who wish to mitigate risk can do so by investing in non-correlated assets. For example, consider an investor who owns airline stock. If the airline industry is found to have a low correlation to the social media industry, the investor may choose to invest in a social media stock, understanding that a negative impact on one industry may not affect the other.

This is often the approach when considering investing across asset classes. Stocks, bonds, precious metals, real estate, cryptocurrency, commodities, and other types of investments each have different relationships to each other. While some may be heavily correlated, others may act as a hedge to diversify risk if they are not correlated.

Risk that can be diversified away is called unsystematic risk. This type of risk is specific to a company, industry, or asset class. Investing in different assets can reduce your portfolio's correlation and reduce your exposure to unsystematic risk.
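To make the diversification effect concrete, the sketch below applies the standard two-asset portfolio variance formula, in which the correlation coefficient scales the cross term. The function name, weights, and volatilities are hypothetical figures chosen for illustration, not market data.

```python
import math

def two_asset_volatility(weight_a, vol_a, vol_b, correlation):
    """Volatility of a two-asset portfolio for a given correlation."""
    weight_b = 1.0 - weight_a
    variance = (
        (weight_a * vol_a) ** 2
        + (weight_b * vol_b) ** 2
        + 2 * weight_a * weight_b * correlation * vol_a * vol_b
    )
    # Guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(variance, 0.0))

# Hypothetical 50/50 portfolio: asset A has 20% volatility, asset B has 30%
for rho in (1.0, 0.5, 0.0, -0.5):
    print(rho, round(two_asset_volatility(0.5, 0.20, 0.30, rho), 4))
```

With a correlation of 1.0 the portfolio volatility is simply the weighted average of the two volatilities; any lower correlation gives a lower portfolio volatility for the same holdings.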

Correlation is often reported alongside other statistical measures. It is common to see correlation cited when statistics is used to analyze variables.

In statistics, a p-value is used to indicate whether the findings are statistically significant. It is possible to determine that two variables are correlated, but there may not be enough supporting evidence to state this as a strong claim. A low p-value (conventionally below 0.05) indicates there is enough evidence to meaningfully conclude that the population correlation coefficient is different from zero.
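One way to obtain a p-value for a correlation without distributional tables is an exact permutation test: reshuffle one variable in every possible order and count how often the reshuffled data correlate at least as strongly as the observed data. The sketch below is an illustration under that assumption, not the method used by any particular spreadsheet or package; it reuses the seven X and Y observations from the worked example earlier in this article, and the function names are hypothetical.

```python
import itertools
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviations around each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys):
    """Two-sided exact permutation p-value (practical only for small samples)."""
    observed = abs(pearson_r(xs, ys))
    at_least_as_extreme = 0
    total = 0
    for shuffled in itertools.permutations(ys):
        total += 1
        # Small tolerance so the observed ordering always counts itself
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total

x = [41, 19, 23, 40, 55, 57, 33]
y = [94, 60, 74, 71, 82, 76, 61]
p_value = permutation_p_value(x, y)
print(round(pearson_r(x, y), 2), round(p_value, 3))
```

With only seven observations, a coefficient of about 0.54 does not reach the conventional 0.05 significance level, which illustrates why a moderate correlation in a small sample should be treated with caution.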

Scatterplots

The easiest way to visualize whether two variables are correlated is to graphically depict them using a scatterplot. Each point on a scatterplot represents one sample item. The x-axis of the scatterplot represents one of the variables being tested, while the y-axis of the scatter plot represents the other.

The correlation between the two variables is often depicted graphically as a straight line fitted through the data points. If the two variables are positively correlated, an upward-sloping line may be drawn on the scatterplot. If two variables are negatively correlated, a downward-sloping line may be drawn. The stronger the relationship, the closer each data point will be to this line.

Scatterplots may be more useful when analyzing more complex data that might have changing relationships. For example, two variables may be positively correlated to a certain point, then their relationship becomes negatively correlated. This non-linear relationship may be more difficult to identify using formulas but can be easier to spot when graphed on a scatterplot.
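A small numerical sketch shows why such a relationship is hard to detect with a formula alone. The data below are hypothetical: y rises with x up to a peak and then falls symmetrically, so the two halves are each strongly correlated while the coefficient over the full range is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviations around each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U relationship: y = 9 - x^2
x = [-3, -2, -1, 0, 1, 2, 3]
y = [9 - value ** 2 for value in x]

r_full = pearson_r(x, y)             # whole range
r_rising = pearson_r(x[:4], y[:4])   # left half, where y increases with x
r_falling = pearson_r(x[3:], y[3:])  # right half, where y decreases with x

print(round(r_full, 3), round(r_rising, 3), round(r_falling, 3))
```

A scatterplot of the same points would show the curve immediately, even though the overall coefficient suggests no linear relationship.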

Last, scatterplots can easily depict correlation when they incorporate density shading. A density shade or density ellipse is a shaded area on a scatterplot that visually shows the densest region of data points on a scatterplot. The density ellipses will often mirror the direction of a linear correlation line if variables are related. Otherwise, density ellipses that are more circular with no defined direction indicate lower correlation.

Another inherent difficulty in statistics is determining whether relationships between two variables are caused by those variables. Consider the following statement:

"Most basketball players are tall. Therefore if you play basketball, you will become tall."

It's clear that the statement above is not true. Individuals who are tall and understand this advantage may gravitate to basketball because their natural physical abilities best suit them for the sport. However, because height and participation in basketball may be positively correlated, statisticians and data scientists must be aware that a strong relationship between two variables may or may not be caused by either of the variables.

Limitations of Correlation

Like other aspects of statistical analysis, correlation can be misinterpreted. Small sample sizes may yield unreliable results, even if it appears as though correlation between two variables is strong. Alternatively, a small sample size may yield uncorrelated findings when the two variables are in fact linked.

Correlation is often skewed when an outlier is present. Correlation only shows how one variable is connected to another and may not clearly identify how a single instance or outcome can impact the correlation coefficient.
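The effect of a single outlier can be illustrated with a short sketch. The data are hypothetical: six points with almost no linear relationship, followed by the same points plus one extreme observation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviations around each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with little linear relationship
x = [1, 2, 3, 4, 5, 6]
y = [4, 2, 5, 1, 6, 3]
r_without_outlier = pearson_r(x, y)

# The same data plus one extreme point
r_with_outlier = pearson_r(x + [30], y + [30])

print(round(r_without_outlier, 2), round(r_with_outlier, 2))
```

One added observation is enough to move the coefficient from close to zero to close to one, which is why it helps to inspect the data before relying on the coefficient.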

Correlation may also be misinterpreted if the relationship between two variables is nonlinear. It is much easier to identify two variables with a positive or negative correlation. However, two variables may still be correlated with a more complex relationship.

Correlation is a statistical term describing the degree to which two variables move in coordination with one another. If the two variables move in the same direction, then those variables are said to have a positive correlation. If they move in opposite directions, then they have a negative correlation.

Why Are Correlations Important in Finance?

Correlations play an important role in finance because they are used to forecast future trends and to manage the risks within a portfolio. These days, the correlations between assets can be easily calculated using various software programs and online services. Correlations, along with other statistical concepts, play an important role in the creation and pricing of derivatives and other complex financial instruments.

What Is an Example of How Correlation Is Used?

Correlation is a widely-used concept in modern finance. For example, a trader might use historical correlations to predict whether a company’s shares will rise or fall in response to a change in interest rates or commodity prices. Similarly, a portfolio manager might aim to reduce their risk by ensuring that the individual assets within their portfolio are not overly correlated with one another.

Is High Correlation Better?

Investors may have a preference on the level of correlation within their portfolio. In general, most investors will prefer to have a lower correlation as this mitigates risk in their portfolios of different assets or securities being impacted by similar market conditions. However, risk-seeking investors or investors wanting to put their money into a very specific type of sector or company may be willing to have higher correlation within their portfolio in exchange for greater potential returns.

correlation

Word History

Medieval Latin correlation-, correlatio , from Latin com- + relation-, relatio relation

1561, in the meaning defined at sense 1

Phrases Containing correlation

correlation coefficient

  • rank correlation
  • coefficient of correlation

Cite this entry.

“Correlation.” Merriam-Webster.com Dictionary , Merriam-Webster, https://www.merriam-webster.com/dictionary/correlation. Accessed 17 Feb. 2024.

ORIGINAL RESEARCH article

The causal correlation between gut microbiota abundance and pathogenesis of cervical cancer: a bidirectional mendelian randomization study.

Hua Yang

  • Department of Gynecology, The Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China

Background: Observational studies and animal experiments suggested potential relevance between gut microbiota (GM) and cervical cancer (CC), but the relevance of this association remains to be clarified.

Methods: We performed a two-sample bidirectional Mendelian randomization (MR) analysis to explore whether there was a causal correlation between GM and CC, and the direction of causality.

Results: In primary outcomes, we found that a higher abundance of class Clostridia, family Family XI, genus Alloprevotella, genus Ruminiclostridium 9, and order Clostridiales predicted higher risk of CC, and a higher abundance of class Lentisphaeria, family Acidaminococcaceae, genus Christensenellaceae R7 group, genus Marvinbryantia, order Victivallales, phylum Actinobacteria, and phylum Lentisphaerae predicted lower risk of CC. In verified outcomes, we found that a higher abundance of class Methanobacteria, family Actinomycetaceae, family Methanobacteriaceae, genus Lachnospiraceae UCG 010, genus Methanobrevibacter, order Actinomycetales, and order Methanobacteriales predicted a higher risk of CC, and a higher abundance of family Streptococcaceae, genus Dialister, and phylum Bacteroidetes predicted a lower risk of CC, and vice versa.

Conclusion: Our study implied a mutual causality between GM and CC, which provided a novel concept for the occurrence and development of CC, and might promote future functional or clinical analysis.

1 Introduction

The incidence of cervical cancer (CC) is only lower than that of breast, lung, and colorectal cancers in women worldwide. It has the highest incidence of malignant tumors in the female reproductive tract. Nearly 530,000 new CCs are diagnosed worldwide each year, causing serious health and economic burdens in both developing and developed countries ( He and Li, 2021 ). The etiological effect of high-risk human papillomavirus (hr-HPV) has been established for decades and it has been found in over 99.7% of women with CC. However, hr-HPV infection is very common in sexually active women, and the incidence of CC is relatively low. Over 90% of hr-HPV infections regress naturally ( Viveros-Carreño et al., 2023 ). Infection with hr-HPV is essential but insufficient for the pathogenesis of CC, and additional factors, such as immune factors and vaginal microecology, may play a role in the occurrence and progression of CC.

CC is regarded as a multifactorial disease. The mechanism and process of carcinogenesis are largely unknown and may involve several environmental, lifestyle, and hereditary factors, such as sexual behavior, parity, use of hormonal contraceptives, individual immunity, and smoking ( Zhang et al., 2021 ). The human digestive tract carries over 1,000 microbes defined as the gut microbiota (GM). It is essential for health and plays a role in multiple physiological processes, including metabolism, detoxification, nutrient absorption, maintenance of homeostasis of the intestinal mucous barrier, and the immune and endocrine systems. Furthermore, alteration of the GM, such as altered composition and abundance, may cause damage to the mucosal barrier and translocation of bacteria and endotoxins, trigger various forms of inflammation, compromise the immune environment, and change the metabolome. Recent research has found that alterations in the GM were closely associated with a variety of tumors in and outside the gut tract, such as liver cancer, ovarian cancer, colorectal cancer, pancreatic cancer, and breast cancer ( Yu and Schwabe, 2017 ; Ma et al., 2018 ; Chen et al., 2019 ; Parida and Sharma, 2019 ; Plaza-Díaz et al., 2019 ).

Several studies have reported a potential association between CC and GM. Karpinets et al. (2020) did not identify any relationship between GM alteration caused by oral antibiotics and CC development. Wang et al. (2019) compared the GM profiles in eight CC patients with five healthy controls and found an increased alpha diversity and clear separation in beta diversity in CC-associated gut microbiota. Kang et al. (2020) compared the fecal microbiota of 17 early CC patients to that of 29 healthy women and found a significant difference in Chao1 diversity between the two groups, with consistent outcomes in observed operational taxonomic unit analysis, and a prediction model based on fecal analysis was helpful for early diagnosis of CC. Chang et al. (2023) compared the microbiota profiles in fecal samples of 13 CC patients with 10 healthy controls and found a significant difference in GM abundance between women with CC and healthy controls. They also found that the abundance of Ruminococcus 2 was negatively correlated with the CC stage. Sims et al. (2019) compared the microbiota profiles in fecal samples of 42 CC with 46 healthy controls, found increased alpha diversity and beta diversity in the women with CC and the abundance of Dialister, Prevotella, and Porphyromonas were significantly higher in the women with CC, while the abundance of Lachnospiracea, Bacteroides , and Alistipes was significantly higher in healthy women. Although these studies suggest that the gut microbiota is correlated with CC, the real effect and impact on CC are largely unknown. The causal relationship between gut microbiota and CC has been insufficiently addressed due to the limitations of conventional observational studies, which are susceptible to potential confounding bias or reverse causal bias.

Mendelian randomization (MR) analysis is an epidemiological statistical method that can overcome the limitations of traditional observational studies and may avoid the bias of confounding factors or reverse causality ( Bowden and Holmes, 2019 ) because it adopts germline randomly assigned single nucleotide polymorphisms (SNPs) to compute the causal correlation degree between exposures and outcomes. To explore the causal role and direction of the causal correlation between gut microbiota abundance and CC, we employed a two-sample bidirectional MR analysis during the present study.

2 Materials and methods

2.1 GWAS (genome-wide association studies) statistics of CC

This study used two public CC datasets. The first, a GWAS (ID: ukb-b-8777; Elsworth et al., 2013 ; Cancer code, self-reported: cervical cancer), enrolled 1,889 CC patients and 461,044 non-gender-specific healthy controls from the European population. The second GWAS (ID: ieu-b-4876; Hemani et al., 2018 ) enrolled 563 CC patients and 198,523 non-sex-specific healthy controls, also from the European population.

2.2 GWAS statistics of gut microbiota abundance

The GWAS data on GM abundance (ID: ebi-a-GCST90017108; Wang et al., 2018 ) were published in 2021 and included a sample size of 14,306 from the European population. The data were coordinated with gene sequencing profiles based on 16S ribosomal RNA. A total of 197 taxa (9 phyla, 16 classes, 19 orders, 33 families, and 120 genera) were included, and 14 unknown taxa (11 genera and 3 families) were excluded.

2.3 Instrumental variable selection

GM abundance was comprehensively analyzed in distinct independent taxa. To ensure the robustness and veracity of the analysis results, the following optimization strategies were adopted to extract closely related instrumental variables (IVs). First, we set a strict statistical threshold of p < 5 × 10⁻⁸, the conventional genome-wide significance level, to extract SNPs strongly correlated with GM abundance. Unfortunately, no SNPs were extracted from most taxa of the gut microbiota at this threshold, so we used a second threshold of p < 5 × 10⁻⁶ for the MR analysis, which was based on previous literature. Second, we set the threshold for the minor allele frequency (MAF) to 0.01 to filter common spontaneous SNP mutations. Third, a key rule of MR analysis is to exclude the bias caused by linkage disequilibrium (LD) among IVs ( Kurilshikov et al., 2021 ). In the present study, we set R² < 0.001 and clumping distance = 10,000 kb as the thresholds to clump SNPs with LD. Fourth, to ensure that the effect of each SNP on the exposure corresponded to the same allele as its effect on the outcome, we clamped palindromic SNPs to avoid the substitution of strand directionality or allele coding.

The horizontal pleiotropy of the SNPs was tested using MR-PRESSO ( Li et al., 2022 ). The MR-PRESSO outlier trial was used to compute the value of p for single significant pleiotropy, whereas the global trial was used to compute the value of p for overall significant pleiotropy. SNPs were arranged in increasing order of p -values and then removed one by one. The MR-PRESSO global trial was adopted to compute the value of p for the remaining SNPs again until p  > 0.05. The remaining SNPs were used for subsequent MR analysis.

2.4 Mendelian randomization statistical analysis

A bidirectional two-sample MR was used to infer the causal correlation between GM abundance and CC. To test whether GM abundance causally affected CC, we selected SNPs that were closely related to GM abundance as the exposure. GWAS data: ukb-b-8777 was set as the primary outcome and GWAS data: ieu-b-4876 was set as the verified outcome. To test whether CC altered GM abundance, SNPs closely related to CC were selected as the exposure in the reverse MR analysis process, with GM abundance as the outcome.

Three mainstream MR methods were adopted for MR analysis with multiple SNPs: inverse-variance weighted (IVW), weighted median estimator (WME), and MR-Egger regression ( Bowden et al., 2015 ). The IVW method was regarded as more robust than the WME and MR-Egger regressions. Therefore, the MR results mainly depended on the IVW method. The Wald ratio test was used to evaluate the association between the gut microbiota taxa and CC when only one SNP was available.
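For readers unfamiliar with these estimators, the sketch below shows the textbook fixed-effect IVW estimate and the single-SNP Wald ratio computed from summary statistics. It is a minimal illustration, not the TwoSampleMR implementation used in this study; the function names and all effect sizes and standard errors are hypothetical.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from a single SNP: outcome effect over exposure effect."""
    return beta_outcome / beta_exposure

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate across several SNPs."""
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator

# Hypothetical summary statistics for three SNPs
beta_exposure = [0.10, 0.20, 0.15]   # SNP effect on taxon abundance
beta_outcome = [0.02, 0.04, 0.03]    # SNP effect on the outcome
se_outcome = [0.01, 0.02, 0.01]      # standard error of the outcome effect

print(wald_ratio(beta_exposure[0], beta_outcome[0]))
print(ivw_estimate(beta_exposure, beta_outcome, se_outcome))
```

In this toy data set every SNP has the same Wald ratio, so the IVW estimate equals that common ratio; with real data the IVW estimate is a precision-weighted average of the individual ratios.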

Several sensitivity tests were conducted to assess the reliability of the results. Leave-one-out test ( Gnona and Stewart, 2022 ) was used to assess whether the causal correlation was caused by a single SNP. A causal direction test was used to compare the variance caused by the SNPs in the exposure to the outcome. If the SNPs caused greater variance in exposure than in the outcome, causality was known as directional robustness. F-statistics were calculated to avoid a weak IV bias ( Cheng et al., 2017 ). F -values < 10 were defined as weak IVs and were excluded from the subsequent MR analysis.
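The weak-instrument filter can be sketched as follows. The text does not state which F-statistic formula was used, so this example assumes one common per-SNP approximation, F = (beta / SE)², applied to the SNP-exposure association; the SNP labels and values are hypothetical.

```python
def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP F-statistic (assumed formula: squared z-score)."""
    return (beta_exposure / se_exposure) ** 2

# Hypothetical SNP-exposure summary statistics: label -> (beta, standard error)
candidate_snps = {
    "snp_a": (0.10, 0.02),
    "snp_b": (0.05, 0.02),
    "snp_c": (0.12, 0.03),
}

# Keep instruments with F >= 10; values below 10 are treated as weak IVs
strong_snps = {
    label: round(f_statistic(beta, se), 2)
    for label, (beta, se) in candidate_snps.items()
    if f_statistic(beta, se) >= 10
}
print(strong_snps)
```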

All analyses were performed using R for Windows, version 4.3.0. The “TwoSampleMR” package and “MR-PRESSO” package were adopted.

2.5 Heterogeneity

Cochran’s Q statistic was used for the heterogeneity test ( Burgess and Thompson, 2011 ). A Q value > number of SNPs −1 or a value of p < 0.05 suggested heterogeneous and invalid IVs.
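As an illustration of the heterogeneity check, the sketch below computes Cochran's Q as the weighted sum of squared differences between each SNP's Wald ratio and the IVW estimate, then compares Q with the number of SNPs minus one as described above. It is a simplified sketch with first-order weights, not the routine from the R packages used in the study, and the summary statistics are hypothetical.

```python
def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Cochran's Q for per-SNP Wald ratios around the IVW estimate."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weight of each ratio
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))

# Hypothetical summary statistics for three SNPs
beta_exposure = [0.10, 0.20, 0.10]
beta_outcome = [0.02, 0.04, 0.03]
se_outcome = [0.01, 0.01, 0.01]

q = cochran_q(beta_exposure, beta_outcome, se_outcome)
degrees_of_freedom = len(beta_exposure) - 1
print(round(q, 4), degrees_of_freedom, q > degrees_of_freedom)
```

A p-value for Q would come from a chi-square distribution with the same degrees of freedom; that step is omitted here to keep the example dependency-free.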

The flowchart of the present MR analysis is presented in Figure 1 .

Figure 1 . The flowchart of the present MR analysis.

3.1 SNP selection

First, we extracted 1 to 11 SNPs associated with single GM taxa for a total of 183 taxa (8 phyla, 16 classes, 20 orders, 29 families, and 110 genera) at a significance threshold of p < 5 × 10⁻⁶ according to the aforementioned optimization strategies. The number of SNPs in each taxon is detailed in Supplementary Table S1 . The F -value of each SNP was greater than 10, indicating that no weak IV bias existed, and no pleiotropic effects were identified by the MR-PRESSO global trial ( p > 0.05).

3.2 Primary causality of GM abundance on the risk of CC

When setting the statistical threshold as p < 5 × 10⁻⁶ and GWAS data: ukb-b-8777 as the outcome, we found that a higher abundance of class Clostridia causally predicted a higher risk of CC (b = 0.00382, p = 0.01526 by IVW test), with homogeneous results by MR Egger and weighted median test, and no horizontal pleiotropy ( p = 0.385) or heterogeneity ( p = 0.4014) was found between SNPs. Causal direction analysis found that the variance explained in exposure was significantly stronger than the variance explained in outcome ( p = 4.34e-36), and the leave-one-out test found that causality was not affected by a single SNP. The method comparison of the MR results is plotted in Figure 2A , which suggests that the causal correlation between class Clostridia and CC was robust. We also found that a higher abundance of family Family XI, genus Alloprevotella, genus Ruminiclostridium 9, and order Clostridiales causally predicted a higher risk of CC ( Supplementary Table S2 ; Figures 2B – E ). Meanwhile, we found that a higher abundance of family Acidaminococcaceae causally predicted a lower risk of CC (b = −0.002515, p = 0.03575 by IVW test), with homogeneous results by MR Egger and weighted median test, and no horizontal pleiotropy ( p = 0.819) or heterogeneity ( p = 0.4002) was found between SNPs. Causal direction analysis found that the variance explained in exposure was significantly stronger than the variance explained in outcome ( p = 1.4e-05), and the leave-one-out test found that causality was not affected by a single SNP. The method comparison of MR results is plotted in Figure 2F , which suggests that the causal correlation between family Acidaminococcaceae and CC was robust. We also found that a higher abundance of class Lentisphaeria, genus Christensenellaceae R7 group, genus Marvinbryantia, order Victivallales, phylum Actinobacteria, and phylum Lentisphaerae causally predicted a lower risk of CC ( Supplementary Table S2 ; Figures 2G – L ).

Figure 2 . Causality of GM abundance on the risk of CC from GWAS data: ukb-b-8777. (A) The method comparison of MR results between class Clostridia and CC. (B) The method comparison of MR results between family Family XI and CC. (C) The method comparison of MR results between genus Alloprevotella and CC. (D) The method comparison of MR results between genus Ruminiclostridium 9 and CC. (E) The method comparison of MR results between order Clostridiales and CC. (F) The method comparison of MR results between family Acidaminococcaceae and CC. (G) The method comparison of MR results between class Lentisphaeria and CC. (H) The method comparison of MR results between genus Christensenellaceae R7 group and CC. (I) The method comparison of MR results between genus Marvinbryantia and CC. (J) The method comparison of MR results between order Victivallales and CC. (K) The method comparison of MR results between phylum Actinobacteria and CC. (L) The method comparison of MR results between phylum Lentisphaerae and CC.

3.3 Verified causality of GM abundance on the risk of CC

When the statistical threshold was set at p < 5 × 10⁻⁶ with GWAS data ieu-b-4876 as the outcome, we found that a higher abundance of class Methanobacteria causally predicted a higher risk of CC (b = 0.001967, p = 0.01526 by IVW test), with consistent results from the MR Egger and weighted median tests. No horizontal pleiotropy (p = 0.673) or heterogeneity (p = 0.9709) was found between SNPs, with insufficient data for causal direction analysis, and the leave-one-out test found that causality was not driven by any single SNP. Causal direction analysis found that the variance explained in the exposure was significantly greater than the variance explained in the outcome (p = 2.79e-18). The method comparison of the MR results is plotted in Figure 3A, which suggests that the causal correlation between class Methanobacteria and CC was robust. We also found that a higher abundance of family Actinomycetaceae, family Methanobacteriaceae, genus Lachnospiraceae UCG 010, genus Methanobrevibacter, order Actinomycetales, and order Methanobacteriales causally predicted a higher risk of CC (Supplementary Table S3; Figures 3B–G). Meanwhile, we found that a higher abundance of family Streptococcaceae causally predicted a lower risk of CC (b = −0.003037, p = 0.001172 by IVW test), with consistent results from the MR Egger and weighted median tests, and no horizontal pleiotropy (p = 0.574) or heterogeneity (p = 0.9043) between SNPs. The causal direction analysis found that the variance explained in the exposure was significantly greater than that in the outcome (p = 1.27e-40). The leave-one-out test revealed that causality was not driven by any single SNP. The method comparison of the MR results is plotted in Figure 3H, which suggests that the causal correlation between family Streptococcaceae and CC was robust. We also found that a higher abundance of genus Dialister and phylum Bacteroidetes causally predicted a lower risk of CC (Supplementary Table S3; Figures 3I, J).
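The heterogeneity p-values reported above are typically derived from Cochran's Q statistic computed on the per-SNP Wald ratios. The sketch below shows that statistic and the related I² measure under the same fixed-effect weighting as IVW; the function name `cochran_q` and the input values are hypothetical. The p-value would come from a chi-square distribution with `df` degrees of freedom, which is left out here to keep the example dependency-free.

```python
def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Cochran's Q and I^2 for heterogeneity between per-SNP Wald ratios."""
    weights = [bx ** 2 / so ** 2 for bx, so in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    pooled = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    # Weighted squared deviations of each ratio from the pooled IVW estimate.
    q = sum(w * (r - pooled) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    # I^2: share of variation attributable to heterogeneity, floored at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Hypothetical summary statistics for three SNPs (not taken from the study).
q_stat, q_df, i_squared = cochran_q(
    [0.10, 0.20, 0.10],
    [0.002, 0.004, 0.004],
    [0.001, 0.001, 0.001],
)
print(f"Q = {q_stat:.3f} on {q_df} df, I^2 = {i_squared:.2f}")
```

A Q value close to its degrees of freedom, as in most results reported in this section, indicates little evidence that individual SNPs give conflicting causal estimates.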

Figure 3 . Causality of GM abundance on the risk of CC from GWAS data: ieu-b-4876. (A) The method comparison of MR results between class Methanobacteria and CC. (B) The method comparison of MR results between family Actinomycetaceae and CC. (C) The method comparison of MR results between family Methanobacteriaceae and CC. (D) The method comparison of MR results between genus Lachnospiraceae UCG 010 and CC. (E) The method comparison of MR results between genus Methanobrevibacter and CC. (F) The method comparison of MR results between order Actinomycetales and CC. (G) The method comparison of MR results between order Methanobacteriales and CC. (H) The method comparison of MR results between family Streptococcaceae and CC. (I) The method comparison of MR results between genus Dialister and CC. (J) The method comparison of MR results between phylum Bacteroidetes and CC.

Figure 4. Causality of CC from GWAS data: ukb-b-8777 on GM abundance. (A) The method comparison of MR results between CC and class Verrucomicrobiae. (B) The method comparison of MR results between CC and family Verrucomicrobiaceae. (C) The method comparison of MR results between CC and genus Akkermansia. (D) The method comparison of MR results between CC and genus Ruminiclostridium5. (E) The method comparison of MR results between CC and order Verrucomicrobiales. (F) The method comparison of MR results between CC and phylum Verrucomicrobia. (G) The method comparison of MR results between CC and family Defluviitaleaceae. (H) The method comparison of MR results between CC and genus Barnesiella. (I) The method comparison of MR results between CC and genus Defluviitaleaceae UCG011. (J) The method comparison of MR results between CC and genus Lachnospiraceae UCG001.

3.4 Primary causality of CC on GM abundance

When the statistical threshold was set at p < 5 × 10⁻⁶, seven closely associated SNPs were extracted as IVs from GWAS data ukb-b-8777, with gut microbiota taxa as the outcome. We found that CC causally predicted a higher abundance of class Verrucomicrobiae (b = 17.12, p = 0.03145 by IVW test), with consistent results from the MR Egger and weighted median tests. No horizontal pleiotropy (p = 0.612) or heterogeneity (p = 0.8866) was found between SNPs. The leave-one-out test revealed that causality was not driven by any single SNP. A comparison of the MR results is plotted in Figure 4A, which suggests that the causal correlation between CC and class Verrucomicrobiae was robust. However, causal direction analysis found that the variance explained in the exposure was not significantly different from that in the outcome (p = 0.928), which meant that a reverse causal relationship could not be excluded. We also found that CC causally predicted a higher abundance of family Verrucomicrobiaceae, genus Akkermansia, genus Ruminiclostridium5, order Verrucomicrobiales, and phylum Verrucomicrobia (Supplementary Table S4; Figures 4B–F). Furthermore, we found that CC causally predicted a lower abundance of family Defluviitaleaceae (b = −20.5, p = 0.03323 by IVW test), with consistent results from the MR Egger and weighted median tests and no horizontal pleiotropy (p = 0.469) or heterogeneity (p = 0.843) between SNPs, and the leave-one-out analysis found that the causality was not driven by any single SNP. The method comparison of the MR results is plotted in Figure 4G, which suggests that the causal association between CC and family Defluviitaleaceae was robust; however, the causal direction analysis found that the variance explained in the exposure was not significantly different from that in the outcome (p = 0.605), which meant that a reverse causal relationship could not be excluded. We also found that CC causally predicted a lower abundance of genera Barnesiella, Defluviitaleaceae UCG011, and Lachnospiraceae UCG001 (Supplementary Table S4; Figures 4H–J).
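The causal direction analysis referred to above compares the variance that the instruments explain in the exposure with the variance they explain in the outcome. The sketch below illustrates one common form of this comparison (a Steiger-type test using Fisher's z transformation) under simplifying assumptions: both traits are treated as continuous, per-SNP r² is approximated from the t-statistic, and SNPs are treated as independent. The function names, sample sizes, and input values are hypothetical and are not the study's data or exact procedure.

```python
import math

def variance_explained(betas, ses, n):
    """Approximate total r^2 across SNPs, using r^2 = t^2 / (t^2 + n - 2)."""
    total = 0.0
    for b, s in zip(betas, ses):
        t2 = (b / s) ** 2
        total += t2 / (t2 + n - 2)
    return total

def steiger_direction(r2_exposure, n_exposure, r2_outcome, n_outcome):
    """Compare two independent correlations with Fisher's z transformation.

    Returns (exposure_explains_more, two_sided_p_value).
    """
    z_exp = math.atanh(math.sqrt(r2_exposure))
    z_out = math.atanh(math.sqrt(r2_outcome))
    se_diff = math.sqrt(1.0 / (n_exposure - 3) + 1.0 / (n_outcome - 3))
    z = (z_exp - z_out) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return r2_exposure > r2_outcome, p_value

# Hypothetical inputs: two SNPs, exposure GWAS n = 10,000, outcome GWAS n = 100,000.
r2_x = variance_explained([0.10, 0.12], [0.02, 0.02], 10_000)
r2_y = variance_explained([0.002, 0.003], [0.002, 0.002], 100_000)
correct_direction, p_dir = steiger_direction(r2_x, 10_000, r2_y, 100_000)
print(correct_direction, f"{p_dir:.3g}")
```

A non-significant p-value, such as the values of 0.928 and 0.605 reported in this section, means the two variances cannot be distinguished, so the assumed causal direction cannot be confirmed.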

3.5 Verified causality of CC on GM abundance

When the statistical threshold was set at p < 5 × 10⁻⁶, 14 closely associated SNPs were extracted as IVs from GWAS data ieu-b-4876, with GM taxa as the outcome. We found that CC causally predicted a lower abundance of genus Alloprevotella (b = −0.1618, p = −0.1618 by IVW test), with consistent results from the MR Egger and weighted median tests. No horizontal pleiotropy (p = 0.526) or heterogeneity (p = 0.1685) was found between SNPs, and the leave-one-out test found that causality was not driven by any single SNP. The method comparison of the MR results is plotted in Supplementary Figure S1A. These results suggested that the causal correlation between CC and genus Alloprevotella was robust. However, there were not enough SNPs for causal direction analysis, so a reverse causal relationship could not be excluded. We also found that CC causally predicted a lower abundance of genus Eubacterium nodatum group and genus Phascolarctobacterium (Supplementary Table S5; Supplementary Figures S1B, C). Furthermore, we found that CC causally predicted a higher abundance of genus Eisenbergiella (b = 22.22, p = 0.03636 by IVW test), with consistent results from the MR Egger and weighted median tests. No horizontal pleiotropy (p = 0.94) or heterogeneity (p = 0.9326) was found between SNPs, and the leave-one-out analysis found that the causality was not driven by any single SNP. The method comparison of the MR results is plotted in Supplementary Figure S1D, which suggests that the causal association between CC and genus Eisenbergiella was robust. However, the causal direction test found that the variance explained in the exposure was not significantly different from that in the outcome (p = 0.382), which meant that a reverse causal relationship could not be excluded. We also found that CC causally predicted a higher abundance of phylum Euryarchaeota (Supplementary Table S5; Supplementary Figure S1E).
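The leave-one-out test cited throughout these results re-estimates the causal effect after dropping each SNP in turn, to check whether one instrument dominates. A minimal sketch follows, assuming a fixed-effect IVW point estimate; the helper names `ivw_point_estimate` and `leave_one_out` and the input values are hypothetical.

```python
def ivw_point_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW point estimate from per-SNP summary statistics."""
    weights = [bx ** 2 / so ** 2 for bx, so in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

def leave_one_out(beta_exposure, beta_outcome, se_outcome):
    """Re-estimate the IVW effect with each SNP removed in turn."""
    estimates = []
    for dropped in range(len(beta_exposure)):
        keep = [i for i in range(len(beta_exposure)) if i != dropped]
        estimates.append(
            ivw_point_estimate(
                [beta_exposure[i] for i in keep],
                [beta_outcome[i] for i in keep],
                [se_outcome[i] for i in keep],
            )
        )
    return estimates

# Hypothetical summary statistics for three SNPs (not taken from the study).
loo = leave_one_out(
    [0.10, 0.20, 0.10],
    [0.002, 0.004, 0.004],
    [0.001, 0.001, 0.001],
)
print([round(value, 4) for value in loo])
```

If all leave-one-out estimates keep the same sign and similar magnitude, as the study reports, no single SNP is driving the result.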

4 Discussion

In the present study, we used bidirectional MR to explore the causal correlation between GM abundance and CC. We believe the findings have important implications for clinical practice and for microbiome and CC research. Robustly associated SNPs were extracted from the largest GWAS for GM abundance and two independent CC databases. Based on a comprehensive genetic correlation analysis of over 670,000 European individuals, we found that genetic predisposition to some GM taxa had a causal relationship with CC; furthermore, we also found that genetic predisposition to CC had a causal relationship with some GM taxa. These results point to a novel direction for non-invasive early diagnosis of CC; further, the GM might be a novel target for the prevention, treatment, and long-term management of CC.

CC is the most common gynecological neoplasia in developing countries and poses a serious health threat to women worldwide. Although hr-HPV infection is a direct etiological factor of CC, the mechanism of carcinogenesis is largely unknown. Nearly 85%–90% of hr-HPV infections resolve spontaneously, and only 10%–15% persist, which might cause precancerous neoplasia and further progress to CC. In the past few years, owing to the rapid development of omics research, bioinformatics, and high-throughput sequencing technology, a growing body of research has found that the vaginal micro-ecosystem plays a key role in the progression from hr-HPV infection to CC (Sanderson et al., 2019; Łaniewski et al., 2020; Castanheira et al., 2021; Kyrgiou and Moscicki, 2022). Given the close relationship between the vaginal and gut microbiomes through bacterial movement and colonization between the genital and gastrointestinal tracts, we speculated that GM might be involved in the carcinogenesis of CC. Indeed, several studies have found a potential association between CC and GM, but the results were inconsistent as to whether the association was causal, and the direction of any causal correlation between GM and CC remained unclear.

In this MR study, dual verification was adopted to verify the robustness of causality. For the primary analysis, we used GWAS data ukb-b-8777 as the outcome and found that genetic liability to class Clostridia, class Lentisphaeria, family Acidaminococcaceae, genus Butyricicoccus, family Family XI, genus Alloprevotella, genus Christensenellaceae R7 group, genus Marvinbryantia, genus Ruminiclostridium 9, order Clostridiales, order Victivallales, phylum Actinobacteria, and phylum Lentisphaerae was causally associated with CC. For the verification analysis, we set GWAS data ieu-b-4876 as the outcome and found that genetic liability to class Methanobacteria, family Actinomycetaceae, family Methanobacteriaceae, family Streptococcaceae, genus Dialister, genus Lachnospiraceae UCG 010, genus Methanobrevibacter, order Methanobacteriales, and phylum Bacteroidetes was causally associated with CC. Our results suggest that certain gut microbiota taxa might be involved in CC pathogenesis, and that GM analysis might help identify females at high risk for CC and support earlier diagnosis.

Until now, the mechanism by which GM affects CC has been largely unknown. One hypothesis was that GM might activate Toll-like receptors (TLRs; Chang et al., 2020 ) and pro-inflammatory cytokines such as interleukin-17 (IL-17; Brevi et al., 2020 ) and tumor necrosis factor (TNF; Piñero et al., 2019 ), ultimately leading to carcinogenesis. Indeed, in vivo and in vitro studies have shown that TLRs are key mediators in bacteria-triggered cancer. Lipopolysaccharide (LPS) derived from GM could induce hepatocellular carcinoma by activating TLR4 in immune cells ( Liu et al., 2022 ). Similarly, Ochi et al. (2012) found that TLR4 was a key factor mediating carcinogenesis from pancreatic inflammatory disease to pancreatic cancer. Several studies in recent decades have found obvious alterations in the GM composition of women with CC; for example, Wang et al. (2019) found that the abundance of Bacteroidetes was significantly upregulated in fecal specimens of women with CC, which was confirmed by our Mendelian randomization study.

Microbial enterovaginal transfer might be another potential mechanism. Persistent infection with hr-HPV and CC are directly linked to abnormal vaginal microbiota. Alterations in the vaginal microbiome affect the risk of human papillomavirus (HPV) infection and persistence, further affecting CC risk. The sharing of microbes between the genital and gastrointestinal tracts was confirmed by clinical observation and experimental studies and might be mediated by motility and colonization of the vagina from fecal pellets. Transfer is regarded as the main way to improve the diversity of the vaginal microbiome. Karpinets et al. (2020) found that antibiotics increased the abundance of the vaginal microbiome, further lowering the risk of CC in a murine model. Ritu et al. (2019) explored the relationship between cervical microbiota abundance and HPV infection in healthy women and found that genus Dialister was closely related to poor HPV status, including newly acquired and persistent infection. Our study also found that genetic liability to genus Dialister was causally associated with CC. Taken together, these results suggest a plausible role for genus Dialister in CC.

Although HPV can be considered the primary cause of almost all CC, growing evidence suggests that the prevalence of HPV-negative patients is not negligible, which might be the result of alternative pathways such as the TP53-related pathway, the nuclear factor kappa B (NF-κB) pathway, reactive oxygen species (ROS), or free radicals in the vaginal microenvironment (Giuliano, 2003), which are closely related to the vaginal and gut microbiota. These might be possible mechanisms that mediate the causal relationship between GM and CC.

In vivo studies by Chung and Lambert (2009) suggested that estrogen might be involved in the carcinogenesis of CC. Epidemiological data (Chung et al., 2010; Bronowicka-Kłys et al., 2016; Yu et al., 2018) also confirmed that women with the highest serum estrogen levels had an increased risk of CC. Therefore, estrogen might be another potential mechanism through which the gut microbiota affects CC. Previous research has shown that alterations in the gut microbiota might lead to increased circulating estrogen levels. Certain gut microbiota taxa can produce β-glucuronidases or β-glucosidases involved in estrogen metabolism; this group of taxa is termed the “estrobolome” (Baker et al., 2017; Ervin et al., 2019; Hu et al., 2023). Estrogen metabolism mainly occurs in the liver. The liver can inactivate estrogen through sex hormone-binding globulin. β-glucuronidases and β-glucosidases originating from the gut microbiota catalyze the deconjugation of conjugated estrogen, which increases the reabsorption of estrogen from the intestine. High-throughput sequencing of the gut microbial genome revealed that multiple bacterial taxa carry genes encoding β-glucuronidases or β-glucosidases, including Bacteroides, Bifidobacterium, Escherichia, and Lactobacillus. Indeed, in our MR study, we found that genetic liability to phylum Bacteroidetes (belonging to the estrobolome) was causally associated with CC, suggesting that GM might be involved in the pathogenesis of CC through estrogen metabolism.

Although numerous clinical studies have reported that GM abundance differs significantly between women with CC and healthy women, the results were inconsistent. Whether gut microbiota abundance changes before and after the onset of CC in the same woman has not yet been clarified. Whether CC itself could alter gut microbiota abundance was also unknown, a question that is difficult to answer with epidemiological or observational studies. Therefore, we adopted a reverse MR study to address it.

In the reverse MR study, we set GWAS data ukb-b-8777 as the exposure, and the MR results showed that genetic liability to CC was causally associated with class Verrucomicrobiae, family Defluviitaleaceae, family Verrucomicrobiaceae, genus Akkermansia, genus Barnesiella, genus Defluviitaleaceae UCG011, genus Lachnospiraceae UCG001, genus Ruminiclostridium5, order Verrucomicrobiales, and phylum Verrucomicrobia. For the verification analysis, we set GWAS data ieu-b-4876 as the exposure, and the MR results showed that genetic liability to CC was causally associated with genera Alloprevotella, Eisenbergiella, Eubacterium nodatum group, and Phascolarctobacterium, and with phylum Euryarchaeota. Our results suggested that CC might affect certain GM taxa, which means that GM analysis might be a novel approach to the non-invasive diagnosis of CC. However, the exact process by which CC affects GM is largely unknown and is an important subject for further research.

The main strengths of our research included the bidirectional MR analysis of the causality between GM abundance and CC, the use of the largest sample sizes available to date, and dual verification to test the robustness of the results. MR analysis minimizes the confounding bias that is inevitable in observational epidemiological studies and provides a level of evidence similar to that of randomized controlled trials (RCTs). Moreover, our SNPs were strongly associated with GM and were evaluated against two independent CC databases. In addition, the sensitivity analysis showed no pleiotropy or heterogeneity, indicating that our results were statistically robust.
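The pleiotropy check mentioned among the sensitivity analyses is commonly carried out with the MR-Egger intercept (Bowden et al., 2015): SNP-outcome effects are regressed on SNP-exposure effects with inverse-variance weights, and an intercept near zero indicates no directional pleiotropy. The sketch below shows only the weighted regression fit, without the standard error or p-value of the intercept; the function name `mr_egger_fit` and the input values are hypothetical.

```python
def mr_egger_fit(beta_exposure, beta_outcome, se_outcome):
    """Weighted least-squares fit of beta_outcome on beta_exposure.

    Returns (intercept, slope). SNPs are first oriented so that every
    SNP-exposure effect is positive, as is conventional for MR-Egger.
    """
    x, y = [], []
    for bx, by in zip(beta_exposure, beta_outcome):
        sign = 1.0 if bx >= 0 else -1.0
        x.append(sign * bx)
        y.append(sign * by)
    w = [1.0 / so ** 2 for so in se_outcome]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope

# Hypothetical data constructed as by = 0.001 + 0.02 * bx (not from the study).
egger_bx = [0.1, 0.2, 0.3]
egger_by = [0.001 + 0.02 * value for value in egger_bx]
egger_intercept, egger_slope = mr_egger_fit(egger_bx, egger_by, [0.001, 0.002, 0.001])
print(f"intercept = {egger_intercept:.4f}, slope = {egger_slope:.4f}")
```

In this constructed example the non-zero intercept of 0.001 would correspond to directional pleiotropy; the study's reported p-values indicate that its intercepts were not distinguishable from zero.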

Nevertheless, our study had several limitations. First, the populations in the GWAS data used in our study were European, and, as ethnic and geographical factors might affect GM abundance, this might lead to limited effects extending to other populations. Second, owing to the summary data, individual characteristics were unavailable, and the confounding bias of individualized features was inestimable. Third, due to our strict thresholds, many of the genetic liabilities of GM abundance were excluded at the IV selection stage, which might have resulted in missing some meaningful results.

Future research should enroll larger samples from multiple races and geographic areas to explore more robust causality. Furthermore, more in-depth mechanistic research is urgently needed, and the diagnostic and therapeutic value of targeting GM abundance in CC requires further research.

5 Conclusion

We comprehensively assessed the relationship between GM abundance and CC. Our results suggested that 22 GM taxa were causally related to CC, while CC was causally related to 15 GM taxa. Our study implies a mutual causality between GM abundance and the pathogenesis of CC, which provides a novel concept for the occurrence and development of CC and might promote future functional or clinical analysis.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

HY: Conceptualization, Data curation, Investigation, Methodology, Software, Supervision, Writing – original draft, Writing – review & editing.

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was supported by the Medical Science and Technology Research Foundation of Guangdong Province (A2022317).

Acknowledgments

The author thanks the participants of all GWAS cohorts included in the present work and the investigators of the IEU Open GWAS project, MiBioGen, and the UK Biobank for sharing the GWAS summary statistics.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2024.1336101/full#supplementary-material

Baker, J. M., Al-Nakkash, L., and Herbst-Kralovetz, M. M. (2017). Estrogen-gut microbiome axis: physiological and clinical implications. Maturitas 103, 45–53. doi: 10.1016/j.maturitas.2017.06.025

Bowden, J., Davey Smith, G., and Burgess, S. (2015). Mendelian randomization with invalid instruments: effect estimation and bias detection through egger regression. Int. J. Epidemiol. 44, 512–525. doi: 10.1093/ije/dyv080

Bowden, J., and Holmes, M. V. (2019). Meta-analysis and Mendelian randomization: a review. Res. Synth. Methods 10, 486–496. doi: 10.1002/jrsm.1346

Brevi, A., Cogrossi, L. L., Grazia, G., Masciovecchio, D., Impellizzieri, D., Lacanfora, L., et al. (2020). Much more than IL-17A: cytokines of the IL-17 family between microbiota and Cancer. Front. Immunol. 11:565470. doi: 10.3389/fimmu.2020.565470

Bronowicka-Kłys, D. E., Lianeri, M., and Jagodziński, P. P. (2016). The role and impact of estrogens and xenoestrogen on the development of cervical cancer. Biomed. Pharmacother. 84, 1945–1953. doi: 10.1016/j.biopha.2016.11.007

Burgess, S., Thompson, S. G., and CRP CHD Genetics Collaboration (2011). Avoiding bias from weak instruments in Mendelian randomization studies. Int. J. Epidemiol. 40, 755–764. doi: 10.1093/ije/dyr036

Castanheira, C. P., Sallas, M. L., Nunes, R. A. L., Lorenzi, N. P. C., and Termini, L. (2021). Microbiome and cervical Cancer. Pathobiology 88, 187–197. doi: 10.1159/000511477

Chang, C. W., Lee, H. C., Li, L. H., Chiang Chiau, J. S., Wang, T. E., Chuang, W. H., et al. (2020). Fecal microbiota transplantation prevents intestinal injury, upregulation of toll-like receptors, and 5-fluorouracil/Oxaliplatin-induced toxicity in colorectal Cancer. Int. J. Mol. Sci. 21:386. doi: 10.3390/ijms21020386

Chang, L., Qiu, L., Lei, N., Zhou, J., Guo, R., Gao, F., et al. (2023). Characterization of fecal microbiota in cervical cancer patients associated with tumor stage and prognosis. Front. Cell. Infect. Microbiol. 13:1145950. doi: 10.3389/fcimb.2023.1145950

Chen, J., Douglass, J., Prasath, V., Neace, M., Atrchian, S., Manjili, M. H., et al. (2019). The microbiome and breast cancer: a review. Breast Cancer Res. Treat. 178, 493–496. doi: 10.1007/s10549-019-05407-5

Cheng, H., Garrick, D. J., and Fernando, R. L. (2017). Efficient strategies for leave-one-out cross-validation for genomic best linear unbiased prediction. J Anim Sci Biotechnol. 8:38. doi: 10.1186/s40104-017-0164-6

Chung, S. H., Franceschi, S., and Lambert, P. F. (2010). Estrogen and ERalpha: culprits in cervical cancer? Trends Endocrinol. Metab. 21, 504–511. doi: 10.1016/j.tem.2010.03.005

Chung, S. H., and Lambert, P. F. (2009). Prevention and treatment of cervical cancer in mice using estrogen receptor antagonists. Proc. Natl. Acad. Sci. U.S.A. 106, 19467–19472. doi: 10.1073/pnas.0911436106

Elsworth, B., Jones, M., and Blaxter, M. (2013). Badger--an accessible genome exploration environment. Bioinformatics 29, 2788–2789. doi: 10.1093/bioinformatics/btt466

Ervin, S. M., Li, H., Lim, L., Roberts, L. R., Liang, X., Mani, S., et al. (2019). Gut microbial β-glucuronidases reactivate estrogens as components of the estrobolome that reactivate estrogens. J. Biol. Chem. 294, 18586–18599. doi: 10.1074/jbc.RA119.010950

Giuliano, A. (2003). Cervical carcinogenesis: the role of co-factors and generation of reactive oxygen species. Salud Publica Mex. 45, S354–S360. doi: 10.1590/S0036-36342003000900009

Gnona, K. M., and Stewart, W. C. L. (2022). Revisiting the Wald test in small case-control studies with a skewed covariate. Am. J. Epidemiol. 191, 1508–1518. doi: 10.1093/aje/kwac058

He, W. Q., and Li, C. (2021). Recent global burden of cervical cancer incidence and mortality, predictors, and temporal trends. Gynecol. Oncol. 163, 583–592. doi: 10.1016/j.ygyno.2021.10.075

Hemani, G., Zheng, J., Elsworth, B., Wade, K. H., Haberland, V., Baird, D., et al. (2018). The MR-base platform supports systematic causal inference across the human phenome. Elife 7:e34408. doi: 10.7554/eLife.34408

Hu, S., Ding, Q., Zhang, W., Kang, M., Ma, J., and Zhao, L. (2023). Gut microbial beta-glucuronidase: a vital regulator in female estrogen metabolism. Gut Microbes 15:2236749. doi: 10.1080/19490976.2023.2236749

Kang, G. U., Jung, D. R., Lee, Y. H., Jeon, S. Y., Han, H. S., Chong, G. O., et al. (2020). Dynamics of fecal microbiota with and without invasive cervical Cancer and its application in early diagnosis. Cancers 12:3800. doi: 10.3390/cancers12123800

Karpinets, T. V., Solley, T. N., Mikkelson, M. D., Dorta-Estremera, S., Nookala, S. S., Medrano, A. Y. D., et al. (2020). Effect of antibiotics on gut and vaginal microbiomes associated with cervical Cancer development in mice. Cancer Prev. Res. 13, 997–1006. doi: 10.1158/1940-6207.CAPR-20-0103

Kurilshikov, A., Medina-Gomez, C., Bacigalupe, R., Radjabzadeh, D., Wang, J., Demirkan, A., et al. (2021). Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat. Genet. 53, 156–165. doi: 10.1038/s41588-020-00763-1

Kyrgiou, M., and Moscicki, A. B. (2022). Vaginal microbiome and cervical cancer. Semin. Cancer Biol. 86, 189–198. doi: 10.1016/j.semcancer.2022.03.005

Łaniewski, P., Ilhan, Z. E., and Herbst-Kralovetz, M. M. (2020). The microbiome and gynecological cancer development, prevention and therapy. Nat. Rev. Urol. 17, 232–250. doi: 10.1038/s41585-020-0286-z

Li, P., Wang, H., Guo, L., Gou, X., Chen, G., Lin, D., et al. (2022). Association between gut microbiota and preeclampsia-eclampsia: a two-sample Mendelian randomization study. BMC Med. 20:443. doi: 10.1186/s12916-022-02657-x

Liu, Y., Zhang, X., Chen, S., Wang, J., Yu, S., Li, Y., et al. (2022). Gut-derived lipopolysaccharide promotes alcoholic hepatosteatosis and subsequent hepatocellular carcinoma by stimulating neutrophil extracellular traps through toll-like receptor 4. Clin. Mol. Hepatol. 28, 522–539. doi: 10.3350/cmh.2022.0039

Ma, C., Han, M., Heinrich, B., Fu, Q., Zhang, Q., Sandhu, M., et al. (2018). Gut microbiome-mediated bile acid metabolism regulates liver cancer via NKT cells. Science 360. doi: 10.1126/science.aan5931

Ochi, A., Nguyen, A. H., Bedrosian, A. S., Mushlin, H. M., Zarbakhsh, S., Barilla, R., et al. (2012). MyD88 inhibition amplifies dendritic cell capacity to promote pancreatic carcinogenesis via Th2 cells. J. Exp. Med. 209, 1671–1687. doi: 10.1084/jem.20111706

Parida, S., and Sharma, D. (2019). The microbiome-estrogen connection and breast Cancer risk. Cell 8:1642. doi: 10.3390/cells8121642

Piñero, F., Vazquez, M., Baré, P., Rohr, C., Mendizabal, M., Sciara, M., et al. (2019). A different gut microbiome linked to inflammation found in cirrhotic patients with and without hepatocellular carcinoma. Ann. Hepatol. 18, 480–487. doi: 10.1016/j.aohep.2018.10.003

Plaza-Díaz, J., Álvarez-Mercado, A. I., Ruiz-Marín, C. M., Reina-Pérez, I., Pérez-Alonso, A. J., Sánchez-Andujar, M. B., et al. (2019). Association of breast and gut microbiota dysbiosis and the risk of breast cancer: a case-control clinical study. BMC Cancer 19:495. doi: 10.1186/s12885-019-5660-y

Ritu, W., Enqi, W., Zheng, S., Wang, J., Ling, Y., and Wang, Y. (2019). Evaluation of the associations between cervical microbiota and HPV infection, clearance, and persistence in Cytologically Normal women. Cancer Prev. Res. 12, 43–56. doi: 10.1158/1940-6207.CAPR-18-0233

Sanderson, E., Davey Smith, G., Windmeijer, F., and Bowden, J. (2019). An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int. J. Epidemiol. 48, 713–727. doi: 10.1093/ije/dyy262

Sims, T. T., Colbert, L. E., Zheng, J., Delgado Medrano, A. Y., Hoffman, K. L., Ramondetta, L., et al. (2019). Gut microbial diversity and genus-level differences identified in cervical cancer patients versus healthy controls. Gynecol. Oncol. 155, 237–244. doi: 10.1016/j.ygyno.2019.09.002

Viveros-Carreño, D., Fernandes, A., and Pareja, R. (2023). Updates on cervical cancer prevention. Int. J. Gynecol. Cancer 33, 394–402. doi: 10.1136/ijgc-2022-003703

Wang, J., Kurilshikov, A., Radjabzadeh, D., Turpin, W., Croitoru, K., Bonder, M. J., et al. (2018). Meta-analysis of human genome-microbiome association studies: the MiBioGen consortium initiative. Microbiome. 6:101. doi: 10.1186/s40168-018-0479-3

Wang, Z., Wang, Q., Zhao, J., Gong, L., Zhang, Y., Wang, X., et al. (2019). Altered diversity and composition of the gut microbiome in patients with cervical cancer. AMB Express 9:40. doi: 10.1186/s13568-019-0763-z

Yu, L. X., and Schwabe, R. F. (2017). The gut microbiome and liver cancer: mechanisms and clinical translation. Nat. Rev. Gastroenterol. Hepatol. 14, 527–539. doi: 10.1038/nrgastro.2017.72

Yu, P., Wang, Y., Li, C., Lv, L., and Wang, J. (2018). Protective effects of downregulating estrogen receptor alpha expression in cervical Cancer. Anticancer Agents Med Chem. 18, 1975–1982. doi: 10.2174/1871520618666180830162517

Zhang, X., Coker, O. O., Chu, E. S., Fu, K., Lau, H. C. H., Wang, Y. X., et al. (2021). Dietary cholesterol drives fatty liver-associated liver cancer by modulating gut microbiota and metabolites. Gut 70, 761–774. doi: 10.1136/gutjnl-2019-319664

Keywords: cervical cancer, causal relationship, genome-wide association studies, gut microbiota, Mendelian randomization

Citation: Yang H (2024) The causal correlation between gut microbiota abundance and pathogenesis of cervical cancer: a bidirectional mendelian randomization study. Front. Microbiol . 15:1336101. doi: 10.3389/fmicb.2024.1336101

Received: 10 November 2023; Accepted: 26 January 2024; Published: 14 February 2024.

Reviewed by:

Copyright © 2024 Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Hua Yang, [email protected]

Development of correlation for the power coefficient of the hybrid hydrokinetic turbine rotor having straight-bladed Darrieus and helical-bladed Savonius rotors

  • Technical Paper
  • Published: 17 February 2024
  • Volume 46 , article number  136 , ( 2024 )

  • Md. Mustafa Kamal   ORCID: orcid.org/0000-0003-2020-0846 1 ,
  • S. K. Singal 2 &
  • Ali Abbas 1  

A hybrid rotor can self-start and performs better than a single hydrokinetic turbine rotor. This makes the hybrid rotor suitable for tapping the available flow potential in rivers and canals. Accordingly, an extensive numerical investigation has been carried out to enhance the performance of the hybrid hydrokinetic turbine rotor. The influence of the Savonius helical blade angle, radius ratio and attachment angle on the performance characteristics of the hybrid rotor has been studied under different operating conditions. Based on the computed power coefficient for the hybrid rotor, a correlation has been developed for the power coefficient in terms of different system and operating parameters. The values of the power coefficient obtained from the developed correlation and from the numerical analysis were compared, and 95% of the data points were found to lie within ± 14%, which shows good agreement between the predicted and numerical values. The value of the regression coefficient (R²) for the developed correlation is 0.97. Moreover, the mean absolute deviation in the predicted power coefficient is 6.2%. Nomograms have also been developed from the correlation to design a prototype of a hybrid hydrokinetic turbine under different water flow velocities.
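The three agreement measures quoted in the abstract (share of points within a ± 14% band, R², and mean absolute deviation in percent) can be computed from paired numerical and predicted power coefficients. The sketch below assumes that deviations are taken relative to the numerical value and that R² is the usual coefficient of determination; the abstract does not state the exact definitions, and the function name `agreement_metrics` and the sample values are hypothetical.

```python
def agreement_metrics(numerical, predicted, band=0.14):
    """Return (r_squared, mean_abs_rel_deviation, fraction_within_band)."""
    n = len(numerical)
    # Relative deviation of each prediction from the numerical (CFD) value.
    rel_dev = [abs(p - c) / abs(c) for c, p in zip(numerical, predicted)]
    within = sum(1 for d in rel_dev if d <= band) / n
    mean_abs_dev = sum(rel_dev) / n
    mean_c = sum(numerical) / n
    ss_res = sum((c - p) ** 2 for c, p in zip(numerical, predicted))
    ss_tot = sum((c - mean_c) ** 2 for c in numerical)
    r_squared = 1.0 - ss_res / ss_tot
    return r_squared, mean_abs_dev, within

# Hypothetical power coefficients (not taken from the paper).
cp_numerical = [0.10, 0.20, 0.30, 0.40]
cp_predicted = [0.11, 0.19, 0.30, 0.46]
r2, mad, frac = agreement_metrics(cp_numerical, cp_predicted)
print(f"R^2 = {r2:.3f}, MAD = {100 * mad:.1f}%, within band = {100 * frac:.0f}%")
```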

Abbreviations

Two-dimensional

Three-dimensional

Computational fluid dynamics

Finite volume method

Hydrokinetic turbine

Reynolds number

Re-normalization group

Revolution per minute

Savonius helical blade angle

Tip–speed ratio

Rotor frontal area (mm 2 )

Darrieus rotor chord length (mm)

Average coefficient of power

Rotor diameter (mm)

Darrieus rotor diameter (mm)

Savonius rotor diameter (mm)

Rotating zone diameter (mm)

Rotor height (mm)

Darrieus rotor height (mm)

Savonius rotor height (mm)

Rotating zone height (mm)

Channel height (depth of water) (mm)

Length of channel (mm)

Turbulent kinetic energy (m 2 /s 2 )

Radius ratio

RPM of rotor

Number of Darrieus blades

Number of Savonius blades

Pressure (N/m 2 )

Available power

Shaft power

Water velocity (m/s)

Velocity vector (m/s)

Coordinate system (m)

Angular velocity of rotor (rad/s)

Fluid density (kg/m 3 )

Dynamic viscosity (kg/ms)

Turbulent viscosity (kg/ms)

Kronecker delta

Rate of dissipation (m 2 /s 3 )

Attachment angle (°)

Savonius helical blade angle (°)

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and affiliations.

Department of Design and Engineering, FLOVEL Energy Private Limited, Faridabad, India

Md. Mustafa Kamal & Ali Abbas

Department of Hydro and Renewable Energy, Indian Institute of Technology, Roorkee, India

S. K. Singal

Corresponding author

Correspondence to Md. Mustafa Kamal .

Ethics declarations

Conflict of interest.

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Technical Editor: Erick Franklin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Kamal, M.M., Singal, S.K. & Abbas, A. Development of correlation for the power coefficient of the hybrid hydrokinetic turbine rotor having straight-bladed Darrieus and helical-bladed Savonius rotors. J Braz. Soc. Mech. Sci. Eng. 46 , 136 (2024). https://doi.org/10.1007/s40430-024-04713-4

Received : 18 April 2023

Accepted : 15 January 2024

Published : 17 February 2024

DOI : https://doi.org/10.1007/s40430-024-04713-4

  • Hybrid rotor
  • Power coefficient
  • Correlation
  • Open access
  • Published: 12 February 2024

MRI-based tumor shrinkage patterns after early neoadjuvant therapy in breast cancer: correlation with molecular subtypes and pathological response after therapy

  • Mengfan Wang 1   na1 ,
  • Siyao Du 1   na1 ,
  • Ruimeng Zhao 1 ,
  • Shasha Liu 1 ,
  • Wenhong Jiang 1 ,
  • Can Peng 1 ,
  • Ruimei Chai 1 &
  • Lina Zhang 1  

Breast Cancer Research volume 26, Article number: 26 (2024)

MRI-based tumor shrinkage patterns (TSP) after neoadjuvant therapy (NAT) have been associated with pathological response. However, the understanding of TSP after early NAT remains limited. We aimed to analyze the relationship between TSP after early NAT and pathological response after therapy in different molecular subtypes.

We prospectively enrolled participants with invasive ductal breast cancers who received NAT and underwent pretreatment DCE-MRI from September 2020 to August 2022. Early-stage MRIs were performed after the first (1st-MRI) and/or second (2nd-MRI) cycle of NAT. Tumor shrinkage patterns were categorized into four groups: concentric shrinkage, diffuse decrease (DD), decrease of intensity only (DIO), and stable disease (SD). Logistic regression analysis was performed to identify independent variables associated with pathologic complete response (pCR), with stratified analysis according to tumor hormone receptor (HR)/human epidermal growth factor receptor 2 (HER2) subtype.

344 participants (mean age: 50 years, 113/345 [33%] pCR) with 345 tumors (1 bilateral) had evaluable 1st-MRI or 2nd-MRI to comprise the primary analysis cohort, of which 244 participants with 245 tumors had evaluable 1st-MRI (82/245 [33%] pCR) and 206 participants with 207 tumors had evaluable 2nd-MRI (69/207 [33%] pCR) to comprise the 1st- and 2nd-timepoint subgroup analysis cohorts, respectively. In the primary analysis, multivariate analysis showed that early DD pattern (OR = 12.08; 95% CI 3.34–43.75; p  < 0.001) predicted pCR independently of the change in tumor size (OR = 1.37; 95% CI 0.94–2.01; p  = 0.106) in HR + /HER2 − subtype, and the change in tumor size was a strong pCR predictor in HER2 + (OR = 1.61; 95% CI 1.22–2.13; p  = 0.001) and triple-negative breast cancer (TNBC, OR = 1.61; 95% CI 1.22–2.11; p  = 0.001). Compared with the change in tumor size, the SD pattern achieved a higher negative predictive value in HER2 + and TNBC. The statistical significance of complete 1st-timepoint subgroup analysis was consistent with the primary analysis.

The diffuse decrease pattern in HR + /HER2 − subtype and stable disease in HER2 + and TNBC after early NAT could serve as additional straightforward and comprehensible indicators of treatment response.

Trial registration: https://www.chictr.org.cn/. Registration number: ChiCTR2000038578, registered September 24, 2020.

Introduction

Neoadjuvant therapy (NAT) has become an important treatment for locally advanced breast cancers, and patients who achieve pathologic complete response (pCR) after NAT show improved prognosis and survival [1, 2, 3]. However, because breast cancer is highly heterogeneous, the efficacy of NAT varies significantly among individuals [4]. Early monitoring of tumor response to NAT is important for timely adjustment of treatment regimens to optimize efficacy, avoid unnecessary adverse effects and increase disease-free survival [5, 6].

Dynamic contrast-enhanced (DCE) MRI is a highly precise imaging technique that permits evaluation of viable tumor before and after NAT by detecting changes in tumor vascularity [7, 8]. The Response Evaluation Criteria in Solid Tumors (RECIST) 1.1 [9] define tumor response based on the decrease in the longest tumor diameter relative to the pretreatment baseline measurement. However, tumors exhibit various patterns of shrinkage as a result of intricate processes such as necrosis, fibrosis, inflammation, and other internal changes following NAT [10, 11, 12]. In breast cancer, diffuse nonmass enhancement on the pretreatment MRI or scattered foci within a fibrotic region on the posttreatment MRI makes it difficult to predict pCR accurately from size measurements [13, 14].

Several studies have investigated the relationship between tumor shrinkage patterns (TSP) and treatment response [11, 15]. Concentric and fragmented shrinkage patterns are more common in patients achieving pCR, while stable disease is noted in those who do not achieve pCR during the middle stage and after NAT [16, 17]. Furthermore, analyses have demonstrated variations in TSP among different subtypes [18, 19, 20]. However, the understanding of TSP following early treatment (i.e., the first or second cycle of NAT) and their association with treatment response remains limited. Given that the change in tumor size following early treatment does not consistently provide reliable pCR prediction [21, 22, 23], we hypothesized that early TSP may serve as an alternative imaging indicator for pCR prediction. This approach has the advantage of being easily interpretable and applicable in clinical settings.

In this prospective study, we performed longitudinal breast DCE-MRI before and after early NAT to describe TSP and investigate its role as a predictor of therapeutic response. Since the NAT regimens and pCR rate differed among different molecular subtypes, we performed stratified analysis according to molecular subtype.

Materials and methods

Participants.

In this prospective, single-center, observational study, 362 participants with primary invasive ductal carcinoma who underwent pretreatment DCE-MRI were enrolled. Eligible participants were women with invasive breast tumors 1.0 cm or larger at imaging examination who were planning to undergo NAT. Participants with evidence of distant metastasis or progressive disease during NAT that resulted in a change of the initial NAT regimen or cancellation of surgery were excluded. Our institutional review board approved this study and each participant provided written informed consent.

This study involved conducting DCE-MRI examinations at three specific timepoints during NAT, including pretreatment (referred to as Pre-MRI), after the first cycle of NAT (referred to as 1st-MRI), and/or after the second cycle of NAT (referred to as 2nd-MRI). The decision to perform Pre-MRI and 2nd-MRI was made by clinicians [ 24 ], while 1st-MRI was additionally recommended by clinicians for earlier efficacy evaluation, and its execution was contingent upon the individual preferences of participants.

Participants who underwent DCE-MRI before and after early NAT (with either 1st-MRI or 2nd-MRI usable) formed the primary analysis cohort, which was used to describe TSP and investigate its value as an early pCR predictor. The primary analysis was an “intention-to-diagnose analysis” based on the total cohort of randomized participants. If a participant underwent both 1st-MRI and 2nd-MRI, the 2nd-MRI data of that participant were used for the primary analysis. To further analyze TSP after 1st-MRI or 2nd-MRI, we conducted a subgroup analysis to determine the earliest timepoint at which TSP was predictive. The subgroup analysis was a “per-protocol analysis” based on complete 1st-MRI or 2nd-MRI data (referred to as 1st-timepoint and 2nd-timepoint subgroup analysis, respectively). The participant enrollment flowchart and the cohorts for the primary analysis and subgroup analysis are shown in Fig. 1.
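The timepoint rule for the primary analysis can be written as a small Python sketch. The function name and return labels are illustrative assumptions. The counts used in the consistency check (345 tumors overall, 245 with an evaluable 1st-MRI, 207 with an evaluable 2nd-MRI) are the ones reported in the Results of this article.

```python
def primary_timepoint(has_first_mri, has_second_mri):
    """Pick the early MRI used in the primary analysis for one tumor.

    The 2nd-MRI takes precedence when both examinations are evaluable;
    a tumor with neither examination is not part of the primary cohort.
    """
    if has_second_mri:
        return "2nd-MRI"
    if has_first_mri:
        return "1st-MRI"
    return None


# Consistency check on the tumor counts reported in the Results.
total_tumors = 345
with_first = 245
with_second = 207

# Inclusion-exclusion: tumors that had both early examinations.
with_both = with_first + with_second - total_tumors
# Tumors whose primary-analysis data come from the 1st-MRI.
first_only = with_first - with_both

print(with_both, first_only)
```

Under this rule, the number of tumors analyzed on 1st-MRI data equals the number with a 1st-MRI but no 2nd-MRI, which agrees with the 138 1st-MRI examinations reported for the primary analysis cohort.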

Fig. 1 Flowchart of study participants

Treatment protocol

All participants received a standard six or eight cycles of NAT before surgery according to the National Comprehensive Cancer Network guideline [7]. The NAT regimens were based on anthracycline, taxane, or both anthracycline and taxane. For human epidermal growth factor receptor 2 (HER2)-positive tumors, anti-HER2 targeted therapy with trastuzumab (H) or trastuzumab + pertuzumab (HP) was added to the chemotherapy drugs.

Imaging analysis

All breast MRI examinations were performed on a 3.0T MR scanner (SIGNA™ Pioneer, GE Healthcare, Milwaukee, WI, USA) in the prone position using a dedicated 8-channel phased-array breast coil. T1-weighted (T1W) DCE-MRI sequence in the axial plane with temporal resolution of 19.4 s was obtained using three-dimensional (3D) DISCO and fat suppression technique. The scanning parameters were as follows: repetition time/echo time (TR/TE) = 4.9/1.7 ms, flip angle = 10°, field of view (FOV) = 360 × 360 mm, acquisition matrix = 256 × 256, slice thickness/gap = 1.4 mm, number of sections = 116/phase, acceleration factors = 2. After the pre-contrast scanning followed by a pause of 20 s, the contrast agent was injected intravenously as a bolus (0.1 mmol/kg body weight) by a power injector at 2 mL/s followed by a 20 mL saline flush. Subsequently, 16–20 phase post-contrast images were acquired. Additional imaging protocol details can be found in our previous publication [ 25 ].

TSP was assessed through a comprehensive analysis of the initial, peak and late post-contrast phases (specifically, the 5th, 7th and 16th post-contrast phases) of DISCO DCE-MRI according to the time intensity curve [11, 17]. We divided TSP into four groups based on the study by Fukada et al. [11]: concentric shrinkage (CS), diffuse decrease (DD), decrease of intensity only (DIO), and stable disease (SD). The CS pattern was further divided into three types: simple CS, CS to small foci, and CS plus decreased enhancement. The DD pattern was further divided into two types: CS with surrounding lesions and shrinkage with residual multinodular lesions (Figs. 2, 3). All images were evaluated independently by two breast radiologists (W.M.F. and D.S.Y.), with 5 and 10 years of experience, respectively. Disagreements were resolved through consultation between the two radiologists. If they were unable to reach a decision after consultation, a third radiologist (Z.L.N., with 20 years of experience) made the final decision. All readers were blinded to tumor clinicopathological information.

Fig. 2 Shrinkage patterns of mass lesions. a Concentric shrinkage (CS): CS to small foci (pretreatment: a well demarcated 47 mm mass, early neoadjuvant therapy [NAT]: tumor size was significantly reduced with only residual enhancement foci < 5 mm), b CS: simple CS (pretreatment: a well demarcated 45 mm mass, early NAT: tumor size decreased to 32 mm without any morphological changes), c CS: CS plus decreased enhancement (pretreatment: an irregular 32 mm mass, early NAT: tumor size decreased to 25 mm with significantly reduced enhancement), d diffuse decrease (DD): CS with surrounding lesions (pretreatment: an 83 mm mass, early NAT: the tumor showed distinct CS with peripheral focal lesions), e DD: shrinkage with residual multinodular lesions (pretreatment: a 60 mm mass, early NAT: tumor splits into uniform fragments mixed with fibrous stroma), f decrease of intensity only (DIO) (pretreatment: an irregular 21 mm mass, early NAT: the degree of enhancement was clearly reduced but the size was unchanged) and g stable disease (SD) (pretreatment: a 35 mm mass, early NAT: no change)

Fig. 3 Shrinkage patterns of non-mass lesions. a Concentric shrinkage (CS): simple CS (pretreatment: a regional 62 mm non-mass, early neoadjuvant therapy [NAT]: tumor size decreased to 27 mm without any morphological changes), b CS: CS plus decreased enhancement (pretreatment: a multiple-region 54 mm non-mass, early NAT: tumor size decreased to 44 mm with significantly reduced enhancement), c diffuse decrease (DD): CS with surrounding lesions (pretreatment: a segmental 100 mm non-mass, early NAT: the main lesion showed CS with peripheral focal lesions), d DD: shrinkage with residual multinodular lesions (pretreatment: a diffuse 75 mm non-mass, early NAT: tumor splits into uniform small fragments mixed with fibrous stroma), e decrease of intensity only (DIO) (pretreatment: a regional 60 mm non-mass, early NAT: the degree of enhancement was clearly reduced but the size was unchanged), f stable disease (SD) (pretreatment: a diffuse non-mass, early NAT: no changes). No non-mass lesion showed CS to small foci in our study

For Pre-MRI, the maximum tumor diameter was measured on the axial plane at peak phase. If multiple lesions were present, the largest tumor was selected as the target lesion. For follow-up images (1st-MRI or 2nd-MRI), the distance between the two farthest lesions was measured as the maximum diameter of the residual tumor for the DD pattern, while for the other patterns the maximum diameter was measured in the same way as at baseline. For the primary analysis, the tumor sizes before and after early NAT were recorded as $D_{\text{pre}}$ and $D_{\text{early}}$, and the tumor size on 2nd-MRI was used as $D_{\text{early}}$ for participants who underwent both 1st-MRI and 2nd-MRI. The percentage change (Δ%) in tumor size after early NAT was calculated as $\Delta D_{\text{early}}\% = (D_{\text{pre}} - D_{\text{early}})/D_{\text{pre}} \times 100\%$. For subgroup analysis, the tumor sizes measured on 1st-MRI and 2nd-MRI were recorded as $D_{\text{1st}}$ and $D_{\text{2nd}}$, respectively, and the corresponding changes were calculated as $\Delta D_{\text{1st}}\% = (D_{\text{pre}} - D_{\text{1st}})/D_{\text{pre}} \times 100\%$ and $\Delta D_{\text{2nd}}\% = (D_{\text{pre}} - D_{\text{2nd}})/D_{\text{pre}} \times 100\%$. The mean of the tumor sizes measured by both readers was used for the final analysis. Additionally, tumor morphological and kinetic features were analyzed according to the 5th edition of the Breast Imaging Reporting and Data System (BI-RADS) lexicon [26].
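The size-change calculation can be sketched in a few lines of Python. The function names are illustrative assumptions, not part of the study. The worked example uses the simple CS mass lesion from the Fig. 2b caption (45 mm before treatment, 32 mm after early NAT).

```python
def mean_reader_size(reader_a_mm, reader_b_mm):
    """Average the maximum diameters (mm) measured by the two readers."""
    return (reader_a_mm + reader_b_mm) / 2.0


def percent_change(d_pre_mm, d_follow_up_mm):
    """Percentage decrease in tumor size relative to the pretreatment size.

    Positive values mean shrinkage; negative values mean growth.
    """
    if d_pre_mm <= 0:
        raise ValueError("pretreatment diameter must be positive")
    return (d_pre_mm - d_follow_up_mm) / d_pre_mm * 100.0


# Simple CS example from the Fig. 2b caption: 45 mm -> 32 mm.
delta_early = percent_change(45.0, 32.0)
print(f"{delta_early:.1f}% decrease")
```

The same function covers the early, 1st-MRI and 2nd-MRI changes, since only the follow-up diameter differs between them.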

Histopathology

All patients underwent an ultrasonography-guided core-needle biopsy before NAT. The pathological specimens were reviewed and diagnosed by a breast pathologist with more than 20 years of experience in breast pathologic examination. Immunohistochemistry (IHC) was performed for each patient to determine the baseline estrogen receptor (ER), progesterone receptor (PR), and HER2 status, and the Ki-67 index. According to the ASCO guideline [27], the cutoff value for ER and PR was set at 1%, and the cutoff value for Ki-67 was 20%. Regarding HER2 status, tumors with an IHC staining of 0 to 1+ were defined as HER2 negative and those with 3+ as HER2 positive. Fluorescence in situ hybridization (FISH) was conducted when HER2 expression was 2+ on IHC: a non-amplified FISH result was classified as HER2 negative and an amplified result as HER2 positive. Based on ER, PR, and HER2 status, the biological subtypes were as follows: hormone receptor (HR)+/HER2− (ER+ and/or PR+ and HER2−), HER2+ (HER2+ regardless of HR status) and triple-negative breast cancer (TNBC: ER−, PR−, and HER2−).
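The subtype rules described above can be expressed as a short decision function. This is a minimal sketch: the function and parameter names are hypothetical, and treating a staining percentage of exactly 1% as receptor-positive is an assumption about how the 1% cutoff is applied.

```python
def her2_positive(ihc_score, fish_amplified=None):
    """HER2 status from the IHC score (0, 1, 2 or 3) and, for 2+, FISH."""
    if ihc_score in (0, 1):
        return False
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        if fish_amplified is None:
            raise ValueError("IHC 2+ requires a FISH result")
        return bool(fish_amplified)
    raise ValueError("IHC score must be 0, 1, 2 or 3")


def molecular_subtype(er_percent, pr_percent, ihc_score, fish_amplified=None):
    """Assign one of the three subtypes used for the stratified analysis."""
    if her2_positive(ihc_score, fish_amplified):
        return "HER2+"  # regardless of HR status
    # Assumption: >= 1% staining counts as receptor-positive.
    hr_positive = er_percent >= 1 or pr_percent >= 1
    return "HR+/HER2-" if hr_positive else "TNBC"


print(molecular_subtype(er_percent=80, pr_percent=0, ihc_score=1))
print(molecular_subtype(er_percent=0, pr_percent=0, ihc_score=2, fish_amplified=True))
print(molecular_subtype(er_percent=0, pr_percent=0, ihc_score=0))
```

HER2 status is checked first because the HER2+ group is defined regardless of HR status, so ER and PR only matter for HER2-negative tumors.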

Definition of histologic therapeutic effects

Postoperative pathological response was graded based on the Miller-Payne grading system [ 28 ]. pCR was defined as ypT0 or ypTis with no residual invasive tumor (Miller–Payne grade 5, residual ductal carcinoma in situ could be present). Patients with Miller-Payne grades 1 or 2 were classified into the nonresponse group (pNR), and patients with grades 3, 4, or 5 were in the response group (non-pNR) (Table  1 ). The histopathologic status of the axillary lymph nodes was not considered in pCR definition.
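The two response groupings derived from the Miller-Payne grade can be sketched as follows. The function name and the dictionary layout are illustrative assumptions; the grade-to-group mapping follows the definitions in the paragraph above.

```python
def response_groups(miller_payne_grade):
    """Map a Miller-Payne grade (1-5) to the two endpoints used in the study.

    pCR     : grade 5 (no residual invasive tumor)
    pNR     : grades 1-2 (nonresponse group)
    non-pNR : grades 3-5 (response group)
    """
    if miller_payne_grade not in (1, 2, 3, 4, 5):
        raise ValueError("Miller-Payne grade must be an integer from 1 to 5")
    return {
        "pCR": miller_payne_grade == 5,
        "response_group": "pNR" if miller_payne_grade <= 2 else "non-pNR",
    }


for grade in range(1, 6):
    print(grade, response_groups(grade))
```

Note that every pCR case is also in the non-pNR group, while grades 3 and 4 are non-pNR without being pCR.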

Statistical analysis

Mann–Whitney and Chi-square (or Fisher’s exact) tests were used to compare the differences in clinicopathological and imaging features between the pCR and non-pCR groups (or pNR and non-pNR groups in HR + /HER2 − subtype). To compare TSP in different treatment response groups, the Chi-square test and Bonferroni correction for multiple comparisons were used, with a p value < 0.00833 ( p  < 0.05/6) considered statistically significant. The inter-reader agreement between both readers for TSP was calculated using Cohen’s Kappa (κ).
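Two of the quantities in this paragraph are simple enough to compute by hand. The sketch below derives the Bonferroni threshold, assuming the divisor 6 is the number of pairwise comparisons among the four shrinkage patterns (the text gives the divisor but not its derivation), and computes Cohen's κ for two readers. The reader labels are hypothetical and are not study data.

```python
from collections import Counter
from math import comb


def bonferroni_threshold(alpha, n_groups):
    """Per-comparison significance level for all pairwise comparisons."""
    return alpha / comb(n_groups, 2)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1.0 - expected)


threshold = bonferroni_threshold(0.05, n_groups=4)

# Hypothetical pattern labels from two readers, NOT study data.
reader_a = ["CS", "CS", "DD", "SD", "DIO", "CS", "DD", "SD"]
reader_b = ["CS", "CS", "DD", "SD", "DIO", "CS", "CS", "SD"]
kappa = cohens_kappa(reader_a, reader_b)

print(f"threshold = {threshold:.5f}, kappa = {kappa:.3f}")
```

κ corrects raw agreement for the agreement expected by chance, which is why it is lower than the simple proportion of matching labels.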

Clinicopathologic and imaging features potentially predictive for pCR were analyzed using binary logistic regression. Factors with a p value of < 0.10 on univariate logistic regression were entered into multivariate logistic regression and a p value < 0.05 was statistically significant. Performance for predicting pCR was assessed with the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive and negative predictive values (PPV and NPV). All analyses were performed using Statistical Package for the Social Sciences (SPSS, version 25.0, IBM Corporation, Armonk, NY, USA) and MedCalc (version 15.6.1).
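The performance measures listed above all follow from a 2 × 2 table. The sketch below computes them for one simple rule in the HER2+ subtype, using the per-pattern pCR counts reported in the primary analysis results of this article: a tumor with any pattern other than SD is treated as a predicted pCR. That decision rule and all names are assumptions made for illustration, so the output need not match the supplementary tables.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard measures from the four cells of a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# HER2+ counts from the primary analysis: pattern -> (pCR cases, tumors).
her2_counts = {"CS": (40, 67), "DD": (25, 41), "DIO": (2, 3), "SD": (1, 12)}

# Illustrative rule (an assumption): non-SD pattern => predicted pCR.
tp = sum(pcr for p, (pcr, total) in her2_counts.items() if p != "SD")
fp = sum(total - pcr for p, (pcr, total) in her2_counts.items() if p != "SD")
fn = her2_counts["SD"][0]
tn = her2_counts["SD"][1] - her2_counts["SD"][0]

metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this rule the NPV is simply the non-pCR rate among SD tumors, 11/12, which is the same quantity reported for the SD pattern in the HER2+ subtype.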

Participant characteristics

A total of 362 consecutive participants from September 2020 to August 2022 were enrolled in our study. Eighteen (5.0%) of 362 participants were excluded due to evidence of distant metastasis or progressive diseases during NAT. The remaining 344 participants with 345 tumors (1 bilateral, mean age: 50 years) who underwent DCE-MRI examinations after early NAT comprised the primary analysis cohort, which included 138 1st-MRI and 207 2nd-MRI examinations. For subgroup analysis, 244 of 344 participants (245 tumors) who had evaluable 1st-MRI and 206 of 344 participants (207 tumors) who had evaluable 2nd-MRI comprised the 1st- and 2nd-timepoint subgroup analysis cohorts, respectively (Fig.  1 ).

Baseline characteristics for the primary analysis cohort and the two subgroup analysis cohorts are listed in Table 2. The most common molecular subtype was HR+/HER2− (151/345, 44%), followed by HER2+ (123/345, 36%) and TNBC (71/345, 21%). After NAT, 113/345 (33%) achieved pCR. No significant difference was found between the primary analysis cohort and the two subgroup analysis cohorts for any characteristic (Table 2, all p > 0.05).

In the primary analysis cohort, tumors that achieved pCR tended to have a high histologic grade, a low $D_{\text{early}}$ and a large change in tumor size (p < 0.001). Molecular subtype and NAT regimen showed a significant association with pCR (p < 0.001). No significant difference was detected between participants with pCR and non-pCR in terms of age, baseline tumor size, menopausal status, clinical TNM stage and other MRI characteristics (Additional file 1: Table S1, all p > 0.05). The participant characteristics in the subgroup analysis cohorts were consistent with those of the primary analysis cohort (Additional file 1: Table S2).

Inter-reader agreement

The inter-reader agreement was considered almost perfect in the primary analysis (κ = 0.929), 1st-timepoint (κ = 0.942), and 2nd-timepoint (κ = 0.941) subgroup analysis cohorts. The detailed results for two readers are shown in Additional file 1 : Tables S3 and S4.

The primary analysis after early NAT

Table 3 shows the TSP for each molecular subtype in the primary analysis cohort. After early NAT, the CS pattern was the most frequent in each molecular subtype (78/151 [52%] for HR+/HER2−, 67/123 [54%] for HER2+ and 45/71 [63%] for TNBC), mainly as the simple CS pattern. The DD pattern occurred in 29/151 (19%) of HR+/HER2−, 41/123 (33%) of HER2+ and 13/71 (18%) of TNBC tumors. The SD pattern occurred in 41/151 (27%) of HR+/HER2−, 12/123 (10%) of HER2+ and 13/71 (18%) of TNBC tumors. The DIO pattern was rare after early NAT, occurring in only 3/151 (2.0%) of HR+/HER2−, 3/123 (2.4%) of HER2+ and 0/71 of TNBC tumors.

After early NAT, the DD pattern had the highest pCR rate (11/29 [38%]) in HR + /HER2 − subtype compared with the CS pattern (4/78 [5.1%], p < 0.001), and no pCR case was found in the DIO (0/3) and SD (0/41) patterns. Considering the low pCR rate in HR + /HER2 − subtype, we subsequently investigated the correlation between TSP and pNR (Additional file 1: Table S5). In HR + /HER2 − subtype, the DD pattern (24/29 [83%]) after early NAT had the highest non-pNR rate compared with the CS pattern (48/78 [62%], p = 0.006) and SD pattern (17/41 [41%], p < 0.001), and no pNR case was found in the DIO (0/3) pattern. The CS (40/67 [60%]), DD (25/41 [61%]) and DIO (2/3 [67%]) patterns all had considerable pCR rates in HER2 + subtype. In TNBC, the CS pattern (25/45 [56%]) had the highest pCR rate, followed by the DD pattern (5/13 [38%]). Notably, the CS to small foci pattern had a 100% pCR rate after NAT in each subtype despite its low incidence (HR + /HER2 − : 1/1, HER2 + : 9/9, and TNBC: 3/3). The SD pattern had the highest non-pCR rate in each molecular subtype: 41/41 (100%) for HR + /HER2 − (SD vs. DD, p < 0.001), 11/12 (92%) for HER2 + (SD vs. CS, p < 0.001; SD vs. DD, p < 0.001) and 13/13 (100%) for TNBC (SD vs. CS, p < 0.001; SD vs. DD, p = 0.007) (Table 3).
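To make the pattern comparisons above concrete, the following sketch recomputes the DD versus CS comparison in HR + /HER2 − subtype (11/29 vs. 4/78 with pCR) using a two-sided Fisher exact test. This section does not name the test behind the reported p-values, so the choice of Fisher's test is an assumption, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x first-column events in row 1,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Rows: DD pattern, CS pattern. Columns: pCR, non-pCR (counts from the text).
p_dd_vs_cs = fisher_exact_two_sided(11, 29 - 11, 4, 78 - 4)
```

Under this assumption the p-value falls below 0.001, in line with the reported comparison.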

Multivariate analysis showed that early DD pattern (OR = 12.08; 95% CI 3.34–43.75; p  < 0.001) predicted pCR independently of the change in tumor size (OR = 1.37; 95% CI 0.94–2.01; p  = 0.106) (Table  4 ), and early DD pattern (OR = 0.29; 95% CI 0.10–0.88; p  = 0.029) emerged as an independent predictor of pNR in addition to the change in tumor size (OR = 0.65; 95% CI 0.45–0.95; p  = 0.027) in HR + /HER2 − subtype (Additional file 1 : Table S6). In HER2 + subtype, univariate analysis showed that early SD pattern and the change in tumor size were associated with pCR; multivariate analysis showed that the change in tumor size (OR = 1.61; 95% CI 1.22–2.13; p  = 0.001) was the only independent factor to predict pCR. In TNBC, univariate analysis showed that early change in tumor size (OR = 1.61; 95% CI 1.22–2.11, p  = 0.001) was the only factor to predict pCR (Fig.  4 ). Compared with the change in tumor size, the SD pattern achieved a higher NPV in HER2 + and TNBC (Additional file 1 : Table S7).
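As a reading aid for the odds ratios above, the sketch below recovers the standard error of log(OR), the Wald z statistic and a two-sided p-value from a reported OR and its 95% CI. It assumes the intervals are Wald intervals that are symmetric on the log scale, which the text does not state; the function name is illustrative.

```python
from math import erfc, log, sqrt


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE of log(OR), Wald z and two-sided p from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = erfc(abs(z) / sqrt(2))
    return se, z, p


# Early DD pattern in the HR+/HER2- subtype (values from the text).
se_dd, z_dd, p_dd = wald_from_ci(12.08, 3.34, 43.75)
# Change in tumor size in the same model (values from the text).
se_size, z_size, p_size = wald_from_ci(1.37, 0.94, 2.01)
```

With these inputs the recovered p-values are below 0.001 for the DD pattern and about 0.10 for the change in tumor size, close to the reported p < 0.001 and p = 0.106. Small differences are expected because the published OR and CI are rounded.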

Fig. 4

A Invasive ductal carcinoma (HR + /HER2 − ) with pathologic complete response (pCR) after NAT in a 49-year-old woman: (a) pretreatment: a 71 mm mass occupying most glands in the upper right quadrant; (b) early neoadjuvant therapy (NAT): the lesions showed shrinkage with residual multinodular lesions (DD pattern). Invasive ductal carcinoma (HR + /HER2 − ) with non-pCR after NAT in a 70-year-old woman: (c) pretreatment: a 25 mm mass in the upper right quadrant; (d) early NAT: the lesions showed the simple concentric shrinkage (CS pattern) with a diameter reduction of 4 mm. B Invasive ductal carcinoma (HER2 + ) with pCR after NAT in a 44-year-old woman: (a) pretreatment: a 45 mm mass in the upper left quadrant; (b) early NAT: the lesion size was notably diminished with only residual enhancement foci (CS: CS to small foci pattern). Invasive ductal carcinoma (HER2 + ) with non-pCR after NAT in a 58-year-old woman: (c) pretreatment: a 29 mm mass in the upper left quadrant; (d) early NAT: the lesions showed the stable disease (SD pattern) with no changes in size or morphology. C Invasive ductal carcinoma (TNBC) with pCR after NAT in a 53-year-old woman: (a) pretreatment: a 43 mm mass in the upper left quadrant; (b) early NAT: the lesions showed the simple concentric shrinkage (CS pattern) with a diameter reduction of 13 mm. Invasive ductal carcinoma (TNBC) with non-pCR after NAT in a 65-year-old woman: (c) pretreatment: a 50 mm mass in the upper right quadrant; (d) early NAT: the lesions showed the stable disease (SD pattern) with no changes in size or morphology

The 1st-timepoint subgroup analysis

Additional file 1 : Table S8 shows the distribution of TSP for each subtype in the subgroup analysis cohorts. At 1st-timepoint, the CS pattern had the highest frequency in each molecular subtype (45/100 [45%] for HR + /HER2 − , 56/94 [60%] for HER2 + and 32/51 [63%] for TNBC), with mainly simple CS pattern. The DD pattern had 13/100 (13%) in HR + /HER2 − , 23/94 (24%) in HER2 + and 5/51 (10%) in TNBC. The SD pattern had 40/100 (40%) in HR + /HER2 − , 12/94 (13%) in HER2 + and 14/51 (27%) in TNBC. The DIO pattern rarely appeared at 1st-timepoint, with only 2/100 (2.0%) in HR + /HER2 − , 3/94 (3.2%) in HER2 + and 0/51 in TNBC.

At 1st-timepoint, the DD pattern (5/13 [38%]) had the highest pCR rate in HR + /HER2 − subtype compared with the CS pattern (3/45 [6.7%], p = 0.002), and no pCR case was found in the DIO (0/2) and SD (0/40) patterns. All patients presenting with the DD pattern showed non-pNR (13/13 [100%]) in HR + /HER2 − subtype (Additional file 1: Table S5). The CS (36/56 [64%]) and DIO (2/3 [67%]) patterns had considerable pCR rates in HER2 + subtype. The CS pattern (19/32 [59%]) and DD pattern (3/5 [60%]) had considerable pCR rates in TNBC. The SD pattern had the highest non-pCR rate in each molecular subtype: 40/40 (100%) for HR + /HER2 − (SD vs. DD, p < 0.001), 11/12 (92%) for HER2 + (SD vs. CS, p < 0.001; SD vs. DD, p = 0.002) and 13/14 (93%) for TNBC (SD vs. CS, p < 0.001; SD vs. DD, p = 0.006) (Additional file 1: Table S8).
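Several of the rates above rest on small counts, for example 3/5 for the DD pattern in TNBC. The sketch below computes a Wilson score interval for a proportion to show how wide the uncertainty around such rates can be. The interval is not reported in the source; it is added here purely as an illustrative aid.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval (default 95%) for a binomial proportion."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p_hat = successes / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# pCR with the DD pattern in TNBC at 1st-timepoint (3/5, from the text).
low_dd, high_dd = wilson_interval(3, 5)
# Non-pCR with the SD pattern in TNBC at 1st-timepoint (13/14, from the text).
low_sd, high_sd = wilson_interval(13, 14)
```

For 3/5 the interval spans roughly 23% to 88%, which illustrates why rates from very small groups should be read with caution.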

In 1st-timepoint subgroup analysis, multivariate analysis showed that the DD pattern (OR = 9.99; 95% CI 1.78–56.04; p  = 0.009) predicted pCR independently of the change in tumor size (OR = 0.87; 95% CI 0.46–1.64; p  = 0.659) in HR + /HER2 − subtype. In HER2 + subtype, univariate analysis showed that the SD pattern and change in tumor size were associated with pCR; multivariate analysis showed that the change in tumor size was the only independent factor to predict pCR (OR = 1.75; 95% CI 1.20–2.56; p  = 0.004). In TNBC, univariate analysis showed that the SD pattern (OR = 0.05; 95% CI 0.01–0.45, p  = 0.007) and change in tumor size (OR = 1.94; 95% CI 1.29–2.92; p  = 0.001) were associated with pCR, but the differences were not statistically significant in multivariate analysis (Additional file 1 : Table S9). The result of complete 1st-timepoint analysis was consistent with the primary analysis.

The 2nd-timepoint subgroup analysis

At 2nd-timepoint, the CS pattern had the highest frequency in each molecular subtype (50/94 [53%] for HR + /HER2 − , 37/74 [50%] for HER2 + and 24/39 [62%] for TNBC), with mainly simple CS pattern. The DD pattern had 22/94 (23%) in HR + /HER2 − , 30/74 (41%) in HER2 + and 11/39 (28%) in TNBC. The SD pattern had 20/94 (21%) in HR + /HER2 − , 5/74 (6.8%) in HER2 + and 4/39 (10%) in TNBC. The DIO pattern rarely appeared after early NAT, with only 2/94 (2.1%) in HR + /HER2 − , 2/74 (2.7%) in HER2 + and 0/39 in TNBC (Additional file 1 : Table S8).

The DD pattern (6/22 [27%]) had the highest pCR rate in HR + /HER2 − subtype compared with the CS pattern (3/50 [6.0%], p = 0.003), and no pCR case was found in the DIO (0/2) and SD (0/20) patterns. Additionally, the DD pattern (17/22 [77%]) had the highest non-pNR rate compared with the CS pattern (32/50 [64%], p = 0.044) and SD pattern (6/20 [30%], p < 0.001) in HR + /HER2 − subtype (Additional file 1: Table S5). The CS (23/37 [62%]) and DD (20/30 [67%]) patterns had considerable pCR rates in HER2 + subtype, while the CS pattern had the highest pCR rate in TNBC (12/24 [50%]). All patients presenting with the SD pattern showed non-pCR in each subtype (Additional file 1: Table S8).

In 2nd-timepoint subgroup analysis, multivariate analysis showed that the DD pattern (OR = 7.72; 95% CI 1.55–38.53; p  = 0.013) emerged as an independent predictor of pCR in addition to the change in tumor size (OR = 1.61; 95% CI 1.01–2.59, p  = 0.046) in HR + /HER2 − subtype. Univariate analysis showed that the change in tumor size was the only factor to predict pCR in HER2 + subtype (OR = 1.45; 95% CI 1.10–1.91; p  = 0.008) and TNBC (OR = 1.43; 95% CI 1.03–1.98; p  = 0.033) (Additional file 1 : Table S9).

Compared with the primary cohort and complete 1st-timepoint subgroup analysis, the DD pattern was no longer the only independent pCR predictor in HR + /HER2 − subtype at 2nd-timepoint analysis (Additional file 1 : Table S9). In HER2 + and TNBC, the change in tumor size at 1st-timepoint (HER2 + : OR = 1.86, AUC = 0.731, both p  < 0.001; TNBC: OR = 1.94, p  = 0.001; AUC = 0.804, p  < 0.001) had a greater impact on pCR prediction than that at 2nd-timepoint (HER2 + : OR = 1.45, p  = 0.008; AUC = 0.677, p  = 0.007; TNBC: OR = 1.43, p  = 0.033; AUC = 0.693, p  = 0.034) (Additional file 1 : Table S9, Fig. S1).
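For readers less familiar with the AUC values quoted above, the sketch below computes an AUC as the Mann-Whitney probability that a randomly chosen pCR case has a larger score than a randomly chosen non-pCR case. The scores are hypothetical size changes, not study data, and the source does not describe how its ROC analysis was implemented.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical percentage reductions in tumor size (not study data).
pcr_changes = [55.0, 40.0, 35.0, 62.0]
non_pcr_changes = [10.0, 35.0, 20.0, 45.0, 5.0]
auc = auc_mann_whitney(pcr_changes, non_pcr_changes)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.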

Early imaging response strategy map

Strategy maps based on TSP and the change in tumor size were plotted for each subtype. For pCR prediction in HR + /HER2 − subtype, radiologists should first identify likely non-pCR patients, namely those with the SD pattern (or, less commonly, the DIO or CS plus decreased enhancement patterns) and the simple CS pattern. They should then evaluate whether the patient has the DD pattern, which is a potential pCR manifestation, although the likelihood of pCR is only 38%. In HER2 + and TNBC, radiologists should first identify a likely pCR patient with the CS to small foci pattern or a likely non-pCR patient with the SD pattern. If neither pattern is present, the likelihood of pCR depends on the change in tumor size, with, for example, an OR of 1.86 in HER2 + and 1.94 in TNBC per 10% increment at 1st-timepoint (Fig. 5).

Fig. 5

Strategy map for predicting pCR based on shrinkage patterns and the change in tumor size in each subtype
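The pCR strategy map described in the text can be sketched as a simple lookup. This is a minimal illustration of the decision order as worded in the text, not a reproduction of Fig. 5. The subtype and pattern labels (such as `"CS_small_foci"`) and the function name are hypothetical, and the sketch is not intended for clinical use.

```python
def triage_pcr(subtype, pattern):
    """Return a coarse pCR expectation following the decision order in the text.

    Labels are hypothetical: subtype is "HR+/HER2-", "HER2+" or "TNBC";
    pattern is "SD", "DIO", "CS_simple", "CS_decreased_enhancement",
    "CS_small_foci" or "DD".
    """
    if subtype == "HR+/HER2-":
        # Step 1: patterns the text associates with non-pCR in this subtype.
        if pattern in {"SD", "DIO", "CS_decreased_enhancement", "CS_simple"}:
            return "likely non-pCR"
        # Step 2: the DD pattern is a potential pCR manifestation.
        if pattern == "DD":
            return "potential pCR (about 38% in this cohort)"
        return "not covered by the map as described"
    if subtype in ("HER2+", "TNBC"):
        if pattern == "CS_small_foci":
            return "likely pCR"
        if pattern == "SD":
            return "likely non-pCR"
        # Otherwise the text defers to the change in tumor size.
        return "assess change in tumor size"
    raise ValueError(f"unknown subtype: {subtype!r}")
```

For example, `triage_pcr("TNBC", "DD")` falls through to the size-based assessment, because the text singles out only the CS to small foci and SD patterns for HER2 + and TNBC.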

For pNR prediction in HR + /HER2 − subtype, radiologists should first identify whether the patient has the DD pattern, which is a highly likely non-pNR manifestation. If the patient does not have the DD pattern, the likelihood of pNR depends on the tumor size change with OR of 0.63 for 10% increment at 1st-timepoint, for example (Fig.  6 ).

Fig. 6

Strategy map for predicting pNR based on shrinkage patterns and the change in tumor size in HR + /HER2 − subtype
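An odds ratio quoted per 10% increment compounds multiplicatively when the change spans several increments. The sketch below shows that arithmetic, assuming the log-linear relationship implied by a logistic regression coefficient. The function name is illustrative and the sign convention of the size change follows whatever convention the source model used.

```python
def odds_multiplier(or_per_step, change_pct, step_pct=10.0):
    """Factor by which the odds change for a given size change.

    Assumes a logistic model in which each `step_pct` increment multiplies
    the odds by `or_per_step`.
    """
    if step_pct <= 0:
        raise ValueError("step_pct must be positive")
    return or_per_step ** (change_pct / step_pct)


# pNR in the HR+/HER2- subtype: OR 0.63 per 10% increment (from the text).
pnr_factor_30 = odds_multiplier(0.63, 30.0)
# pCR in HER2+: OR 1.86 per 10% increment at 1st-timepoint (from the text).
pcr_factor_20 = odds_multiplier(1.86, 20.0)
```

Three increments at an OR of 0.63 multiply the odds by about 0.25, and two increments at an OR of 1.86 multiply them by about 3.46. These factors apply to odds, not to probabilities.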

In the modern era with updated neoadjuvant therapy regimens, the present study evaluated TSP on DCE-MRI after early NAT and its association with pCR within each breast cancer subtype. Our findings indicated that the DD pattern after early NAT, particularly at 1st-timepoint, was a tumor response marker independent of the size change in HR + /HER2 − subtype; the SD pattern in HER2 + and TNBC after early NAT strongly indicated non-pCR. TSP could serve as additional straightforward and comprehensible indicators of treatment response in addition to the change in tumor size.

The classification and definition of TSP at MRI have not been unified. CS versus non-CS patterns after NAT, with further refinement of the non-CS pattern at mid-NAT, have commonly been used [ 11 , 17 , 20 ]. Based on the study by Fukada et al. [ 11 ], we developed a four-category TSP classification and subdivided the CS and DD patterns to suit early NAT response. The overall loss of cellularity after NAT is not always reflected by a decrease in tumor size. NAT can cause different changes in the nucleus and cytoplasm of tumors, leading to changes in overall morphology and to different TSP [ 29 ]. Compared to HER2 + subtype, HR + /HER2 − subtype tends to grow slowly, showing low apoptosis rates and genetic instability [ 11 ]. The internal heterogeneity of these tumors causes them to shrink inconsistently and crumble into small foci or scattered cells. The sparse microvascular distribution in HR + /HER2 − subtype also leads to uneven drug delivery, so these tumors tend to show the DD pattern after NAT. In our study, the DD pattern after early NAT tended toward pCR in HR + /HER2 − subtype, mainly at 1st-timepoint, independent of size change. Reis et al. [ 20 ] reported that an early fragmentation pattern after 2 months of neoadjuvant endocrine therapy suggested effective treatment in ER + /HER2 − subtype. The DD pattern may be an early manifestation of the response of HR + /HER2 − subtype to NAT that appears before size reduction. We therefore recommend introducing TSP into early imaging response strategies for HR + /HER2 − subtype.

HER2 + and TNBC had the highest proportion of the CS pattern, which is consistent with previous studies of mid- and post-NAT evaluation [ 30 , 31 , 32 , 33 ]. Animal studies [ 34 ] on tumor subregions have shown that the tumor margins of HER2 + and TNBC have abundant microvessels and high cell proliferation. Abundant vessels facilitate drug delivery, making these tumors more sensitive to therapy and resulting in more homogeneous cell reduction and shrinkage. Heacock et al. [ 18 ] and Eom et al. [ 19 ] reported that the CS pattern was a stronger predictor of pCR in HER2 + and TNBC after NAT. Unlike at the post-NAT timepoint, the CS pattern after early NAT did not show a significant pCR tendency compared with the DD pattern, but the SD pattern strongly indicated non-pCR in HER2 + and TNBC. The change in tumor size was still a strong predictor of pCR in HER2 + and TNBC after early NAT, even at 1st-timepoint.

Based on the observed TSP, we developed an early imaging response strategy for each subtype of breast cancer. By employing this strategy, clinicians can inform patients of the potential pathological response and its associated probability. The results of the 1st-timepoint subgroup analysis were consistent with those of the primary analysis cohort, indicating that TSP can be evaluated even after the first cycle of NAT. This easily understandable approach can assist clinicians in modifying treatment plans to enhance effectiveness, minimize unnecessary adverse effects, and improve disease-free survival rates. However, it should be noted that the signal intensities of DCE-MRI are influenced by imaging protocols and by gadolinium-based contrast agents from different vendors; TSP such as “DIO” and “CS plus decreased enhancement” may therefore be affected by these factors. To mitigate variability in TSP evaluation after treatment, it is crucial to use the same MRI scanner, standardized contrast agents, and skilled radiologists for the serial imaging evaluation of each patient during NAT.

Our study had some limitations. First, despite the overall large sample size, the number of participants in each subtype was limited. Increasing the sample size for each subtype would strengthen the evidence. Second, the homogeneity of the study sample and the data acquisition method mitigated the influence of confounding variables, but the results may be specific to this acquisition technique. Whether our findings hold on a different scanner platform or with a different imaging protocol is unknown. Finally, our study relied on visual assessment by radiologists, which was both qualitative and subjective. Future research should strive to incorporate artificial intelligence techniques to enable rapid, objective and reproducible analysis of TSP.

The TSP after early NAT may serve as an additional straightforward and comprehensible indicator of treatment response in addition to the change in tumor size. Specifically, the diffuse decrease pattern in HR + /HER2 − subtype is a tumor response marker independent of the size change, and the stable disease in HER2 + and TNBC strongly indicates non-pCR at 1st-timepoint.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

  • TSP: Tumor shrinkage patterns
  • CS: Concentric shrinkage
  • DD: Diffuse decrease
  • DIO: Decrease of intensity only
  • SD: Stable disease
  • HER2: Human epidermal growth factor receptor 2
  • HR: Hormone receptor
  • TNBC: Triple-negative breast cancer
  • NAT: Neoadjuvant therapy
  • pCR: Pathologic complete response

Burstein HJ, Curigliano G, Thürlimann B, Weber WP, Poortmans P, Regan MM, Senn HJ, Winer EP, Gnant M. Customizing local and systemic therapies for women with early breast cancer: the St. Gallen International Consensus Guidelines for treatment of early breast cancer 2021. Ann Oncol. 2021;32(10):1216–35.


Symmans WF, Yau C, Chen YY, Balassanian R, Klein ME, Pusztai L, Nanda R, Parker BA, Datnow B, Krings G, et al. Assessment of residual cancer burden and event-free survival in neoadjuvant treatment for high-risk breast cancer: an analysis of data from the I-SPY2 randomized clinical trial. JAMA Oncol. 2021;7(11):1654–63.


Yee D, DeMichele AM, Yau C, Isaacs C, Symmans WF, Albain KS, Chen YY, Krings G, Wei S, Harada S, et al. Association of event-free and distant recurrence-free survival with individual-level pathologic complete response in neoadjuvant treatment of stages 2 and 3 breast cancer: three-year follow-up analysis for the I-SPY2 adaptively randomized clinical trial. JAMA Oncol. 2020;6(9):1355–62.

von Minckwitz G, Untch M, Blohmer JU, Costa SD, Eidtmann H, Fasching PA, Gerber B, Eiermann W, Hilfrich J, Huober J, et al. Definition and impact of pathologic complete response on prognosis after neoadjuvant chemotherapy in various intrinsic breast cancer subtypes. J Clin Oncol. 2012;30(15):1796–804.


Drukker K, Li H, Antropova N, Edwards A, Papaioannou J, Giger ML. Most-enhancing tumor volume by MRI radiomics predicts recurrence-free survival “early on” in neoadjuvant treatment of breast cancer. Cancer Imaging. 2018;18(1):12.


Hylton NM, Gatsonis CA, Rosen MA, Lehman CD, Newitt DC, Partridge SC, Bernreuter WK, Pisano ED, Morris EA, Weatherall PT, et al. Neoadjuvant chemotherapy for breast cancer: functional tumor volume by MR imaging predicts recurrence-free survival-results from the ACRIN 6657/CALGB 150007 I-SPY 1 TRIAL. Radiology. 2016;279(1):44–55.

Gradishar WJ, Moran MS, Abraham J, Aft R, Agnese D, Allison KH, Anderson B, Burstein HJ, Chew H, Dang C, et al. Breast cancer, version 3.2022, NCCN clinical practice guidelines in oncology. J Natl Compr Cancer Netw. 2022;20(6):691–722.

Scheel JR, Kim E, Partridge SC, Lehman CD, Rosen MA, Bernreuter WK, Pisano ED, Marques HS, Morris EA, Weatherall PT, et al. MRI, clinical examination, and mammography for preoperative assessment of residual disease and pathologic complete response after neoadjuvant chemotherapy for breast cancer: ACRIN 6657 trial. AJR Am J Roentgenol. 2018;210(6):1376–85.

Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, Dancey J, Arbuck S, Gwyther S, Mooney M, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer. 2009;45(2):228–47.

Liegmann AS, Heselmeyer-Haddad K, Lischka A, Hirsch D, Chen WD, Torres I, Gemoll T, Rody A, Thorns C, Gertz EM, et al. Single cell genetic profiling of tumors of breast cancer patients aged 50 years and older reveals enormous intratumor heterogeneity independent of individual prognosis. Cancers (Basel). 2021;13(13):3366.

Fukada I, Araki K, Kobayashi K, Shibayama T, Takahashi S, Gomi N, Kokubu Y, Oikado K, Horii R, Akiyama F, et al. Pattern of tumor shrinkage during neoadjuvant chemotherapy is associated with prognosis in low-grade luminal early breast cancer. Radiology. 2018;286(1):49–57.

Sethi D, Sen R, Parshad S, Khetarpal S, Garg M, Sen J. Histopathologic changes following neoadjuvant chemotherapy in various malignancies. Int J Appl Basic Med Res. 2012;2(2):111–6.

Bahri S, Chen JH, Mehta RS, Carpenter PM, Nie K, Kwon SY, Yu HJ, Nalcioglu O, Su MY. Residual breast cancer diagnosed by MRI in patients receiving neoadjuvant chemotherapy with and without bevacizumab. Ann Surg Oncol. 2009;16(6):1619–28.

Kim HJ, Im YH, Han BK, Choi N, Lee J, Kim JH, Choi YL, Ahn JS, Nam SJ, Park YS, et al. Accuracy of MRI for estimating residual tumor size after neoadjuvant chemotherapy in locally advanced breast cancer: relation to response patterns on MRI. Acta Oncol. 2007;46(7):996–1003.

Kim SY, Cho N, Choi Y, Lee SH, Ha SM, Kim ES, Chang JM, Moon WK. Factors affecting pathologic complete response following neoadjuvant chemotherapy in breast cancer: development and validation of a predictive nomogram. Radiology. 2021;299(2):290–300.

Kim TH, Kang DK, Yim H, Jung YS, Kim KS, Kang SY. Magnetic resonance imaging patterns of tumor regression after neoadjuvant chemotherapy in breast cancer patients: correlation with pathological response grading system based on tumor cellularity. J Comput Assist Tomogr. 2012;36(2):200–6.

Goorts B, Dreuning KMA, Houwers JB, Kooreman LFS, Boerma EG, Mann RM, Lobbes MBI, Smidt ML. MRI-based response patterns during neoadjuvant chemotherapy can predict pathological (complete) response in patients with breast cancer. Breast Cancer Res. 2018;20(1):34.

Heacock L, Lewin A, Ayoola A, Moccaldi M, Babb JS, Kim SG, Moy L. Dynamic contrast-enhanced MRI evaluation of pathologic complete response in human epidermal growth factor receptor 2 (HER2)-positive breast cancer after HER2-targeted therapy. Acad Radiol. 2020;27(5):e87–93.

Eom HJ, Cha JH, Choi WJ, Chae EY, Shin HJ, Kim HH. Predictive clinicopathologic and dynamic contrast-enhanced MRI findings for tumor response to neoadjuvant chemotherapy in triple-negative breast cancer. AJR Am J Roentgenol. 2017;208(6):W225-w230.

Reis J, Thomas O, Lahooti M, Lyngra M, Schandiz H, Boavida J, Gjesdal KI, Sauer T, Geisler J, Geitung JT. Correlation between MRI morphological response patterns and histopathological tumor regression after neoadjuvant endocrine therapy in locally advanced breast cancer: a randomized phase II trial. Breast Cancer Res Treat. 2021;189(3):711–23.


Hylton NM, Blume JD, Bernreuter WK, Pisano ED, Rosen MA, Morris EA, Weatherall PT, Lehman CD, Newstead GM, Polin S, et al. Locally advanced breast cancer: MR imaging for prediction of response to neoadjuvant chemotherapy–results from ACRIN 6657/I-SPY TRIAL. Radiology. 2012;263(3):663–72.

Tudorica A, Oh KY, Chui SY, Roy N, Troxell ML, Naik A, Kemmer KA, Chen Y, Holtorf ML, Afzal A, et al. Early prediction and evaluation of breast cancer response to neoadjuvant chemotherapy using quantitative DCE-MRI. Transl Oncol. 2016;9(1):8–17.

Dogan BE, Yuan Q, Bassett R, Guvenc I, Jackson EF, Cristofanilli M, Whitman GJ. Comparing the performances of magnetic resonance imaging size vs pharmacokinetic parameters to predict response to neoadjuvant chemotherapy and survival in patients with breast cancer. Curr Probl Diagn Radiol. 2019;48(3):235–40.

Breast cancer professional committee of Chinese Anti-cancer Association. Guidelines and standards for the diagnosis and treatment of breast cancer by the Chinese Anti-Cancer Association (2019 Edition). Chin J Cancer. 2019;29:609–680.

Du S, Gao S, Zhao R, Liu H, Wang Y, Qi X, Li S, Cao J, Zhang L. Contrast-free MRI quantitative parameters for early prediction of pathological response to neoadjuvant chemotherapy in breast cancer. Eur Radiol. 2022;32(8):5759–72.

D'Orsi C, Morris E, Mendelson E. ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System. 2013.

Hammond ME, Hayes DF, Dowsett M, Allred DC, Hagerty KL, Badve S, Fitzgibbons PL, Francis G, Goldstein NS, Hayes M, et al. American Society of Clinical Oncology/College Of American Pathologists guideline recommendations for immunohistochemical testing of estrogen and progesterone receptors in breast cancer. J Clin Oncol. 2010;28(16):2784–95.

Ogston KN, Miller ID, Payne S, Hutcheon AW, Sarkar TK, Smith I, Schofield A, Heys SD. A new histological grading system to assess response of breast cancers to primary chemotherapy: prognostic significance and survival. Breast. 2003;12(5):320–7.

Wasser K, Sinn HP, Fink C, Klein SK, Junkermann H, Lüdemann HP, Zuna I, Delorme S. Accuracy of tumor size measurement in breast cancer using MRI is influenced by histological regression induced by neoadjuvant chemotherapy. Eur Radiol. 2003;13(6):1213–23.

Ballesio L, Gigli S, Di Pastena F, Giraldi G, Manganaro L, Anastasi E, Catalano C. Magnetic resonance imaging tumor regression shrinkage patterns after neoadjuvant chemotherapy in patients with locally advanced breast cancer: correlation with tumor biological subtypes and pathological response after therapy. Tumor Biol. 2017;39(3):101.

Loo CE, Straver ME, Rodenhuis S, Muller SH, Wesseling J, Vrancken Peeters MJ, Gilhuijs KG. Magnetic resonance imaging response monitoring of breast cancer during neoadjuvant chemotherapy: relevance of breast cancer subtype. J Clin Oncol. 2011;29(6):660–6.

Yoshikawa K, Ishida M, Kan N, Yanai H, Tsuta K, Sekimoto M, Sugie T. Direct comparison of magnetic resonance imaging and pathological shrinkage patterns of triple-negative breast cancer after neoadjuvant chemotherapy. World J Surg Oncol. 2020;18(1):177.

Mukhtar RA, Yau C, Rosen M, Tandon VJ, Hylton N, Esserman LJ. Clinically meaningful tumor reduction rates vary by prechemotherapy MRI phenotype and tumor subtype in the I-SPY 1 TRIAL (CALGB 150007/150012; ACRIN 6657). Ann Surg Oncol. 2013;20(12):3823–30.

Syed AK, Whisenant JG, Barnes SL, Sorace AG, Yankeelov TE. Multiparametric analysis of longitudinal quantitative MRI data to identify distinct tumor habitats in preclinical models of breast cancer. Cancers (Basel). 2020;12(6):1682.


Acknowledgements

Not applicable.

This study received funding from the National Natural Science Foundation of China (81971695, 82302165, 82371947), the Liaoning Province Applied Basic Research Program (Xingliao Talent Program) (2022JH2/101300027), and the Liaoning Provincial Science and Technology Plan (2022-BS-119).

Author information

Mengfan Wang and Siyao Du have contributed equally to this work.

Authors and Affiliations

Department of Radiology, The First Hospital of China Medical University, Nanjing North Street 155, Shenyang, 110001, Liaoning Province, China

Mengfan Wang, Siyao Du, Si Gao, Ruimeng Zhao, Shasha Liu, Wenhong Jiang, Can Peng, Ruimei Chai & Lina Zhang


Contributions

All authors contributed to the study’s conception and design. MW: conceptualization; data curation; investigation; methodology; writing-original draft. SD: conceptualization; data curation; investigation; methodology; writing-original draft. SG, RZ, SL, WJ, CP: investigation; methodology; writing-review and editing. RC: writing-review and editing. LZ: conceptualization; methodology; project administration; supervision; writing-review and editing. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Lina Zhang .

Ethics declarations

Ethics approval and consent to participate

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (or Ethics Committee) of the First Hospital of China Medical University (Ethic code: 2019-33-2 with date of approval 6 March 2019). Participants were enrolled after providing their written informed consent.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

Table S1. Participants characteristics in the primary analysis cohort. Table S2. Participants characteristics in the subgroup analysis cohorts. Table S3. Inter-reader agreement for tumor shrinkage patterns in each cohort. Table S4. Inconsistent shrinkage pattern distribution between two readers. Table S5. MRI-based tumor shrinkage patterns association with pNR in HR + /HER2 − subtype. Table S6. Univariate and multivariate analysis of factors associated with pNR in HR + /HER2 − subtype. Table S7. The diagnostic efficacy of factors in each molecular subtype. Table S8. MRI-based tumor shrinkage patterns association with pCR according to different molecular subtypes in the subgroup analysis cohorts. Table S9. Univariate and multivariate analysis of factors associated with pCR according to different molecular subtypes in the subgroup analysis cohorts. Figure S1. Receiver operating characteristic (ROC) curves of the change in tumor size (continuous variable) at 1st-timepoint and 2nd-timepoint for pathologic complete response (pCR) prediction in the breast.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Wang, M., Du, S., Gao, S. et al. MRI-based tumor shrinkage patterns after early neoadjuvant therapy in breast cancer: correlation with molecular subtypes and pathological response after therapy. Breast Cancer Res 26 , 26 (2024). https://doi.org/10.1186/s13058-024-01781-1


Received : 11 November 2023

Accepted : 09 February 2024

Published : 12 February 2024

DOI : https://doi.org/10.1186/s13058-024-01781-1


  • Breast cancer
  • Magnetic resonance imaging

Breast Cancer Research

ISSN: 1465-542X

meaning of analysis correlation

COMMENTS

  1. Correlation Analysis

    Correlation analysis is a statistical method used to evaluate the strength and direction of the relationship between two or more variables. The correlation coefficient ranges from -1 to 1. A correlation coefficient of 1 indicates a perfect positive correlation. This means that as one variable increases, the other variable also increases.

  2. Correlation Coefficient

    A correlation coefficient is a bivariate statistic when it summarizes the relationship between two variables, and it's a multivariate statistic when you have more than two variables. If your correlation coefficient is based on sample data, you'll need an inferential statistic if you want to generalize your results to the population.

  3. Correlation in Statistics: Correlation Analysis Explained

    The study of how variables are correlated is called correlation analysis. Some examples of data that have a high correlation: Your caloric intake and your weight. Your eye color and your relatives' eye colors. The amount of time your study and your GPA. Some examples of data that have a low correlation (or none at all):

  4. What is Correlation Analysis? A Definition and Explanation

    Essentially, correlation analysis is used for spotting patterns within datasets. A positive correlation means that both variables tend to increase together, while a negative correlation means that as one variable decreases, the other increases.

  5. Correlation: Meaning, Types, Examples & Coefficient

    Correlation means association - more precisely, it measures the extent to which two variables are related. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation.

  6. Interpreting Correlation Coefficients

    In statistics, correlation coefficients are a quantitative assessment that measures both the direction and the strength of this tendency to vary together. There are different types of correlation coefficients that you can use for different kinds of data. In this post, I cover the most common type of correlation—Pearson's correlation coefficient.

  7. Correlation

    In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are linearly related.

  8. Correlation

    Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It's a common tool for describing simple relationships without making a statement about cause and effect. How is correlation measured?

  9. What Is Correlation Analysis: Comprehensive Guide

    Correlation analysis, also known as bivariate analysis, is a statistical test primarily used to identify and explore linear relationships between two variables and then determine the strength and direction of that relationship. It's mainly used to spot patterns within datasets. It's worth noting that correlation doesn't equate to causation.

  10. Correlation coefficient review (article)

    The correlation coefficient r measures the direction and strength of a linear relationship. Calculating r is pretty complex, so we usually rely on technology for the computations. We focus on understanding what r says about a scatterplot. Here are some facts about r: It always has a value between −1 and 1.

  11. Correlation

    Correlation - Connecting the Dots, the Role of Correlation in Data Analysis. September 23, 2023; Jagdeesh; Correlation is a fundamental concept in statistics and data science. It quantifies the degree to which two variables are related. ... - Meaning: When one variable increases, the other also increases, and when one decreases, the other ...

  12. Pearson Correlation Coefficient (r)

    The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation. It is a number between -1 and 1 that measures the strength and direction of the relationship between two variables. When r is positive, a change in one variable goes with a change in the other variable in the same direction.

  13. Correlational Research

    A correlational research design investigates relationships between variables without the researcher controlling or manipulating any of them. A correlation reflects the strength and/or direction of the relationship between two (or more) variables. The direction of a correlation can be either positive or negative.

  14. Correlation: Meaning, Strength, and Examples

    A correlation is a statistical measurement of the relationship between two variables. Remember this handy rule: The closer the correlation is to 0, the weaker it is. The closer it is to +/-1, the stronger it is. Correlation values range from -1 to +1.

  15. What is correlation analysis?

    Quick definition: Correlation analysis, also known as bivariate analysis, is primarily concerned with finding out whether a relationship exists between variables and then determining the magnitude and direction of that relationship. Key takeaways: Correlation does not equal causation.

  16. The Correlation Coefficient: What It Is, What It Tells Investors

    The correlation coefficient is a statistical measure of the strength of a linear relationship between two variables. Its values can range from -1 to 1. A correlation coefficient of -1 describes...

  17. Correlation

    Correlation refers to a process for establishing the relationships between two variables. One way to get a general idea about whether or not two variables are related is to plot them on a "scatter plot".

  18. What is Correlation Analysis?

    Correlation analysis is a statistical technique for determining the strength of a link between two variables. It is used to detect patterns and trends in data and to forecast future occurrences. Consider a problem with different factors to be considered for making optimal conclusions

  19. Correlation: Meaning, Significance, Types and Degree of Correlation

    According to A.M. Tuttle, "Correlation is an analysis of covariation between two or more variables." Two variables are said to be correlated if a change in one is accompanied by a corresponding change in the other variable.

  20. Correlation and Regression

    Correlation Analysis: Correlation analysis is applied in quantifying the association between two continuous variables, for example, a dependent and an independent variable, or two independent variables. Regression Analysis: Regression analysis refers to assessing the relationship between the outcome variable and one or more variables.

  21. Correlation: What It Means in Finance and the Formula ...

    Correlation, in the finance and investment industries, is a statistic that measures the degree to which two securities move in relation to each other. Correlations are used in advanced portfolio ...

  22. Correlation vs. Regression: What's the Difference?

    Regression models one variable as a function of another, whereas correlation only measures how strongly the two move together; neither establishes cause and effect on its own. Regression is able to use an equation to predict the value of one variable, based on the value of another variable. Correlation does not do this. Regression uses an equation to quantify the relationship between two variables.

  23. Correlation Definition & Meaning

    noun cor· re· la· tion ˌkȯr-ə-ˈlā-shən ˌkär- Synonyms of correlation 1 : the state or relation of being correlated specifically : a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone

  24. Frontiers

    Correlation matrixes were constructed using R statistical software. Biplot analyses were made using XLSTAT, 2023 and dendrograms (Ward linkage, Pearson distance) were constructed using Minitab. Results Mean values of phenological and biochemical traits. The 'Chohara' cultivar showed the highest fruit weight (19.80 g) and pulp weight (18.26 g).

  25. The stability of cognitive abilities: A meta-analytic review of

    The meta-analysis relied on data from 205 longitudinal studies that involved a total of 87,408 participants, resulting in 1,288 test-retest correlation coefficients among manifest variables. For an age of 20 years and a test-retest interval of 5 years, we found a mean rank-order stability of ρ = .76.

  26. The causal correlation between gut microbiota abundance and

    The causal correlation between gut microbiota abundance and pathogenesis of cervical cancer: a bidirectional mendelian randomization study. ... (MR) analysis to explore whether there was a causal correlation between GM and CC, and the direction of causality. Results: In primary outcomes, we found that a higher abundance of class Clostridia ...

  27. Development of correlation for the power coefficient of the ...

    The values of the power coefficient obtained from the developed correlation and from numerical analysis were compared, and 95% of data points were found to lie within ±14%, which shows good agreement between predicted and numerical values. The value of the regression coefficient (R²) for the developed correlation is 0.97. Moreover, the ...

  28. MRI-based tumor shrinkage patterns after early neoadjuvant therapy in

    Background: MRI-based tumor shrinkage patterns (TSP) after neoadjuvant therapy (NAT) have been associated with pathological response. However, the understanding of TSP after early NAT remains limited. We aimed to analyze the relationship between TSP after early NAT and pathological response after therapy in different molecular subtypes. Methods: We prospectively enrolled participants with ...
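
Several of the snippets above describe Pearson's correlation coefficient r as a number between -1 and 1 that captures the strength and direction of a linear relationship. As a rough illustration only, here is a minimal Python sketch of that calculation. The function name `pearson_r` and the sample data (hours studied and test scores) are hypothetical and do not come from any of the sources listed.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences.

    Illustrative sketch: r is the sum of products of deviations from the mean,
    divided by the square root of the product of the summed squared deviations.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviations of each observation from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    co_dev = sum(a * b for a, b in zip(dx, dy))
    sq_dev_x = sum(a * a for a in dx)
    sq_dev_y = sum(b * b for b in dy)

    # r is undefined when either variable never varies.
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")

    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)


# Hypothetical data, made up for illustration.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [2, 4, 6, 8, 10]

r_positive = pearson_r(hours_studied, test_scores)        # perfectly linear, increasing
r_negative = pearson_r(hours_studied, test_scores[::-1])  # perfectly linear, decreasing
print(r_positive, r_negative)
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 a weak one. As the snippets note, a strong correlation does not by itself show that one variable causes the other.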