Imagine you are a health psychologist investigating why therapy reduces depression. You observe that patients who complete more therapy sessions show greater reductions in depression symptoms. But you wonder: How does therapy work? What is the mechanism?
You hypothesize that therapy reduces depression by increasing coping skills. Patients learn cognitive and behavioral strategies that help them manage stress and negative thoughts. If this is true, coping skills act as a mediator: therapy improves coping, which in turn reduces depression. The effect of therapy on depression is transmitted through coping skills.
But you also notice something else: the effect of therapy varies across patients. Some patients benefit greatly, while others show minimal improvement. You suspect that social support moderates the therapy effect. Patients with strong social networks may benefit more from therapy because they can practice new skills with supportive friends and family. Social support acts as a moderator: it changes the strength of the relationship between therapy and depression.
These questions, how X affects Y (mediation) and for whom or when X affects Y (moderation), are central to understanding complex behavioral processes. Mediation and moderation analyses provide the statistical tools to answer them.
Mediation and moderation are two distinct types of relationships that go beyond simple bivariate associations.
Mediation occurs when the effect of an independent variable \(X\) (e.g., therapy) on a dependent variable \(Y\) (e.g., depression) is transmitted through a third variable \(M\) (e.g., coping skills), called the mediator. The mediator explains how or why \(X\) affects \(Y\).
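The mediation model involves three key paths:
- Path \(a\): the effect of \(X\) on \(M\)
- Path \(b\): the effect of \(M\) on \(Y\), controlling for \(X\)
- Path \(c'\): the direct effect of \(X\) on \(Y\), controlling for \(M\)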
The indirect effect (mediation effect) is the product of paths \(a\) and \(b\): \(a \times b\). If the indirect effect is significant, we conclude that \(M\) mediates the relationship between \(X\) and \(Y\).
Full mediation occurs when the direct effect \(c'\) becomes non-significant after including the mediator, meaning all of \(X\)’s effect on \(Y\) is transmitted through \(M\). Partial mediation occurs when \(c'\) remains significant but is reduced, meaning \(M\) explains some, but not all, of the effect.
Moderation occurs when the relationship between \(X\) and \(Y\) depends on the level of a third variable \(W\) (e.g., social support), called the moderator. The moderator answers the question: For whom or under what conditions does \(X\) affect \(Y\)?
Moderation is tested by including an interaction term \(X \times W\) in the regression model. If the interaction is significant, the effect of \(X\) on \(Y\) varies across levels of \(W\).
For example, if social support moderates the therapy-depression relationship, therapy may be highly effective for patients with strong support but less effective for those with weak support.
Both can be tested within regression frameworks, making them accessible extensions of multiple regression.
The mediation model consists of three regression equations:
Equation 1 (Total effect of X on Y):
\[ Y_i = c_0 + c \cdot X_i + \varepsilon_i \]
Equation 2 (Effect of X on M):
\[ M_i = a_0 + a \cdot X_i + \varepsilon_{Mi} \]
Equation 3 (Effect of M on Y, controlling for X):
\[ Y_i = b_0 + c' \cdot X_i + b \cdot M_i + \varepsilon_{Yi} \]
where:
- \(c\) is the total effect of \(X\) on \(Y\)
- \(a\) is the effect of \(X\) on \(M\)
- \(b\) is the effect of \(M\) on \(Y\), controlling for \(X\)
- \(c'\) is the direct effect of \(X\) on \(Y\), controlling for \(M\)
- Indirect effect = \(a \times b\)
- Total effect = \(c' + (a \times b)\)
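The decomposition of the total effect follows from substituting Equation 2 into Equation 3. A short derivation, using only the symbols already defined:

\[
\begin{aligned}
Y_i &= b_0 + c' X_i + b\,(a_0 + a X_i + \varepsilon_{Mi}) + \varepsilon_{Yi} \\
    &= (b_0 + b\,a_0) + (c' + a b)\, X_i + (b\,\varepsilon_{Mi} + \varepsilon_{Yi})
\end{aligned}
\]

Comparing the coefficient on \(X_i\) with Equation 1 gives \(c = c' + a b\), so the indirect effect can equivalently be written as \(c - c'\). For ordinary least squares models fitted to the same sample this identity holds exactly; it need not hold for logistic or other nonlinear models.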
The moderation model includes an interaction term:
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + \beta_3 (X_i \times W_i) + \varepsilon_i \]
where:
- \(\beta_1\) is the effect of \(X\) when \(W = 0\) (simple slope at \(W = 0\))
- \(\beta_2\) is the effect of \(W\) when \(X = 0\)
- \(\beta_3\) is the interaction coefficient, indicating how much the effect of \(X\) changes for each unit increase in \(W\)
- The effect of \(X\) at any level of \(W\) is: \(\beta_1 + \beta_3 W\)
If \(\beta_3\) is significant, the relationship between \(X\) and \(Y\) varies across levels of \(W\), confirming moderation.
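The conditional effect of \(X\) can be read off by collecting the terms that involve \(X\). A brief sketch of the algebra, including what happens when the moderator is mean-centered (the notation \(W_i^{c}\) is introduced here for illustration only):

\[
Y_i = \beta_0 + (\beta_1 + \beta_3 W_i)\, X_i + \beta_2 W_i + \varepsilon_i
\quad\Rightarrow\quad
\frac{\partial\, \mathbb{E}[Y_i \mid X_i, W_i]}{\partial X_i} = \beta_1 + \beta_3 W_i
\]

\[
W_i^{c} = W_i - \bar{W}
\quad\Rightarrow\quad
\beta_1 + \beta_3 W_i = (\beta_1 + \beta_3 \bar{W}) + \beta_3 W_i^{c}
\]

After centering \(W\), the coefficient on \(X\) becomes \(\beta_1 + \beta_3 \bar{W}\), the effect of \(X\) at the average level of \(W\), while \(\beta_3\) itself is unchanged.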
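We simulate data for 300 patients in a therapy study. Variables include:
- therapy: number of therapy sessions (0 to 20), the predictor
- social_support: social support score (0 to 50), the moderator
- coping: coping skills score (0 to 100), the mediator
- depression: depression score (0 to 100), the outcome

We build in both mediation and moderation:
- Therapy increases coping (\(a\) path)
- Coping reduces depression (\(b\) path)
- Social support moderates the therapy-depression relationship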
set.seed(2024) # For reproducibility
library(ggplot2)
library(mediation) # For mediation analysis
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: mvtnorm
## Loading required package: sandwich
## mediation: Causal Mediation Analysis
## Version: 4.5.1
n <- 300
# Predictor: therapy sessions (0-20)
therapy <- round(runif(n, min = 0, max = 20))
# Moderator: social support (0-50)
social_support <- rnorm(n, mean = 25, sd = 10)
social_support <- pmax(0, pmin(50, social_support))
# Mediator: coping skills (influenced by therapy)
coping <- 30 + 2.5 * therapy + rnorm(n, mean = 0, sd = 8)
coping <- pmax(0, pmin(100, coping))
# Outcome: depression (influenced by therapy, coping, and moderated by social support)
# Direct effect of therapy: -1.5
# Effect of coping: -0.4
# Interaction between therapy and social support: -0.08 / 10 = -0.008 per support point
depression <- 80 - 1.5 * therapy - 0.4 * coping - 0.08 * (therapy * social_support) / 10 + rnorm(n, mean = 0, sd = 10)
depression <- pmax(0, pmin(100, depression))
# Create data frame
data <- data.frame(
therapy = therapy,
social_support = social_support,
coping = coping,
depression = depression
)
# View first few rows
head(data)
## therapy social_support coping depression
## 1 17 13.62561 63.48132 25.07833
## 2 6 24.56343 52.13987 55.32659
## 3 14 48.93530 58.14764 49.54092
## 4 14 37.44440 46.87660 35.73871
## 5 9 17.32787 58.63098 52.85470
## 6 14 0.00000 72.70302 25.90487
This dataset has built-in mediation (therapy → coping → depression) and moderation (therapy × social support → depression).
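Because the data are simulated, the population values that the fitted models should approximately recover can be worked out in advance. The sketch below is only arithmetic on the generating coefficients from the simulation code. The variable names are introduced here for illustration, and the calculation ignores the clipping of scores to their allowed ranges, which may shift the values slightly.

```r
# Generating values copied from the simulation code
a_true       <- 2.5         # therapy -> coping
b_true       <- -0.4        # coping -> depression
direct_base  <- -1.5        # therapy -> depression when social_support = 0
int_true     <- -0.08 / 10  # change in the therapy slope per support point
support_mean <- 25          # mean of social_support before clipping

indirect_true  <- a_true * b_true                        # -1.0
direct_at_mean <- direct_base + int_true * support_mean  # -1.7
total_at_mean  <- direct_at_mean + indirect_true         # -2.7

print(round(c(indirect = indirect_true,
              direct   = direct_at_mean,
              total    = total_at_mean), 3))
```

Note that the division by 10 makes the built-in interaction quite small (-0.008 per support point), which is worth keeping in mind when the moderation test is interpreted.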
Before fitting models, we visualize the relationships between variables.
ggplot(data, aes(x = therapy, y = depression)) +
geom_point(alpha = 0.5, size = 2, color = "steelblue") +
geom_smooth(method = "lm", se = TRUE, color = "darkred", fill = "pink", alpha = 0.3) +
labs(title = "Therapy Sessions and Depression",
x = "Number of Therapy Sessions",
y = "Depression Score") +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold")
)
## `geom_smooth()` using formula = 'y ~ x'
This shows a negative relationship: more therapy sessions are associated with lower depression.
ggplot(data, aes(x = therapy, y = coping)) +
geom_point(alpha = 0.5, size = 2, color = "forestgreen") +
geom_smooth(method = "lm", se = TRUE, color = "darkgreen", fill = "lightgreen", alpha = 0.3) +
labs(title = "Therapy Sessions and Coping Skills",
x = "Number of Therapy Sessions",
y = "Coping Skills Score") +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold")
)
## `geom_smooth()` using formula = 'y ~ x'
Therapy increases coping skills, consistent with the mediation hypothesis.
We test mediation using the regression steps of Baron and Kenny (1986), then obtain bootstrap confidence intervals for the indirect effect via the mediation package.
# Model 1: Total effect of therapy on depression
model_total <- lm(depression ~ therapy, data = data)
summary(model_total)
##
## Call:
## lm(formula = depression ~ therapy, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.876 -6.537 -0.159 7.327 30.438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 67.5000 1.1490 58.75 <2e-16 ***
## therapy -2.7012 0.1008 -26.79 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.56 on 298 degrees of freedom
## Multiple R-squared: 0.7066, Adjusted R-squared: 0.7056
## F-statistic: 717.7 on 1 and 298 DF, p-value: < 2.2e-16
# Model 2: Effect of therapy on coping
model_a <- lm(coping ~ therapy, data = data)
summary(model_a)
##
## Call:
## lm(formula = coping ~ therapy, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.1945 -5.3702 -0.6279 5.3565 23.9566
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.20181 0.82481 35.40 <2e-16 ***
## therapy 2.54141 0.07238 35.11 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.578 on 298 degrees of freedom
## Multiple R-squared: 0.8053, Adjusted R-squared: 0.8047
## F-statistic: 1233 on 1 and 298 DF, p-value: < 2.2e-16
# Model 3: Effect of coping on depression, controlling for therapy
model_b <- lm(depression ~ therapy + coping, data = data)
summary(model_b)
##
## Call:
## lm(formula = depression ~ therapy + coping, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.5251 -7.0526 -0.9486 7.0191 31.0872
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 79.47973 2.50968 31.669 < 2e-16 ***
## therapy -1.65863 0.21877 -7.582 4.39e-13 ***
## coping -0.41024 0.07725 -5.311 2.15e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.11 on 297 degrees of freedom
## Multiple R-squared: 0.732, Adjusted R-squared: 0.7302
## F-statistic: 405.7 on 2 and 297 DF, p-value: < 2.2e-16
The indirect effect is \(a \times b\). We extract coefficients and compute it manually:
a_coef <- coef(model_a)["therapy"]
b_coef <- coef(model_b)["coping"]
c_coef <- coef(model_total)["therapy"]
c_prime_coef <- coef(model_b)["therapy"]
indirect_effect <- a_coef * b_coef
total_effect <- c_coef
direct_effect <- c_prime_coef
cat("Total effect (c):", round(total_effect, 3), "\n")
## Total effect (c): -2.701
cat("Direct effect (c'):", round(direct_effect, 3), "\n")
## Direct effect (c'): -1.659
cat("Indirect effect (a × b):", round(indirect_effect, 3), "\n")
## Indirect effect (a × b): -1.043
cat("Proportion mediated:", round(indirect_effect / total_effect, 3), "\n")
## Proportion mediated: 0.386
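A normal-theory (Sobel) test offers a quick check on the indirect effect computed from these coefficients. The sketch below uses the first-order Sobel standard error with the estimates and standard errors copied from the regression output for Models 2 and 3; the object names are illustrative. Because the sampling distribution of a product of coefficients is generally not normal, this test is best treated as a rough check rather than the primary inference.

```r
# Coefficients and standard errors copied from the regression output
a    <- 2.54141;  se_a <- 0.07238   # therapy -> coping (Model 2)
b    <- -0.41024; se_b <- 0.07725   # coping -> depression given therapy (Model 3)

# First-order (Sobel) standard error of the product a * b
se_ab   <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)
z_sobel <- (a * b) / se_ab
p_sobel <- 2 * pnorm(-abs(z_sobel))   # two-sided p value from the normal distribution

cat("Sobel z:", round(z_sobel, 2),
    " p:", format.pval(p_sobel, digits = 3), "\n")
```

With these inputs the z statistic comes out at roughly -5.25, pointing in the same direction as the regression steps.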
For robust inference, we use bootstrapping via the mediation package:
# Fit mediation and outcome models
med_model <- lm(coping ~ therapy, data = data)
out_model <- lm(depression ~ therapy + coping, data = data)
# Bootstrap mediation analysis
set.seed(2024)
med_results <- mediate(med_model, out_model, treat = "therapy", mediator = "coping", boot = TRUE, sims = 1000)
## Running nonparametric bootstrap
summary(med_results)
##
## Causal Mediation Analysis
##
## Nonparametric Bootstrap Confidence Intervals with the Percentile Method
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME -1.04258 -1.45758 -0.65916 < 2.2e-16 ***
## ADE -1.65863 -2.11490 -1.20801 < 2.2e-16 ***
## Total Effect -2.70121 -2.88718 -2.50278 < 2.2e-16 ***
## Prop. Mediated 0.38597 0.24019 0.54130 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 300
##
##
## Simulations: 1000
The output provides the average causal mediation effect (ACME), average direct effect (ADE), total effect, and proportion mediated, along with confidence intervals.
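To make the idea behind the percentile bootstrap concrete, here is a minimal base R sketch that resamples rows, refits both regressions, and collects \(a \times b\). It is a simplified illustration on a small simulated dataset with hypothetical variables `x`, `m`, and `y`, not a reproduction of the mediation package's internal algorithm.

```r
set.seed(1)
n <- 500
x <- runif(n, 0, 20)                             # hypothetical predictor
m <- 30 + 2.5 * x + rnorm(n, sd = 8)             # hypothetical mediator
y <- 80 - 1.5 * x - 0.4 * m + rnorm(n, sd = 10)  # hypothetical outcome
d <- data.frame(x, m, y)

# Indirect effect a * b estimated from one data set
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]
  b <- coef(lm(y ~ x + m, data = d))[["m"]]
  a * b
}

# Resample rows with replacement and recompute a * b each time
n_boot  <- 500
boot_ab <- replicate(n_boot, indirect(d[sample(nrow(d), replace = TRUE), ]))

# Percentile confidence interval: the 2.5th and 97.5th percentiles
ci <- quantile(boot_ab, c(0.025, 0.975))
print(c(estimate = indirect(d), ci))
```

An interval that excludes zero is taken as evidence of an indirect effect; no normality assumption is placed on the product itself.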
The mediation analysis reveals how therapy reduces depression through coping skills.
Path a (therapy → coping): Therapy significantly increases coping skills, \(b = 2.54\), \(t(298) = 35.11\), \(p < .001\). Each additional therapy session increases coping by approximately 2.54 points.
Path b (coping → depression, controlling for therapy): Coping skills significantly reduce depression, \(b = -0.41\), \(t(297) = -5.31\), \(p < .001\). Each one-point increase in coping reduces depression by approximately 0.41 points, holding therapy constant.
Direct effect (c’): After controlling for coping, therapy still has a direct effect on depression, \(b = -1.66\), \(t(297) = -7.58\), \(p < .001\). This suggests partial mediation: coping explains part of therapy’s effect, but not all.
Indirect effect (a × b): The mediated effect through coping is \(-1.04\) points per therapy session (95% CI \([-1.46, -0.66]\)). This represents the portion of therapy’s effect that operates through improved coping.
Proportion mediated: Coping accounts for approximately 38.6% of the total effect of therapy on depression (95% CI \([24.0\%, 54.1\%]\)).
Interpretation: Therapy reduces depression both by directly addressing symptoms and by teaching coping skills that patients use to manage stress and negative thoughts. Coping is a significant but partial mediator.
We test whether social support moderates the therapy-depression relationship by adding an interaction term.
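To ease interpretation, we mean-center the predictors, so that each lower-order coefficient is the effect at the mean of the other variable. Centering also removes the nonessential collinearity between the predictors and their product, but it does not change the interaction coefficient or its test: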
data$therapy_c <- data$therapy - mean(data$therapy)
data$support_c <- data$social_support - mean(data$social_support)
# Moderation model with interaction
model_moderation <- lm(depression ~ therapy_c * support_c, data = data)
summary(model_moderation)
##
## Call:
## lm(formula = depression ~ therapy_c * support_c, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.880 -6.749 -0.323 7.172 30.400
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.27943 0.62346 66.210 <2e-16 ***
## therapy_c -2.69793 0.10292 -26.213 <2e-16 ***
## support_c 0.02751 0.06585 0.418 0.676
## therapy_c:support_c -0.01137 0.01135 -1.001 0.318
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.57 on 296 degrees of freedom
## Multiple R-squared: 0.7076, Adjusted R-squared: 0.7047
## F-statistic: 238.8 on 3 and 296 DF, p-value: < 2.2e-16
The key coefficient is the interaction term therapy_c:support_c. If significant, it confirms moderation.
The moderation analysis tests whether the effect of therapy varies by social support level.
Main effect of therapy: The coefficient for therapy_c represents the effect of therapy when social support is at its mean, \(b = -2.70\), \(t(296) = -26.21\), \(p < .001\). Therapy reduces depression by approximately 2.70 points per session at average support levels.
Main effect of social support: At the average number of therapy sessions, social support has a small, non-significant positive coefficient, \(b = 0.03\), \(t(296) = 0.42\), \(p = .676\). In this sample, social support does not reliably predict depression at average therapy levels.
Interaction effect: The coefficient for therapy_c:support_c is \(b = -0.01\), \(t(296) = -1.00\), \(p = .318\). The estimate implies that the therapy slope becomes about 0.01 points more negative for each one-unit increase in social support, so therapy would be slightly more effective at higher support. Because the coefficient is not statistically significant, this sample does not provide reliable evidence of moderation.
Simple slopes: To understand the interaction, we calculate the effect of therapy at different levels of social support (e.g., -1 SD, mean, +1 SD):
# Calculate simple slopes at different levels of social support
sd_support <- sd(data$social_support)
mean_support <- mean(data$social_support)
# Low support (-1 SD)
low_support <- mean_support - sd_support
slope_low <- coef(model_moderation)["therapy_c"] + coef(model_moderation)["therapy_c:support_c"] * (low_support - mean_support)
# Average support
slope_avg <- coef(model_moderation)["therapy_c"]
# High support (+1 SD)
high_support <- mean_support + sd_support
slope_high <- coef(model_moderation)["therapy_c"] + coef(model_moderation)["therapy_c:support_c"] * (high_support - mean_support)
cat("Effect of therapy at low support (-1 SD):", round(slope_low, 3), "\n")
## Effect of therapy at low support (-1 SD): -2.589
cat("Effect of therapy at average support:", round(slope_avg, 3), "\n")
## Effect of therapy at average support: -2.698
cat("Effect of therapy at high support (+1 SD):", round(slope_high, 3), "\n")
## Effect of therapy at high support (+1 SD): -2.807
The simple slopes reveal:
- At low social support (-1 SD): \(b = -2.59\)
- At average social support: \(b = -2.70\)
- At high social support (+1 SD): \(b = -2.81\)
Interpretation: The slopes show a modest trend where therapy effectiveness increases slightly with higher social support (from -2.59 to -2.81 points per session). However, the interaction is not statistically significant (\(p = .318\)), suggesting that in this sample, therapy is effective across all levels of social support. Larger samples might detect this moderating effect more clearly.
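Point estimates of simple slopes are more useful with standard errors attached. One common approach uses the coefficient covariance matrix: the variance of \(\beta_1 + \beta_3 w\) is \(\mathrm{Var}(\beta_1) + w^2 \mathrm{Var}(\beta_3) + 2 w\, \mathrm{Cov}(\beta_1, \beta_3)\). The sketch below applies this to a small simulated model; the data, the variables `x`, `w`, `y`, and the helper `simple_slope()` are all hypothetical.

```r
set.seed(123)
n <- 300
x <- rnorm(n)   # hypothetical centered predictor
w <- rnorm(n)   # hypothetical centered moderator
y <- 1 - 0.5 * x + 0.2 * w - 0.3 * x * w + rnorm(n)
fit <- lm(y ~ x * w)

# Slope of x at a chosen moderator value, with SE, t, and p value
simple_slope <- function(fit, w_value, x = "x", xw = "x:w") {
  b <- coef(fit)
  V <- vcov(fit)
  est <- b[[x]] + b[[xw]] * w_value
  se  <- sqrt(V[x, x] + w_value^2 * V[xw, xw] + 2 * w_value * V[x, xw])
  t   <- est / se
  p   <- 2 * pt(-abs(t), df = df.residual(fit))
  data.frame(w = w_value, slope = est, se = se, t = t, p = p)
}

# Slopes at -1 SD, the mean (0 after centering), and +1 SD of the moderator
slopes <- do.call(rbind, lapply(c(-sd(w), 0, sd(w)), simple_slope, fit = fit))
print(slopes)
```

At a moderator value of 0 the function returns the same estimate and standard error as the regression summary, which is a convenient sanity check.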
We visualize the interaction by plotting the therapy-depression relationship at different levels of social support.
# Create prediction data for low, average, and high social support
pred_data <- expand.grid(
therapy_c = seq(min(data$therapy_c), max(data$therapy_c), length.out = 100),
support_c = c(low_support - mean_support, 0, high_support - mean_support)
)
pred_data$depression_pred <- predict(model_moderation, newdata = pred_data)
# Convert back to original scale for plotting
pred_data$therapy <- pred_data$therapy_c + mean(data$therapy)
pred_data$support_level <- factor(
pred_data$support_c,
levels = c(low_support - mean_support, 0, high_support - mean_support),
labels = c("Low Support (-1 SD)", "Average Support", "High Support (+1 SD)")
)
# Plot interaction
ggplot(pred_data, aes(x = therapy, y = depression_pred, color = support_level, linetype = support_level)) +
geom_line(linewidth = 1.2) +
labs(title = "Moderation: Therapy Effect Varies by Social Support",
x = "Number of Therapy Sessions",
y = "Predicted Depression Score",
color = "Social Support Level",
linetype = "Social Support Level") +
scale_color_manual(values = c("red", "blue", "darkgreen")) +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "bottom"
)
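The plot shows three lines with nearly parallel slopes. The line for high social support is slightly steeper, in the direction of a stronger therapy effect, but the difference is small and consistent with the non-significant interaction.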
Both mediation and moderation models rely on standard regression assumptions.
data$residuals_mod <- residuals(model_moderation)
data$fitted_mod <- fitted(model_moderation)
ggplot(data, aes(x = fitted_mod, y = residuals_mod)) +
geom_point(alpha = 0.6, color = "steelblue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs Fitted Values (Moderation Model)",
x = "Fitted Values",
y = "Residuals") +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold")
)
Residuals should scatter randomly around zero with no patterns.
Mediation analysis assumes:
- No unmeasured confounding of the X-M, M-Y, or X-Y relationships
- Correct temporal ordering (X precedes M precedes Y)
- No measurement error in M
Violations can bias estimates. Sensitivity analyses and instrumental variables can help assess robustness.
A mediation analysis was conducted to examine whether coping skills mediate the relationship between therapy sessions and depression symptoms in a sample of 300 patients. A bootstrapped mediation analysis (1,000 resamples) revealed a significant indirect effect of therapy on depression through coping, \(b = -1.04\), 95% CI \([-1.46, -0.66]\), \(p < .001\). The direct effect of therapy on depression remained significant after controlling for coping, \(b = -1.66\), \(t(297) = -7.58\), \(p < .001\), indicating partial mediation. Coping skills accounted for approximately 38.6% (95% CI \([24.0\%, 54.1\%]\)) of the total effect of therapy on depression. These findings suggest that therapy reduces depression both directly and by enhancing patients’ coping strategies.
A moderation analysis tested whether social support moderates the effect of therapy on depression. The interaction between therapy sessions and social support was not statistically significant, \(b = -0.01\), \(t(296) = -1.00\), \(p = .318\) (model \(R^2 = .708\)). Simple slopes analysis revealed that therapy was effective across all levels of social support: at low support (\(b = -2.59\), \(p < .001\)), average support (\(b = -2.70\), \(p < .001\)), and high support (\(b = -2.81\), \(p < .001\)). Although the slopes were slightly steeper at higher support, the differences were small and not reliably different from zero. These results indicate that therapy is broadly effective, though larger samples may be needed to detect moderating effects of social support.
Mediation and moderation analyses require adequate power. As rough rules of thumb, aim for \(n \geq 200\) to detect moderate indirect effects with bootstrapping, and expect to need \(n \geq 400\) or more to detect small interactions; the exact requirement depends on the effect size and on how reliably the variables are measured. Underpowered studies may miss true effects or yield unstable estimates.
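Power for a planned interaction test can be estimated by simulation rather than taken from a rule of thumb. The sketch below is a minimal Monte Carlo estimate under assumed values: standard normal predictors, unit error variance, and an interaction coefficient of 0.1. The function `power_interaction()` and all settings are hypothetical and should be replaced with values plausible for the study at hand.

```r
set.seed(2024)

# Share of simulated samples in which the interaction term reaches p < alpha
power_interaction <- function(n, beta3, reps = 200, alpha = 0.05) {
  hits <- replicate(reps, {
    x <- rnorm(n)
    w <- rnorm(n)
    y <- 0.3 * x + 0.3 * w + beta3 * x * w + rnorm(n)
    p <- summary(lm(y ~ x * w))$coefficients["x:w", "Pr(>|t|)"]
    p < alpha
  })
  mean(hits)
}

sizes <- c(50, 200, 800)
power <- sapply(sizes, power_interaction, beta3 = 0.1)
print(data.frame(n = sizes, power = power))
```

With only 200 replications per sample size the estimates are noisy; increasing `reps` tightens them at the cost of run time.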
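Mean-centering continuous predictors before creating interaction terms makes the lower-order coefficients more interpretable (each represents the effect at the mean of the other variable). It also reduces the nonessential collinearity between the predictors and their product, although the interaction coefficient and its significance test are unchanged. Standardizing variables can also aid interpretation when variables are on different scales.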
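The effect of centering is easy to verify directly. The following sketch fits the same interaction model to raw and mean-centered versions of a simulated dataset; the variables `x`, `w`, and `y` are hypothetical and chosen only to resemble the scales used in this example.

```r
set.seed(99)
n <- 300
x <- runif(n, 0, 20)                # hypothetical predictor
w <- rnorm(n, mean = 25, sd = 10)   # hypothetical moderator
y <- 80 - 2 * x - 0.01 * x * w + rnorm(n, sd = 10)

xc <- x - mean(x)
wc <- w - mean(w)

fit_raw      <- lm(y ~ x * w)
fit_centered <- lm(y ~ xc * wc)

# Interaction row (estimate, SE, t, p) from each fit
print(rbind(raw      = summary(fit_raw)$coefficients["x:w", ],
            centered = summary(fit_centered)$coefficients["xc:wc", ]))

# Correlation between the predictor and the product term
print(c(raw = cor(x, x * w), centered = cor(xc, xc * wc)))
```

The interaction row is the same in both fits, while the coefficient on the predictor changes from the slope at a moderator value of 0 to the slope at the moderator's mean.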
When multiple mediators are hypothesized, fit separate models for each mediator, or use structural equation modeling (SEM) to test complex mediation chains simultaneously (e.g., serial mediation, parallel mediation).
The Baron and Kenny (1986) approach is widely used but has limitations. Modern methods prefer:
- Bootstrapping: Provides robust confidence intervals without normality assumptions.
- Structural equation modeling (SEM): Allows testing complex mediation models with multiple mediators and outcomes.
- Causal mediation analysis: Addresses confounding and provides causal interpretations under certain assumptions.
For significant interactions, always probe simple slopes at meaningful levels of the moderator (e.g., -1 SD, mean, +1 SD) and visualize the interaction. The Johnson-Neyman technique can identify the exact moderator values where the effect of \(X\) becomes significant or non-significant.
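The Johnson-Neyman boundaries can be obtained by solving for the moderator values at which the simple-slope t statistic equals the critical value, which reduces to a quadratic equation in the moderator. The sketch below does this for a simulated model with a deliberately strong interaction; the data and the helper `jn_bounds()` are hypothetical, and the code assumes a single continuous moderator in an ordinary least squares model.

```r
set.seed(42)
n <- 400
x <- rnorm(n)   # hypothetical predictor
w <- rnorm(n)   # hypothetical moderator
y <- 0.5 * x + 0.3 * x * w + rnorm(n)
fit <- lm(y ~ x * w)

# Solve (b1 + b3 * w)^2 = t_crit^2 * Var(b1 + b3 * w) for w
jn_bounds <- function(fit, x = "x", xw = "x:w", alpha = 0.05) {
  b  <- coef(fit)
  V  <- vcov(fit)
  t2 <- qt(1 - alpha / 2, df.residual(fit))^2
  A  <- b[[xw]]^2 - t2 * V[xw, xw]
  B  <- 2 * (b[[x]] * b[[xw]] - t2 * V[x, xw])
  C  <- b[[x]]^2 - t2 * V[x, x]
  disc <- B^2 - 4 * A * C
  if (disc < 0) return(numeric(0))   # no real boundary exists
  sort((-B + c(-1, 1) * sqrt(disc)) / (2 * A))
}

bounds <- jn_bounds(fit)
print(bounds)
```

Between or outside the two returned values (depending on the data) the slope of the predictor is not significant at the chosen alpha; boundaries that fall outside the observed range of the moderator should not be interpreted.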
Neither mediation nor moderation analysis establishes causality from observational data. Randomized experiments, longitudinal designs, and careful control of confounders strengthen causal claims. Sensitivity analyses can assess how robust conclusions are to unmeasured confounding.
Mediation and moderation analyses extend multiple regression to answer more nuanced questions about relationships between variables. Mediation uncovers how effects operate by identifying mechanisms, while moderation reveals for whom or when effects occur by identifying boundary conditions.
Both methods are widely used in psychology, health sciences, and behavioral research to develop and test theories about processes and contexts. They move beyond simple “does X affect Y?” questions to richer inquiries about psychological and behavioral mechanisms.
Mediation helps identify intervention targets (e.g., if coping mediates therapy’s effect, programs can explicitly teach coping skills). Moderation helps personalize interventions (e.g., if social support moderates therapy, clinicians can tailor treatment intensity or add support-building components).
When combined, mediation and moderation provide a powerful framework for understanding the complexity of human behavior, guiding theory development, and informing evidence-based practice.
With careful application, these techniques deepen our understanding of why and when interventions work, advancing both science and clinical practice.