Imagine you are a clinical researcher evaluating a new therapy intervention for anxiety. You recruit patients from 15 different clinics across the country. Each clinic enrolls between 10 and 30 patients, and you measure their anxiety scores at baseline, 3 months, 6 months, and 12 months.
As you plan your analysis, you face a critical problem: patients from the same clinic are not independent. Clinics differ in their therapeutic culture, staff expertise, patient populations, and local resources. Patients within the same clinic are more similar to each other than to patients from different clinics. Standard regression assumes all observations are independent, but this assumption is violated when data are clustered (nested within groups) or repeated (multiple measurements from the same individuals).
If you ignore this clustering, you risk: - Underestimating standard errors: Your p-values will be too small, leading to false positives. - Ignoring meaningful variation: Differences between clinics may be scientifically important. - Misattributing effects: You might confuse clinic-level effects with patient-level effects.
Mixed effects models (also called multilevel models, hierarchical models, or random effects models) solve this problem. They partition variance into multiple levels—patients within clinics, measurements within patients—and estimate both fixed effects (average relationships across the population) and random effects (variation around those averages due to clustering).
Mixed effects models are essential for analyzing: - Clustered data: Students in schools, patients in hospitals, employees in companies. - Longitudinal data: Repeated measurements from the same individuals over time. - Complex designs: Cross-classified structures (e.g., students nested in both schools and neighborhoods).
This tutorial introduces mixed effects models using a behavioral dataset with patients nested in clinics.
Mixed effects models extend regression by allowing coefficients to vary across groups. They include fixed effects, which describe average relationships in the population, and random effects, which capture group-specific deviations from those averages.
A simple mixed effects model with random intercepts:
\[ Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_j + \varepsilon_{ij} \]
where: - \(Y_{ij}\) is the outcome for patient \(i\) in clinic \(j\) - \(\beta_0\) is the fixed intercept (population average baseline) - \(\beta_1\) is the fixed slope (population average effect of \(X\)) - \(X_{ij}\) is a predictor (e.g., therapy sessions) - \(u_j\) is the random intercept for clinic \(j\) (deviation from \(\beta_0\)) - \(\varepsilon_{ij}\) is the residual error for patient \(i\) in clinic \(j\)
The random intercept \(u_j \sim N(0, \tau^2)\) allows each clinic to have its own baseline anxiety level. The variance \(\tau^2\) quantifies how much clinics differ.
We can also let slopes vary across groups:
\[ Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_j + v_j X_{ij} + \varepsilon_{ij} \]
where \(v_j\) is the random slope for clinic \(j\), allowing the effect of therapy to differ across clinics.
Mixed effects models partition total variance into: - Between-group variance (\(\tau^2\)): How much outcomes differ across groups. - Within-group variance (\(\sigma^2\)): How much outcomes differ within groups.
The intraclass correlation coefficient (ICC) quantifies the proportion of variance due to clustering:
\[ \text{ICC} = \frac{\tau^2}{\tau^2 + \sigma^2} \]
An ICC of 0.10 means 10% of variance is between clinics and 90% is within clinics.
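This arithmetic can be checked directly. A minimal sketch with hypothetical variance components (the values here are illustrative, not taken from any model fitted below):

```r
# Hypothetical variance components chosen to give ICC = 0.10
tau2 <- 10    # between-group variance
sigma2 <- 90  # within-group variance

icc <- tau2 / (tau2 + sigma2)
icc
# 0.1: 10% of the total variance lies between groups
```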
For the anxiety study, the random intercept model is
\[ \text{Anxiety}_{ij} = \beta_0 + \beta_1 \text{Therapy}_{ij} + u_j + \varepsilon_{ij} \]
and the random slopes model additionally lets the therapy effect vary by clinic:
\[ \text{Anxiety}_{ij} = \beta_0 + \beta_1 \text{Therapy}_{ij} + u_j + v_j \text{Therapy}_{ij} + \varepsilon_{ij} \]
We simulate data for 300 patients from 15 clinics with varying baseline anxiety and therapy effects.
set.seed(2025)
library(lme4)
## Loading required package: Matrix
library(ggplot2)
# Simulation parameters
n_clinics <- 15
patients_per_clinic <- 20
n_total <- n_clinics * patients_per_clinic
# Create clinic IDs
clinic <- rep(1:n_clinics, each = patients_per_clinic)
# Random effects for clinics
clinic_intercepts <- rnorm(n_clinics, mean = 0, sd = 5) # Baseline differences
clinic_slopes <- rnorm(n_clinics, mean = 0, sd = 0.3) # Therapy effect differences
# Patient-level predictors
therapy_sessions <- round(runif(n_total, min = 0, max = 20))
# Generate anxiety scores
# Population: Anxiety = 50 - 1.5*Therapy + clinic effects + error
anxiety <- numeric(n_total)
for (i in 1:n_total) {
  clinic_id <- clinic[i]
  anxiety[i] <- 50 +                                          # Fixed intercept
    clinic_intercepts[clinic_id] +                            # Random intercept
    (-1.5 + clinic_slopes[clinic_id]) * therapy_sessions[i] + # Fixed + random slope
    rnorm(1, mean = 0, sd = 6)                                # Residual error
}
# Ensure anxiety is in valid range
anxiety <- pmax(0, pmin(100, anxiety))
# Create data frame
data <- data.frame(
  patient_id = 1:n_total,
  clinic = factor(clinic),
  therapy_sessions = therapy_sessions,
  anxiety = anxiety
)
# View first rows
head(data, 12)
## patient_id clinic therapy_sessions anxiety
## 1 1 1 3 51.67108
## 2 2 1 15 35.04482
## 3 3 1 19 32.15488
## 4 4 1 4 50.58195
## 5 5 1 2 40.88176
## 6 6 1 15 23.30940
## 7 7 1 14 35.10353
## 8 8 1 7 44.77209
## 9 9 1 5 50.36949
## 10 10 1 4 45.18165
## 11 11 1 2 52.50908
## 12 12 1 20 25.98565
# Summary by clinic
aggregate(anxiety ~ clinic, data = data, FUN = function(x) c(mean = mean(x), sd = sd(x)))
## clinic anxiety.mean anxiety.sd
## 1 1 40.189096 8.943037
## 2 2 33.780348 10.226558
## 3 3 37.268670 8.001161
## 4 4 37.558202 11.062779
## 5 5 43.788916 9.665642
## 6 6 33.820355 8.413072
## 7 7 37.930435 13.046259
## 8 8 41.485210 7.572337
## 9 9 38.804121 10.538637
## 10 10 41.712412 10.139023
## 11 11 28.873105 13.688091
## 12 12 21.913045 11.137794
## 13 13 35.477303 11.864484
## 14 14 36.877748 10.410038
## 15 15 40.776249 12.781180
This dataset has patients nested in clinics, with both random intercepts (baseline anxiety varies) and random slopes (therapy effectiveness varies).
Before fitting models, we visualize the clustering structure.
ggplot(data, aes(x = therapy_sessions, y = anxiety, color = clinic, group = clinic)) +
  geom_point(alpha = 0.6, size = 1.5) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.8) +
  labs(title = "Anxiety by Therapy Sessions Across Clinics",
       x = "Number of Therapy Sessions",
       y = "Anxiety Score",
       color = "Clinic") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "none"
  )
## `geom_smooth()` using formula = 'y ~ x'
Each clinic has its own regression line, showing heterogeneity in both intercepts and slopes.
clinic_means <- aggregate(anxiety ~ clinic, data = data, FUN = mean)
ggplot(clinic_means, aes(x = clinic, y = anxiety)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.7) +
  geom_hline(yintercept = mean(data$anxiety), linetype = "dashed", color = "red", linewidth = 1) +
  labs(title = "Mean Anxiety by Clinic",
       x = "Clinic",
       y = "Mean Anxiety Score") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )
The red dashed line shows the overall mean. Clinics vary substantially in average anxiety.
First, we fit a standard regression that ignores clustering:
model_standard <- lm(anxiety ~ therapy_sessions, data = data)
summary(model_standard)
##
## Call:
## lm(formula = anxiety ~ therapy_sessions, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.6977 -5.1920 0.6025 5.2024 24.6479
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.47281 0.91674 55.06 <2e-16 ***
## therapy_sessions -1.36616 0.07773 -17.58 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.215 on 298 degrees of freedom
## Multiple R-squared: 0.509, Adjusted R-squared: 0.5074
## F-statistic: 308.9 on 1 and 298 DF, p-value: < 2.2e-16
This model assumes all patients are independent, which violates the clustered structure. Standard errors are likely too small.
Now we fit a mixed effects model with random intercepts for clinics:
library(lme4)
model_ri <- lmer(anxiety ~ therapy_sessions + (1 | clinic), data = data)
summary(model_ri)
## Linear mixed model fit by REML ['lmerMod']
## Formula: anxiety ~ therapy_sessions + (1 | clinic)
## Data: data
##
## REML criterion at convergence: 1987.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.69379 -0.60026 0.03691 0.62603 2.85156
##
## Random effects:
## Groups Name Variance Std.Dev.
## clinic (Intercept) 31.08 5.575
## Residual 38.37 6.194
## Number of obs: 300, groups: clinic, 15
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 50.94217 1.60141 31.81
## therapy_sessions -1.41266 0.05985 -23.61
##
## Correlation of Fixed Effects:
## (Intr)
## thrpy_sssns -0.377
The syntax (1 | clinic) specifies random intercepts for clinics. The model estimates: - Fixed effects: \(\beta_0\) (intercept) and \(\beta_1\) (therapy effect) - Random effects: \(\tau^2\) (variance of clinic intercepts) and \(\sigma^2\) (residual variance)
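Note that lme4 reports t values but no p-values for fixed effects, because the appropriate degrees of freedom in mixed models are not uniquely defined. If p-values are required, one common option (an add-on package, not part of lme4 itself) is lmerTest, whose lmer() adds Satterthwaite-approximated degrees of freedom:

```r
# Sketch: refit with lmerTest to obtain approximate p-values
# (assumes the lmerTest package is installed; `data` is the data frame from above)
library(lmerTest)
model_ri_p <- lmer(anxiety ~ therapy_sessions + (1 | clinic), data = data)
summary(model_ri_p)  # fixed-effects table now includes df and Pr(>|t|)
```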
The random intercept model output includes:
Fixed effects: - Intercept: The estimated average anxiety score when therapy sessions = 0, across all clinics. - therapy_sessions: The estimated average decrease in anxiety per therapy session, holding clinic constant.
Random effects: - clinic (Intercept): Variance of clinic-specific intercepts (\(\tau^2\)). This quantifies how much baseline anxiety varies across clinics. - Residual: Within-clinic variance (\(\sigma^2\)). This quantifies variation among patients within the same clinic.
Intraclass Correlation (ICC):
We calculate the ICC to see what proportion of variance is due to clinic differences:
var_clinic <- as.data.frame(VarCorr(model_ri))$vcov[1]
var_residual <- as.data.frame(VarCorr(model_ri))$vcov[2]
ICC <- var_clinic / (var_clinic + var_residual)
cat("Between-clinic variance (tau^2):", round(var_clinic, 2), "\n")
## Between-clinic variance (tau^2): 31.08
cat("Within-clinic variance (sigma^2):", round(var_residual, 2), "\n")
## Within-clinic variance (sigma^2): 38.37
cat("ICC:", round(ICC, 3), "\n")
## ICC: 0.448
The output shows: - Between-clinic variance: \(\tau^2 = 31.08\) - Within-clinic variance: \(\sigma^2 = 38.37\) - ICC = 0.448
An ICC of 0.448 means 44.8% of total variance in anxiety is due to differences between clinics.
Interpretation: The random intercept model accounts for baseline differences across clinics. Patients from the same clinic share more similar anxiety levels than patients from different clinics. The therapy effect is assumed to be the same across all clinics.
Next, we allow the effect of therapy to vary across clinics by adding random slopes:
model_rs <- lmer(anxiety ~ therapy_sessions + (therapy_sessions | clinic), data = data)
summary(model_rs)
## Linear mixed model fit by REML ['lmerMod']
## Formula: anxiety ~ therapy_sessions + (therapy_sessions | clinic)
## Data: data
##
## REML criterion at convergence: 1973.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.82413 -0.64670 0.02686 0.62713 2.76382
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## clinic (Intercept) 11.88699 3.4478
## therapy_sessions 0.07627 0.2762 0.55
## Residual 35.84954 5.9874
## Number of obs: 300, groups: clinic, 15
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 51.01660 1.11946 45.57
## therapy_sessions -1.42970 0.09193 -15.55
##
## Correlation of Fixed Effects:
## (Intr)
## thrpy_sssns 0.012
The syntax (therapy_sessions | clinic) specifies both random intercepts and random slopes for therapy sessions. This model estimates: - Variance of random intercepts - Variance of random slopes - Correlation between random intercepts and slopes
Fixed effects: Same as before—average intercept and average therapy effect.
Random effects: - clinic (Intercept): Variance in baseline anxiety across clinics. - clinic therapy_sessions: Variance in therapy effectiveness across clinics. - Correlation: Because the fixed therapy effect is negative, a more negative slope means a larger reduction in anxiety. If the correlation is negative, clinics with higher baseline anxiety show stronger therapy effects; if positive, clinics with higher baseline anxiety show weaker effects.
Comparing models:
We use a likelihood ratio test to compare the random intercept and random slopes models:
anova(model_ri, model_rs)
## refitting model(s) with ML (instead of REML)
## Data: data
## Models:
## model_ri: anxiety ~ therapy_sessions + (1 | clinic)
## model_rs: anxiety ~ therapy_sessions + (therapy_sessions | clinic)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model_ri 4 1994.2 2009.0 -993.08 1986.2
## model_rs 6 1984.2 2006.4 -986.08 1972.2 13.998 2 0.0009127 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If the random slopes model fits significantly better (\(p < .05\)), we conclude that therapy effectiveness varies meaningfully across clinics.
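Beyond the likelihood ratio test, confidence intervals for the model parameters can be obtained with confint(), which is standard lme4 functionality (profiling also covers the random-effect standard deviations but is slower):

```r
# Wald CIs for the fixed effects of the random slopes model (fast, approximate);
# assumes model_rs has been fitted as above
confint(model_rs, parm = "beta_", method = "Wald")

# Profile CIs additionally cover the variance components (slower):
# confint(model_rs, method = "profile")
```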
Extracting clinic-specific effects:
clinic_effects <- ranef(model_rs)$clinic
head(clinic_effects)
## (Intercept) therapy_sessions
## 1 1.8554323 0.134844749
## 2 -1.5784596 -0.009734087
## 3 0.5502901 0.151180047
## 4 2.0191222 -0.004771978
## 5 4.0027522 0.317330832
## 6 -2.7781210 -0.020098889
Each clinic has its own intercept and slope deviations. Because the fixed therapy slope is negative, a clinic with a negative slope deviation has a more negative total slope, and therefore a stronger-than-average therapy effect.
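ranef() returns deviations only; to obtain each clinic's total intercept and slope (fixed effect plus deviation), coef() adds the fixed effects back in:

```r
# Clinic-specific coefficients = fixed effects + random-effect deviations;
# assumes model_rs has been fitted as above
clinic_coefs <- coef(model_rs)$clinic
head(clinic_coefs)
# Each row gives beta_0 + u_j and beta_1 + v_j for one clinic
```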
We can visualize the clinic-specific intercepts and slopes:
clinic_effects$clinic_id <- rownames(clinic_effects)
names(clinic_effects) <- c("Intercept", "Slope", "clinic_id")
ggplot(clinic_effects, aes(x = reorder(clinic_id, Intercept), y = Intercept)) +
  geom_bar(stat = "identity", fill = "coral", alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  labs(title = "Random Intercepts by Clinic",
       x = "Clinic",
       y = "Deviation from Average Baseline Anxiety") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )
Clinics above zero have higher-than-average baseline anxiety; those below zero have lower baseline anxiety.
ggplot(clinic_effects, aes(x = reorder(clinic_id, Slope), y = Slope)) +
  geom_bar(stat = "identity", fill = "darkgreen", alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  labs(title = "Random Slopes by Clinic",
       x = "Clinic",
       y = "Deviation from Average Therapy Effect") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )
Negative slopes indicate stronger-than-average therapy effects; positive slopes indicate weaker effects.
Mixed effects models assume: - Normality of random effects - Normality of residuals - Homoscedasticity of residuals
qqnorm(clinic_effects$Intercept, main = "Q-Q Plot: Random Intercepts")
qqline(clinic_effects$Intercept, col = "red")
Random intercepts should follow a normal distribution.
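The same check applies to the residuals. A base-R Q-Q plot of the model residuals, complementing the random-intercept plot above (assumes model_rs has been fitted as above):

```r
# Q-Q plot for residual normality
res <- residuals(model_rs)
qqnorm(res, main = "Q-Q Plot: Residuals")
qqline(res, col = "red")
```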
data$residuals <- residuals(model_rs)
data$fitted <- fitted(model_rs)
ggplot(data, aes(x = fitted, y = residuals)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )
Residuals should scatter randomly around zero with constant variance.
A mixed effects model with random intercepts for clinics was fitted to examine the effect of therapy sessions on anxiety scores. The model included 300 patients nested within 15 clinics. Fixed effects revealed that therapy sessions significantly reduced anxiety, \(b = -1.41\), \(t = -23.61\), \(p < .001\). Each additional therapy session was associated with a decrease of 1.41 points in anxiety, on average. Random effects showed substantial between-clinic variance in baseline anxiety, \(\tau^2 = 31.08\), indicating that clinics differed in their average anxiety levels. The intraclass correlation coefficient (ICC) was 0.448, meaning 44.8% of total variance was attributable to clinic-level differences. This clustering necessitates mixed effects modeling to avoid underestimating standard errors.
A random slopes model allowed both baseline anxiety and therapy effectiveness to vary across clinics. The model provided a significantly better fit than the random intercept model, \(\chi^2(2) = 14.00\), \(p < .001\), indicating that therapy effectiveness varied meaningfully across clinics. Fixed effects remained significant, with therapy reducing anxiety by an average of 1.43 points per session, \(t = -15.55\), \(p < .001\). Random effects showed variance in both intercepts (\(\tau_0^2 = 11.89\)) and slopes (\(\tau_1^2 = 0.076\)), with a correlation of \(r = 0.55\) between them. Because the fixed slope is negative, this positive correlation suggests that clinics with higher baseline anxiety tended to show weaker therapy effects (smaller reductions per session). Clinic-specific slope deviations ranged from -0.49 to +0.41, highlighting heterogeneity in treatment response. These findings suggest that therapy is effective on average but that some clinics achieve stronger reductions in anxiety than others, possibly due to differences in staff training, therapeutic environment, or patient characteristics.
Use mixed effects models when: - Data are clustered: Observations are grouped (students in schools, patients in clinics, employees in companies). - Data are longitudinal: Repeated measurements from the same individuals. - Ignoring clustering is problematic: ICC > 0.05 suggests meaningful clustering.
Mixed effects models require sufficient groups (clusters) to estimate random effects reliably. Aim for at least 10-15 groups. With fewer groups, random effect estimates may be unstable.
Complex random effects structures (e.g., random slopes with many predictors) can cause convergence problems. If models fail to converge: - Simplify the random effects structure. - Center or scale predictors. - Use a different optimizer (e.g., bobyqa, nloptwrap).
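Switching optimizers is done through lmerControl(), which is standard lme4 API. For example, refitting the random slopes model with bobyqa:

```r
# Refit the random slopes model with the bobyqa optimizer;
# assumes `data` is the data frame simulated above
library(lme4)
model_rs_b <- lmer(
  anxiety ~ therapy_sessions + (therapy_sessions | clinic),
  data = data,
  control = lmerControl(optimizer = "bobyqa")
)
```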
Mixed effects models can be extended to: - Generalized linear mixed models (GLMMs): For binary, count, or other non-normal outcomes (using glmer()). - Longitudinal models: Repeated measures over time with time-varying predictors. - Cross-classified models: Multiple clustering variables (e.g., students nested in both schools and neighborhoods). - Three-level models: More complex hierarchies (e.g., students in classrooms in schools).
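For a binary outcome, the random intercept model carries over directly to glmer() with a binomial family. A hedged sketch, reusing the simulated data and an arbitrary, illustrative cutoff of 35 to define improvement (the cutoff and the `improved` variable are assumptions for demonstration only):

```r
# GLMM sketch: binary outcome (anxiety below an illustrative cutoff of 35);
# assumes `data` is the data frame simulated above
library(lme4)
data$improved <- as.integer(data$anxiety < 35)
model_glmm <- glmer(improved ~ therapy_sessions + (1 | clinic),
                    data = data, family = binomial)
summary(model_glmm)  # coefficients are on the log-odds scale
```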
Mixed effects models are essential for analyzing clustered and longitudinal data. They account for dependencies among observations by partitioning variance into multiple levels and estimating both population-average effects (fixed effects) and group-specific deviations (random effects).
By modeling clustering explicitly, mixed effects models: - Provide accurate standard errors and valid p-values - Reveal variation across groups (clinics, schools, individuals) - Allow group-specific predictions - Support multilevel research questions about both individuals and groups
Ignoring clustering leads to biased inferences, inflated Type I error rates, and missed opportunities to understand heterogeneity. Mixed effects models address these issues while maintaining the interpretability of regression.
In behavioral and health research, data are almost always clustered—patients within clinics, students within schools, repeated measures within individuals. Mixed effects models provide the proper framework for analyzing such data, respecting the hierarchical structure while estimating meaningful effects.
With mixed effects models, we can answer richer questions: Does therapy reduce anxiety on average? Yes. But does it work equally well in all clinics? No—some clinics show stronger effects than others, and understanding why is the next scientific frontier.
By embracing the complexity of real-world data structures, mixed effects models move psychological and health research closer to the nuanced, context-dependent understanding that best serves science and practice.