Back to Tutorials Download .Rmd

1. A behavioral story

Imagine you are a clinical researcher evaluating a new therapy intervention for anxiety. You recruit patients from 15 different clinics across the country. Each clinic enrolls between 10 and 30 patients, and you measure their anxiety scores at baseline, 3 months, 6 months, and 12 months.

As you plan your analysis, you face a critical problem: patients from the same clinic are not independent. Clinics differ in their therapeutic culture, staff expertise, patient populations, and local resources. Patients within the same clinic are more similar to each other than to patients from different clinics. Standard regression assumes all observations are independent, but this assumption is violated when data are clustered (nested within groups) or repeated (multiple measurements from the same individuals).

If you ignore this clustering, you risk: - Underestimating standard errors: Your p-values will be too small, leading to false positives. - Ignoring meaningful variation: Differences between clinics may be scientifically important. - Misattributing effects: You might confuse clinic-level effects with patient-level effects.

Mixed effects models (also called multilevel models, hierarchical models, or random effects models) solve this problem. They partition variance into multiple levels—patients within clinics, measurements within patients—and estimate both fixed effects (average relationships across the population) and random effects (variation around those averages due to clustering).

Mixed effects models are essential for analyzing: - Clustered data: Students in schools, patients in hospitals, employees in companies. - Longitudinal data: Repeated measurements from the same individuals over time. - Complex designs: Cross-classified structures (e.g., students nested in both schools and neighborhoods).

This tutorial introduces mixed effects models using a behavioral dataset with patients nested in clinics.

2. What are mixed effects models?

Mixed effects models extend regression by allowing coefficients to vary across groups. They include:

  • Fixed effects: Parameters that are constant across all groups (e.g., the average effect of therapy on anxiety).
  • Random effects: Parameters that vary across groups (e.g., each clinic has its own baseline anxiety level or therapy effect).

The basic structure

A simple mixed effects model with random intercepts:

\[ Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_j + \varepsilon_{ij} \]

where: - \(Y_{ij}\) is the outcome for patient \(i\) in clinic \(j\) - \(\beta_0\) is the fixed intercept (population average baseline) - \(\beta_1\) is the fixed slope (population average effect of \(X\)) - \(X_{ij}\) is a predictor (e.g., therapy sessions) - \(u_j\) is the random intercept for clinic \(j\) (deviation from \(\beta_0\)) - \(\varepsilon_{ij}\) is the residual error for patient \(i\) in clinic \(j\)

The random intercept \(u_j \sim N(0, \tau^2)\) allows each clinic to have its own baseline anxiety level. The variance \(\tau^2\) quantifies how much clinics differ.

Random slopes

We can also let slopes vary across groups:

\[ Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_j + v_j X_{ij} + \varepsilon_{ij} \]

where \(v_j\) is the random slope for clinic \(j\), allowing the effect of therapy to differ across clinics.

Variance partitioning

Mixed effects models partition total variance into: - Between-group variance (\(\tau^2\)): How much outcomes differ across groups. - Within-group variance (\(\sigma^2\)): How much outcomes differ within groups.

The intraclass correlation coefficient (ICC) quantifies the proportion of variance due to clustering:

\[ \text{ICC} = \frac{\tau^2}{\tau^2 + \sigma^2} \]

An ICC of 0.10 means 10% of variance is between clinics and 90% is within clinics.

3. The models in plain form

Random intercept model

\[ \text{Anxiety}_{ij} = \beta_0 + \beta_1 \text{Therapy}_{ij} + u_j + \varepsilon_{ij} \]

  • \(\beta_0\): Average baseline anxiety across all clinics
  • \(\beta_1\): Average effect of therapy sessions on anxiety
  • \(u_j \sim N(0, \tau^2)\): Clinic-specific deviation from average baseline
  • \(\varepsilon_{ij} \sim N(0, \sigma^2)\): Patient-level residual

Random slopes model

\[ \text{Anxiety}_{ij} = \beta_0 + \beta_1 \text{Therapy}_{ij} + u_j + v_j \text{Therapy}_{ij} + \varepsilon_{ij} \]

  • \(v_j \sim N(0, \tau_1^2)\): Clinic-specific deviation in therapy effect
  • Correlation between \(u_j\) and \(v_j\): Clinics with higher baseline anxiety may respond differently to therapy

4. Simulating a dataset

We simulate data for 300 patients from 15 clinics with varying baseline anxiety and therapy effects.

set.seed(2025)
library(lme4)
## Loading required package: Matrix
library(ggplot2)

# Simulation parameters
n_clinics <- 15
patients_per_clinic <- 20
n_total <- n_clinics * patients_per_clinic

# Create clinic IDs
clinic <- rep(1:n_clinics, each = patients_per_clinic)

# Random effects for clinics
clinic_intercepts <- rnorm(n_clinics, mean = 0, sd = 5)   # Baseline differences
clinic_slopes <- rnorm(n_clinics, mean = 0, sd = 0.3)     # Therapy effect differences

# Patient-level predictors
therapy_sessions <- round(runif(n_total, min = 0, max = 20))

# Generate anxiety scores
# Population: Anxiety = 50 - 1.5*Therapy + clinic effects + error
anxiety <- numeric(n_total)
for (i in 1:n_total) {
  clinic_id <- clinic[i]
  anxiety[i] <- 50 +                                    # Fixed intercept
                clinic_intercepts[clinic_id] +          # Random intercept
                (-1.5 + clinic_slopes[clinic_id]) * therapy_sessions[i] +  # Fixed + random slope
                rnorm(1, mean = 0, sd = 6)              # Residual error
}

# Ensure anxiety is in valid range
anxiety <- pmax(0, pmin(100, anxiety))

# Create data frame
data <- data.frame(
  patient_id = 1:n_total,
  clinic = factor(clinic),
  therapy_sessions = therapy_sessions,
  anxiety = anxiety
)

# View first rows
head(data, 12)
##    patient_id clinic therapy_sessions  anxiety
## 1           1      1                3 51.67108
## 2           2      1               15 35.04482
## 3           3      1               19 32.15488
## 4           4      1                4 50.58195
## 5           5      1                2 40.88176
## 6           6      1               15 23.30940
## 7           7      1               14 35.10353
## 8           8      1                7 44.77209
## 9           9      1                5 50.36949
## 10         10      1                4 45.18165
## 11         11      1                2 52.50908
## 12         12      1               20 25.98565
# Summary by clinic
aggregate(anxiety ~ clinic, data = data, FUN = function(x) c(mean = mean(x), sd = sd(x)))
##    clinic anxiety.mean anxiety.sd
## 1       1    40.189096   8.943037
## 2       2    33.780348  10.226558
## 3       3    37.268670   8.001161
## 4       4    37.558202  11.062779
## 5       5    43.788916   9.665642
## 6       6    33.820355   8.413072
## 7       7    37.930435  13.046259
## 8       8    41.485210   7.572337
## 9       9    38.804121  10.538637
## 10     10    41.712412  10.139023
## 11     11    28.873105  13.688091
## 12     12    21.913045  11.137794
## 13     13    35.477303  11.864484
## 14     14    36.877748  10.410038
## 15     15    40.776249  12.781180

This dataset has patients nested in clinics, with both random intercepts (baseline anxiety varies) and random slopes (therapy effectiveness varies).

5. Visualizing clustered data

Before fitting models, we visualize the clustering structure.

Anxiety by therapy across clinics

ggplot(data, aes(x = therapy_sessions, y = anxiety, color = clinic, group = clinic)) +
  geom_point(alpha = 0.6, size = 1.5) +
  geom_smooth(method = "lm", se = FALSE, size = 0.8) +
  labs(title = "Anxiety by Therapy Sessions Across Clinics",
       x = "Number of Therapy Sessions",
       y = "Anxiety Score",
       color = "Clinic") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "none"
  )
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
## `geom_smooth()` using formula = 'y ~ x'

Each clinic has its own regression line, showing heterogeneity in both intercepts and slopes.

Variation in clinic means

clinic_means <- aggregate(anxiety ~ clinic, data = data, FUN = mean)

ggplot(clinic_means, aes(x = clinic, y = anxiety)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.7) +
  geom_hline(yintercept = mean(data$anxiety), linetype = "dashed", color = "red", size = 1) +
  labs(title = "Mean Anxiety by Clinic",
       x = "Clinic",
       y = "Mean Anxiety Score") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

The red dashed line shows the overall mean. Clinics vary substantially in average anxiety.

6. Fitting a standard regression (ignoring clustering)

First, we fit a standard regression that ignores clustering:

model_standard <- lm(anxiety ~ therapy_sessions, data = data)
summary(model_standard)
## 
## Call:
## lm(formula = anxiety ~ therapy_sessions, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.6977  -5.1920   0.6025   5.2024  24.6479 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      50.47281    0.91674   55.06   <2e-16 ***
## therapy_sessions -1.36616    0.07773  -17.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.215 on 298 degrees of freedom
## Multiple R-squared:  0.509,  Adjusted R-squared:  0.5074 
## F-statistic: 308.9 on 1 and 298 DF,  p-value: < 2.2e-16

This model assumes all patients are independent, which violates the clustered structure. Standard errors are likely too small.

7. Fitting a random intercept model

Now we fit a mixed effects model with random intercepts for clinics:

library(lme4)

model_ri <- lmer(anxiety ~ therapy_sessions + (1 | clinic), data = data)
summary(model_ri)
## Linear mixed model fit by REML ['lmerMod']
## Formula: anxiety ~ therapy_sessions + (1 | clinic)
##    Data: data
## 
## REML criterion at convergence: 1987.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.69379 -0.60026  0.03691  0.62603  2.85156 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  clinic   (Intercept) 31.08    5.575   
##  Residual             38.37    6.194   
## Number of obs: 300, groups:  clinic, 15
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)      50.94217    1.60141   31.81
## therapy_sessions -1.41266    0.05985  -23.61
## 
## Correlation of Fixed Effects:
##             (Intr)
## thrpy_sssns -0.377

The syntax (1 | clinic) specifies random intercepts for clinics. The model estimates: - Fixed effects: \(\beta_0\) (intercept) and \(\beta_1\) (therapy effect) - Random effects: \(\tau^2\) (variance of clinic intercepts) and \(\sigma^2\) (residual variance)

8. Interpreting the random intercept model

The random intercept model output includes:

Fixed effects: - Intercept: The estimated average anxiety score when therapy sessions = 0, across all clinics. - therapy_sessions: The estimated average decrease in anxiety per therapy session, holding clinic constant.

Random effects: - clinic (Intercept): Variance of clinic-specific intercepts (\(\tau^2\)). This quantifies how much baseline anxiety varies across clinics. - Residual: Within-clinic variance (\(\sigma^2\)). This quantifies variation among patients within the same clinic.

Intraclass Correlation (ICC):

We calculate the ICC to see what proportion of variance is due to clinic differences:

var_clinic <- as.data.frame(VarCorr(model_ri))$vcov[1]
var_residual <- as.data.frame(VarCorr(model_ri))$vcov[2]
ICC <- var_clinic / (var_clinic + var_residual)

cat("Between-clinic variance (tau^2):", round(var_clinic, 2), "\n")
## Between-clinic variance (tau^2): 31.08
cat("Within-clinic variance (sigma^2):", round(var_residual, 2), "\n")
## Within-clinic variance (sigma^2): 38.37
cat("ICC:", round(ICC, 3), "\n")
## ICC: 0.448

The output shows: - Between-clinic variance: \(\tau^2 = 31.08\) - Within-clinic variance: \(\sigma^2 = 38.37\)
- ICC = 0.448

An ICC of 0.448 means 44.8% of total variance in anxiety is due to differences between clinics.

Interpretation: The random intercept model accounts for baseline differences across clinics. Patients from the same clinic share more similar anxiety levels than patients from different clinics. The therapy effect is assumed to be the same across all clinics.

9. Fitting a random slopes model

Next, we allow the effect of therapy to vary across clinics by adding random slopes:

model_rs <- lmer(anxiety ~ therapy_sessions + (therapy_sessions | clinic), data = data)
summary(model_rs)
## Linear mixed model fit by REML ['lmerMod']
## Formula: anxiety ~ therapy_sessions + (therapy_sessions | clinic)
##    Data: data
## 
## REML criterion at convergence: 1973.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.82413 -0.64670  0.02686  0.62713  2.76382 
## 
## Random effects:
##  Groups   Name             Variance Std.Dev. Corr
##  clinic   (Intercept)      11.88699 3.4478       
##           therapy_sessions  0.07627 0.2762   0.55
##  Residual                  35.84954 5.9874       
## Number of obs: 300, groups:  clinic, 15
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)      51.01660    1.11946   45.57
## therapy_sessions -1.42970    0.09193  -15.55
## 
## Correlation of Fixed Effects:
##             (Intr)
## thrpy_sssns 0.012

The syntax (therapy_sessions | clinic) specifies both random intercepts and random slopes for therapy sessions.

This model estimates: - Variance of random intercepts - Variance of random slopes - Correlation between random intercepts and slopes

10. Interpreting the random slopes model

Fixed effects: Same as before—average intercept and average therapy effect.

Random effects: - clinic (Intercept): Variance in baseline anxiety across clinics. - clinic therapy_sessions: Variance in therapy effectiveness across clinics. - Correlation: If negative, clinics with higher baseline anxiety show stronger therapy effects. If positive, clinics with higher anxiety show weaker effects.

Comparing models:

We use a likelihood ratio test to compare the random intercept and random slopes models:

anova(model_ri, model_rs)
## refitting model(s) with ML (instead of REML)
## Data: data
## Models:
## model_ri: anxiety ~ therapy_sessions + (1 | clinic)
## model_rs: anxiety ~ therapy_sessions + (therapy_sessions | clinic)
##          npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## model_ri    4 1994.2 2009.0 -993.08   1986.2                         
## model_rs    6 1984.2 2006.4 -986.08   1972.2 13.998  2  0.0009127 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If the random slopes model fits significantly better (\(p < .05\)), we conclude that therapy effectiveness varies meaningfully across clinics.

Extracting clinic-specific effects:

clinic_effects <- ranef(model_rs)$clinic
head(clinic_effects)
##   (Intercept) therapy_sessions
## 1   1.8554323      0.134844749
## 2  -1.5784596     -0.009734087
## 3   0.5502901      0.151180047
## 4   2.0191222     -0.004771978
## 5   4.0027522      0.317330832
## 6  -2.7781210     -0.020098889

Each clinic has its own intercept and slope deviations. A clinic with a negative slope deviation has a stronger therapy effect than average.

11. Visualizing random effects

We can visualize the clinic-specific intercepts and slopes:

Clinic-specific intercepts

clinic_effects$clinic_id <- rownames(clinic_effects)
names(clinic_effects) <- c("Intercept", "Slope", "clinic_id")

ggplot(clinic_effects, aes(x = reorder(clinic_id, Intercept), y = Intercept)) +
  geom_bar(stat = "identity", fill = "coral", alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  labs(title = "Random Intercepts by Clinic",
       x = "Clinic",
       y = "Deviation from Average Baseline Anxiety") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

Clinics above zero have higher-than-average baseline anxiety; those below zero have lower baseline anxiety.

Clinic-specific slopes

ggplot(clinic_effects, aes(x = reorder(clinic_id, Slope), y = Slope)) +
  geom_bar(stat = "identity", fill = "darkgreen", alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  labs(title = "Random Slopes by Clinic",
       x = "Clinic",
       y = "Deviation from Average Therapy Effect") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

Negative slopes indicate stronger-than-average therapy effects; positive slopes indicate weaker effects.

12. Checking assumptions

Mixed effects models assume: - Normality of random effects - Normality of residuals - Homoscedasticity of residuals

Normality of random effects

qqnorm(clinic_effects$Intercept, main = "Q-Q Plot: Random Intercepts")
qqline(clinic_effects$Intercept, col = "red")

Random intercepts should follow a normal distribution.

Residual diagnostics

data$residuals <- residuals(model_rs)
data$fitted <- fitted(model_rs)

ggplot(data, aes(x = fitted, y = residuals)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal() +
  theme(
    panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
    panel.grid.minor = element_blank(),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.border = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

Residuals should scatter randomly around zero with constant variance.

13. Reporting results in APA style

Random intercept model

A mixed effects model with random intercepts for clinics was fitted to examine the effect of therapy sessions on anxiety scores. The model included 300 patients nested within 15 clinics. Fixed effects revealed that therapy sessions significantly reduced anxiety, \(b = -1.41\), \(t = -23.61\), \(p < .001\). Each additional therapy session was associated with a decrease of 1.41 points in anxiety, on average. Random effects showed substantial between-clinic variance in baseline anxiety, \(\tau^2 = 31.08\), indicating that clinics differed in their average anxiety levels. The intraclass correlation coefficient (ICC) was 0.448, meaning 44.8% of total variance was attributable to clinic-level differences. This clustering necessitates mixed effects modeling to avoid underestimating standard errors.

Random slopes model

A random slopes model allowed both baseline anxiety and therapy effectiveness to vary across clinics. The model provided a significantly better fit than the random intercept model, \(\chi^2(2) = 14.00\), \(p < .001\), indicating that therapy effectiveness varied meaningfully across clinics. Fixed effects remained significant, with therapy reducing anxiety by an average of 1.43 points per session, \(t = -15.55\), \(p < .001\). Random effects showed variance in both intercepts (\(\tau_0^2 = 11.89\)) and slopes (\(\tau_1^2 = 0.076\)), with a correlation of \(r = 0.55\) between them. This positive correlation suggests that clinics with higher baseline anxiety also showed stronger therapy effects. Clinic-specific slope deviations ranged from -0.49 to +0.41, highlighting heterogeneity in treatment response. These findings suggest that therapy is effective on average but that some clinics achieve stronger reductions in anxiety than others, possibly due to differences in staff training, therapeutic environment, or patient characteristics.

14. Practical considerations

When to use mixed effects models

Use mixed effects models when: - Data are clustered: Observations are grouped (students in schools, patients in clinics, employees in companies). - Data are longitudinal: Repeated measurements from the same individuals. - Ignoring clustering is problematic: ICC > 0.05 suggests meaningful clustering.

Model selection

  • Start with a random intercept model to account for baseline clustering.
  • Add random slopes if you hypothesize that effects vary across groups.
  • Use likelihood ratio tests to compare nested models.
  • Prefer simpler models when random slopes do not improve fit significantly.

Sample size considerations

Mixed effects models require sufficient groups (clusters) to estimate random effects reliably. Aim for at least 10-15 groups. With fewer groups, random effect estimates may be unstable.

Computational issues

Complex random effects structures (e.g., random slopes with many predictors) can cause convergence problems. If models fail to converge: - Simplify the random effects structure. - Center or scale predictors. - Use different optimizers (e.g., bobyqa, nloptwrap).

Extensions

Mixed effects models can be extended to: - Generalized linear mixed models (GLMMs): For binary, count, or other non-normal outcomes (using glmer()). - Longitudinal models: Repeated measures over time with time-varying predictors. - Cross-classified models: Multiple clustering variables (e.g., students nested in both schools and neighborhoods). - Three-level models: More complex hierarchies (e.g., students in classrooms in schools).

15. Conclusion

Mixed effects models are essential for analyzing clustered and longitudinal data. They account for dependencies among observations by partitioning variance into multiple levels and estimating both population-average effects (fixed effects) and group-specific deviations (random effects).

By modeling clustering explicitly, mixed effects models: - Provide accurate standard errors and valid p-values - Reveal variation across groups (clinics, schools, individuals) - Allow group-specific predictions - Support multilevel research questions about both individuals and groups

Ignoring clustering leads to biased inferences, inflated Type I error rates, and missed opportunities to understand heterogeneity. Mixed effects models address these issues while maintaining the interpretability of regression.

In behavioral and health research, data are almost always clustered—patients within clinics, students within schools, repeated measures within individuals. Mixed effects models provide the proper framework for analyzing such data, respecting the hierarchical structure while estimating meaningful effects.

With mixed effects models, we can answer richer questions: Does therapy reduce anxiety on average? Yes. But does it work equally well in all clinics? No—some clinics show stronger effects than others, and understanding why is the next scientific frontier.

By embracing the complexity of real-world data structures, mixed effects models move psychological and health research closer to the nuanced, context-dependent understanding that best serves science and practice.