Dr. Rachel Thompson, a clinical psychologist, developed a 6-week mindfulness-based stress reduction program for healthcare workers experiencing burnout. She recruited 40 nurses from a busy urban hospital and measured their stress levels at four time points: baseline (Week 0), early intervention (Week 2), mid-intervention (Week 4), and post-intervention (Week 6).
Unlike the previous studies we’ve examined, each nurse is measured multiple times. This violates the independence assumption of regular ANOVA—observations from the same person are correlated. A nurse who has high stress at baseline will likely have relatively high stress throughout the study, even if the intervention reduces stress overall.
This is where Repeated Measures ANOVA becomes essential. It accounts for the correlation between measurements from the same individual, providing more statistical power than between-subjects designs while controlling for individual differences.
Repeated Measures ANOVA (also called within-subjects ANOVA) is used when the same participants are measured under different conditions or at multiple time points. It extends one-way ANOVA to handle the dependency structure in repeated measurements.
Key Advantages:
- More statistical power than between-subjects designs, because individual differences are removed from the error term
- Fewer participants required, since each person serves as their own control

Key Assumptions:
- Normality: the differences between conditions are approximately normally distributed
- Sphericity: the variances of the differences between all pairs of conditions are equal
- Independence of observations across participants (though not within a participant)

Violation of Sphericity:
When sphericity is violated, we apply corrections (Greenhouse-Geisser or Huynh-Feldt) to adjust the degrees of freedom and p-values.
The repeated measures ANOVA model can be written as:
\[ Y_{ij} = \mu + \alpha_j + \pi_i + \varepsilon_{ij} \]
where:
- \(Y_{ij}\) is the observation for participant \(i\) at time \(j\)
- \(\mu\) is the grand mean
- \(\alpha_j\) is the effect of time point \(j\) (fixed effect)
- \(\pi_i\) is the effect of participant \(i\) (random effect)
- \(\varepsilon_{ij}\) is the random error
The null and alternative hypotheses:
\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \]
\[ H_1: \text{At least one time point mean differs} \]
The test partitions variance into:
- Between-subjects variance: individual differences (controlled for)
- Within-subjects variance: time effects + residual error
The F-ratio is:
\[ F = \frac{MS_{time}}{MS_{error}} \]
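To make this partition concrete, here is a by-hand computation on a tiny made-up dataset (3 participants measured at 2 time points; the numbers are illustrative only, not part of Dr. Thompson's study):

```r
# Toy dataset: rows are participants, columns are time points
y <- matrix(c(10, 8,
              12, 9,
              14, 11), nrow = 3, byrow = TRUE)
n <- nrow(y)  # participants
k <- ncol(y)  # time points
grand <- mean(y)

# Partition the total sum of squares
ss_time    <- n * sum((colMeans(y) - grand)^2)  # between-time SS
ss_subject <- k * sum((rowMeans(y) - grand)^2)  # between-subject SS (removed)
ss_total   <- sum((y - grand)^2)
ss_error   <- ss_total - ss_time - ss_subject   # residual SS

# F = MS_time / MS_error with df = (k - 1) and (n - 1)(k - 1)
F_stat <- (ss_time / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))
F_stat  # 64
```

Because the subject sum of squares is pulled out of the error term before forming the F-ratio, the residual is tiny here and the F-statistic is large; a between-subjects ANOVA on the same numbers would fold those individual differences back into the error.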
Let’s create data similar to Dr. Thompson’s mindfulness study. We’ll simulate stress scores for 40 nurses across 4 time points, with an overall decreasing trend.
# Set seed for reproducibility
set.seed(321)
# Number of participants and time points
n_participants <- 40
time_points <- c("Week_0", "Week_2", "Week_4", "Week_6")
n_times <- length(time_points)
# Create participant IDs
participant_id <- rep(1:n_participants, each = n_times)
# Time variable
time <- rep(time_points, times = n_participants)
# Simulate individual baselines (random intercepts)
baseline_stress <- rnorm(n_participants, mean = 70, sd = 12)
individual_baseline <- rep(baseline_stress, each = n_times)
# Time effects (decreasing stress)
time_effects <- c(0, -3, -6, -10) # Progressive reduction
time_effect <- rep(time_effects, times = n_participants)
# Add random error
error <- rnorm(n_participants * n_times, mean = 0, sd = 5)
# Generate stress scores
stress <- individual_baseline + time_effect + error
# Create dataframe
stress_data <- data.frame(
participant = factor(participant_id),
time = factor(time, levels = time_points),
stress = stress
)
# Display first few rows
head(stress_data, 12)
## participant time stress
## 1 1 Week_0 82.77557
## 2 1 Week_2 87.84963
## 3 1 Week_4 85.71082
## 4 1 Week_6 81.67927
## 5 2 Week_0 65.45423
## 6 2 Week_2 60.16259
## 7 2 Week_4 56.74913
## 8 2 Week_6 56.22493
## 9 3 Week_0 72.34625
## 10 3 Week_2 58.74274
## 11 3 Week_4 56.11591
## 12 3 Week_6 52.21872
# Summary statistics by time
library(dplyr)
stress_data %>%
group_by(time) %>%
summarise(
N = n(),
Mean = mean(stress),
SD = sd(stress),
Min = min(stress),
Max = max(stress)
)
## # A tibble: 4 × 6
## time N Mean SD Min Max
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Week_0 40 70.8 13.0 33.9 103.
## 2 Week_2 40 67.2 10.8 40.4 89.6
## 3 Week_4 40 64.5 12.8 22.4 89.5
## 4 Week_6 40 61.2 12.3 30.7 101.
Let’s visualize the repeated measures data to see the pattern across time.
library(ggplot2)
# Individual trajectories (spaghetti plot)
ggplot(stress_data, aes(x = time, y = stress, group = participant)) +
geom_line(alpha = 0.3, color = "steelblue") +
geom_point(alpha = 0.3, size = 1) +
stat_summary(aes(group = 1), fun = mean, geom = "line",
color = "red", size = 1.5) +
stat_summary(aes(group = 1), fun = mean, geom = "point",
color = "red", size = 3) +
labs(
title = "Stress Levels Across Mindfulness Training",
x = "Time Point",
y = "Stress Score",
caption = "Blue lines = individual trajectories, Red line = mean trajectory"
) +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
axis.line.x.top = element_blank(),
axis.line.y.right = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Box plot by time
ggplot(stress_data, aes(x = time, y = stress, fill = time)) +
geom_boxplot(alpha = 0.7) +
geom_jitter(width = 0.2, alpha = 0.2) +
labs(
title = "Distribution of Stress Scores Over Time",
x = "Time Point",
y = "Stress Score"
) +
scale_fill_brewer(palette = "Blues") +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
axis.line.x.top = element_blank(),
axis.line.y.right = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "none"
)
The spaghetti plot shows individual variability (each person’s trajectory) while the red line shows the average trend. We can see a clear downward trend in stress across the intervention period.
We’ll use the aov() function with the Error() term to specify the repeated measures structure:
# Fit repeated measures ANOVA
# Error(participant/time) splits the residual variance into a
# between-participant stratum and a within-participant (time) stratum
rm_anova <- aov(stress ~ time + Error(participant/time), data = stress_data)
# Display results
summary(rm_anova)
##
## Error: participant
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 39 20848 534.6
##
## Error: participant:time
## Df Sum Sq Mean Sq F value Pr(>F)
## time 3 1999 666.2 32.02 3.49e-15 ***
## Residuals 117 2435 20.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s extract and interpret the key values:
# Extract ANOVA summary
anova_summary <- summary(rm_anova)
# The within-subjects effect is in the second element
within_effect <- anova_summary$`Error: participant:time`[[1]]
# Extract key statistics
f_value <- within_effect$`F value`[1]
p_value <- within_effect$`Pr(>F)`[1]
df_effect <- within_effect$Df[1]
df_error <- within_effect$Df[2]
cat("F-statistic:", round(f_value, 3), "\n")
## F-statistic: 32.016
cat("p-value:", format.pval(p_value, digits = 3), "\n")
## p-value: 3.49e-15
cat("Degrees of freedom:", df_effect, "and", df_error, "\n")
## Degrees of freedom: 3 and 117
# Calculate effect size (partial eta-squared)
ss_effect <- within_effect$`Sum Sq`[1]
ss_error <- within_effect$`Sum Sq`[2]
eta_squared <- ss_effect / (ss_effect + ss_error)
cat("Partial Eta-Squared (effect size):", round(eta_squared, 3), "\n")
## Partial Eta-Squared (effect size): 0.451
cat("This represents approximately", round(eta_squared * 100, 1), "% of variance explained\n")
## This represents approximately 45.1 % of variance explained
Interpretation:
The repeated measures ANOVA yielded an F-statistic of 32.02 with 3 and 117 degrees of freedom, resulting in a p-value of 3.49e-15. This indicates statistically significant changes in stress levels across the four time points.
The partial eta-squared (effect size) is 0.451, indicating that approximately 45.1% of the variance in stress scores is explained by time. This represents a large effect size.
Note on Sphericity:
The standard aov() function assumes sphericity. For a
more comprehensive analysis including Mauchly’s test for sphericity,
packages like ez or afex can be used.
Alternatively, we can use the mixed models approach shown later, which
doesn’t assume sphericity.
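As a sketch of that more comprehensive route, assuming the afex package is installed (aov_ez and its argument names are afex's API, not code from the analysis above), Mauchly's test and the corrected p-values can be obtained in one call on the same data:

```r
# Sketch: sphericity diagnostics via afex
# (assumes afex is installed and stress_data exists as built above)
library(afex)

sphericity_fit <- aov_ez(
  id     = "participant",  # subject identifier column
  dv     = "stress",       # dependent variable
  within = "time",         # within-subjects factor
  data   = stress_data
)

# summary() prints Mauchly's test for sphericity together with
# Greenhouse-Geisser and Huynh-Feldt corrected df and p-values
summary(sphericity_fit)
```

If Mauchly's test is significant, report the Greenhouse-Geisser corrected results rather than the uncorrected aov() output.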
Since the overall ANOVA is significant, we need to determine which specific time points differ.
# Pairwise t-tests with Bonferroni correction
pairwise_results <- pairwise.t.test(
stress_data$stress,
stress_data$time,
paired = TRUE,
p.adjust.method = "bonferroni"
)
print(pairwise_results)
##
## Pairwise comparisons using paired t tests
##
## data: stress_data$stress and stress_data$time
##
## Week_0 Week_2 Week_4
## Week_2 0.0075 - -
## Week_4 2.2e-05 0.0120 -
## Week_6 2.4e-11 1.5e-06 0.0302
##
## P value adjustment method: bonferroni
# Calculate mean differences
means_by_time <- stress_data %>%
group_by(time) %>%
summarise(Mean = mean(stress), SD = sd(stress))
print(means_by_time)
## # A tibble: 4 × 3
## time Mean SD
## <fct> <dbl> <dbl>
## 1 Week_0 70.8 13.0
## 2 Week_2 67.2 10.8
## 3 Week_4 64.5 12.8
## 4 Week_6 61.2 12.3
Interpreting Pairwise Comparisons:
The pairwise comparisons with Bonferroni correction show which time points significantly differ from each other. Lower p-values indicate stronger evidence of differences.
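The arithmetic behind the correction is worth making explicit: four time points give choose(4, 2) = 6 pairwise tests, and Bonferroni multiplies each raw p-value by 6 (capped at 1), equivalent to testing each comparison at alpha/6. A minimal sketch (the raw p-value here is made up for illustration):

```r
# Bonferroni bookkeeping for 4 time points
n_comparisons  <- choose(4, 2)          # 6 pairwise tests
alpha_adjusted <- 0.05 / n_comparisons  # per-test threshold, about 0.0083

# p.adjust() performs the same multiplication pairwise.t.test() applies
raw_p <- 0.005  # hypothetical raw p-value
p.adjust(raw_p, method = "bonferroni", n = n_comparisons)  # 0.03
```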
# Calculate Cohen's d for key comparisons
# Week 0 vs Week 6 (baseline to post-intervention)
week0 <- stress_data$stress[stress_data$time == "Week_0"]
week6 <- stress_data$stress[stress_data$time == "Week_6"]
mean_diff <- mean(week0) - mean(week6)
pooled_sd <- sqrt((sd(week0)^2 + sd(week6)^2) / 2)
cohens_d <- mean_diff / pooled_sd
cat("\nBaseline (Week 0) to Post-Intervention (Week 6):\n")
##
## Baseline (Week 0) to Post-Intervention (Week 6):
cat("Mean difference:", round(mean_diff, 2), "\n")
## Mean difference: 9.62
cat("Cohen's d:", round(cohens_d, 3), "\n")
## Cohen's d: 0.762
Cohen’s d of 0.76 indicates a large effect size for the change from baseline to post-intervention.
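An alternative paired-samples effect size, often written d_z, standardizes the mean change by the SD of the difference scores rather than the pooled SD; it is typically larger when measurements are positively correlated within person. A quick sketch, reusing the week0 and week6 vectors from above:

```r
# Paired effect size d_z: mean change divided by SD of the changes
# (assumes week0 and week6 exist as extracted above)
diffs <- week0 - week6
d_z <- mean(diffs) / sd(diffs)
cat("Cohen's d_z (difference-score metric):", round(d_z, 3), "\n")
```

Whichever metric is reported, state it explicitly, since d and d_z are not interchangeable.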
# Calculate mean and SE for error bars
summary_stats <- stress_data %>%
group_by(time) %>%
summarise(
Mean = mean(stress),
SE = sd(stress) / sqrt(n()),
CI_lower = Mean - 1.96 * SE,
CI_upper = Mean + 1.96 * SE
)
# Plot means with confidence intervals
ggplot(summary_stats, aes(x = time, y = Mean, group = 1)) +
geom_line(size = 1.2, color = "steelblue") +
geom_point(size = 4, color = "steelblue") +
geom_errorbar(aes(ymin = CI_lower, ymax = CI_upper),
width = 0.2, size = 0.8, color = "steelblue") +
labs(
title = "Mean Stress Levels Over Time with 95% CI",
x = "Time Point",
y = "Mean Stress Score",
caption = "Error bars represent 95% confidence intervals"
) +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
axis.line.x.top = element_blank(),
axis.line.y.right = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold")
)
Repeated measures ANOVA can also be analyzed using linear mixed models, which handle missing data better and provide more flexibility:
library(lme4)
# Fit linear mixed model
lmm_model <- lmer(stress ~ time + (1 | participant), data = stress_data)
# Display summary
summary(lmm_model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: stress ~ time + (1 | participant)
## Data: stress_data
##
## REML criterion at convergence: 1057.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2977 -0.5573 -0.0004 0.4896 2.4017
##
## Random effects:
## Groups Name Variance Std.Dev.
## participant (Intercept) 128.44 11.333
## Residual 20.81 4.562
## Number of obs: 160, groups: participant, 40
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 70.797 1.932 36.651
## timeWeek_2 -3.572 1.020 -3.502
## timeWeek_4 -6.282 1.020 -6.159
## timeWeek_6 -9.621 1.020 -9.432
##
## Correlation of Fixed Effects:
## (Intr) tmWk_2 tmWk_4
## timeWeek_2 -0.264
## timeWeek_4 -0.264 0.500
## timeWeek_6 -0.264 0.500 0.500
# ANOVA table for fixed effects using car package
library(car)
Anova(lmm_model, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: stress
## Chisq Df Pr(>Chisq)
## (Intercept) 1343.307 1 < 2.2e-16 ***
## time 96.047 3 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The linear mixed model approach provides similar conclusions but offers additional flexibility for complex designs and missing data.
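Post-hoc contrasts for the mixed model can be obtained with the emmeans package (a sketch, assuming emmeans is installed; it is not used elsewhere in this analysis). This mirrors the paired comparisons above but uses the model-based standard errors:

```r
# Sketch: model-based pairwise comparisons
# (assumes emmeans is installed and lmm_model was fit above)
library(emmeans)

# Estimated marginal means for each time point, then all pairwise
# contrasts with Bonferroni adjustment
emmeans(lmm_model, pairwise ~ time, adjust = "bonferroni")
```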
Normality of Differences:
For repeated measures ANOVA, we check the normality of the differences between conditions.
# Reshape data to wide format
library(tidyr)
stress_wide <- stress_data %>%
pivot_wider(names_from = time, values_from = stress)
# Calculate differences
stress_wide <- stress_wide %>%
mutate(
diff_2_0 = Week_2 - Week_0,
diff_4_0 = Week_4 - Week_0,
diff_6_0 = Week_6 - Week_0
)
# Q-Q plot for one difference
qqnorm(stress_wide$diff_6_0, main = "Q-Q Plot: Week 6 - Week 0 Difference")
qqline(stress_wide$diff_6_0, col = "red")
# Shapiro-Wilk test
shapiro.test(stress_wide$diff_6_0)
##
## Shapiro-Wilk normality test
##
## data: stress_wide$diff_6_0
## W = 0.96809, p-value = 0.3125
The Q-Q plot and Shapiro-Wilk test help assess the normality assumption for the differences.
Sphericity:
The standard aov() output does not report Mauchly’s test; as noted earlier, it can be obtained from packages such as ez (via ezANOVA) or afex. If the test is significant, report the Greenhouse-Geisser or Huynh-Feldt corrected results.
Here’s how to report these results in an academic paper:
A repeated measures ANOVA was conducted to examine changes in stress levels across four time points during a 6-week mindfulness intervention (baseline, Week 2, Week 4, Week 6) in 40 healthcare workers.
The results showed a statistically significant effect of time on stress levels, F(3, 117) = 32.02, p < .001, \(\eta^2_p\) = 0.451. Stress scores decreased significantly from baseline (M = 70.8, SD = 12.96) to post-intervention (M = 61.18, SD = 12.27), representing a large effect (d = 0.76).
Post-hoc pairwise comparisons using Bonferroni correction revealed significant reductions in stress between all pairs of time points, indicating a steady decline across the intervention period.
When to Use Repeated Measures ANOVA:
- Same participants measured at multiple time points or conditions
- Interest in within-subject changes over time or across conditions
- Need to control for individual differences
Advantages:
- More statistical power than between-subjects designs
- Fewer participants required
- Controls for individual differences
Challenges:
- Sphericity assumption must be met (or corrected)
- Missing data can be problematic
- Carryover effects in experimental designs
- Requires complete data for all participants at all time points
Alternatives:
- Linear Mixed Models: more flexible, handle missing data better
- MANOVA: for multiple dependent variables
- Growth Curve Modeling: when time is continuous and trajectories are of interest
Common Mistakes:
- Using between-subjects ANOVA for repeated measures (loses power, violates independence)
- Ignoring sphericity violations
- Not correcting for multiple comparisons in post-hoc tests
- Confusing repeated measures with mixed (between-within) designs
Effect Size Reporting:
- Partial eta-squared (\(\eta^2_p\)): proportion of variance in the DV explained by the IV, excluding other factors
- Generalized eta-squared (\(\eta^2_G\)): more comparable across designs
- Cohen’s d: for pairwise comparisons
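To illustrate the missing-data point in favor of mixed models, here is a sketch using the simulated data from above: after dropping a handful of observations at random, lmer() still fits using every remaining row, whereas classical repeated measures ANOVA would discard the affected participants entirely.

```r
# Sketch: mixed models tolerate incomplete data
# (assumes lme4 is installed and stress_data exists as built above)
library(lme4)

set.seed(99)
# Remove 10 observations at random to mimic dropout / missed sessions
incomplete <- stress_data[-sample(nrow(stress_data), 10), ]

# lmer() uses all 150 remaining rows; participants with partial data
# still contribute to the fixed-effect estimates
lmm_incomplete <- lmer(stress ~ time + (1 | participant), data = incomplete)
fixef(lmm_incomplete)
```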
Repeated Measures ANOVA is a powerful tool for analyzing data where the same participants are measured multiple times. In our mindfulness intervention study, we found significant reductions in stress across the 6-week program, with particularly strong effects between baseline and post-intervention.
By accounting for individual differences (each person as their own control), repeated measures designs increase statistical power and reduce the number of participants needed. However, they require careful attention to assumptions—particularly sphericity—and appropriate corrections when assumptions are violated.
Understanding repeated measures ANOVA opens doors to analyzing longitudinal data, intervention studies, and within-subjects experimental designs common in psychology, public health, and behavioral research. When combined with modern approaches like linear mixed models, researchers gain powerful tools for understanding change over time while respecting the dependency structure in their data.