Dr. Rachel Thompson, a clinical psychologist, developed a 6-week mindfulness-based stress reduction program for healthcare workers experiencing burnout. She recruited 40 nurses from a busy urban hospital and measured their stress levels at four time points: baseline (Week 0), early intervention (Week 2), mid-intervention (Week 4), and post-intervention (Week 6).
Unlike the previous studies we’ve examined, each nurse is measured multiple times. This violates the independence assumption of regular ANOVA—observations from the same person are correlated. A nurse who has high stress at baseline will likely have relatively high stress throughout the study, even if the intervention reduces stress overall.
This is where Repeated Measures ANOVA becomes essential. It accounts for the correlation between measurements from the same individual, providing more statistical power than between-subjects designs while controlling for individual differences.
Repeated Measures ANOVA (also called within-subjects ANOVA) is used when the same participants are measured under different conditions or at multiple time points. It extends one-way ANOVA to handle the dependency structure in repeated measurements.
Key Advantages:
- More statistical power than between-subjects designs, because individual differences are removed from the error term
- Fewer participants required, since each person serves as their own control

Key Assumptions:
- Normality: the differences between conditions are approximately normally distributed
- Sphericity: the variances of the differences between all pairs of conditions are equal
- Independence of observations across participants (though not within a participant)

Violation of Sphericity:
When sphericity is violated, we apply corrections (Greenhouse-Geisser or Huynh-Feldt) to adjust the degrees of freedom and p-values.
The repeated measures ANOVA model can be written as:
\[ Y_{ij} = \mu + \alpha_j + \pi_i + \varepsilon_{ij} \]
where:
- \(Y_{ij}\) is the observation for participant \(i\) at time \(j\)
- \(\mu\) is the grand mean
- \(\alpha_j\) is the effect of time point \(j\) (fixed effect)
- \(\pi_i\) is the effect of participant \(i\) (random effect)
- \(\varepsilon_{ij}\) is the random error
The null and alternative hypotheses:
\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \]
\[ H_1: \text{At least one time point mean differs} \]
The test partitions variance into:
- Between-subjects variance: individual differences (controlled for)
- Within-subjects variance: time effects + residual error
The F-ratio is:
\[ F = \frac{MS_{time}}{MS_{error}} \]
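To make this partition concrete, here is a by-hand computation on a tiny made-up dataset (3 participants measured at 2 time points; the numbers are illustrative only, not part of Dr. Thompson's study):

```r
# Toy dataset: rows are participants, columns are time points
y <- matrix(c(10, 8,
              12, 9,
              14, 11), nrow = 3, byrow = TRUE)
n <- nrow(y)  # participants
k <- ncol(y)  # time points
grand <- mean(y)

# Partition the total sum of squares
ss_time    <- n * sum((colMeans(y) - grand)^2)  # between-time SS
ss_subject <- k * sum((rowMeans(y) - grand)^2)  # between-subject SS (removed)
ss_total   <- sum((y - grand)^2)
ss_error   <- ss_total - ss_time - ss_subject   # residual SS

# F = MS_time / MS_error with df = (k - 1) and (n - 1)(k - 1)
F_stat <- (ss_time / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))
F_stat  # 64
```

Because the subject sum of squares is pulled out of the error term before forming the F-ratio, the residual is tiny here and the F-statistic is large; a between-subjects ANOVA on the same numbers would fold those individual differences back into the error.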
Let’s create data similar to Dr. Thompson’s mindfulness study. We’ll simulate stress scores for 40 nurses across 4 time points, with an overall decreasing trend.
# Set seed for reproducibility
set.seed(321)
# Number of participants and time points
n_participants <- 40
time_points <- c("Week_0", "Week_2", "Week_4", "Week_6")
n_times <- length(time_points)
# Create participant IDs
participant_id <- rep(1:n_participants, each = n_times)
# Time variable
time <- rep(time_points, times = n_participants)
# Simulate individual baselines (random intercepts)
baseline_stress <- rnorm(n_participants, mean = 70, sd = 12)
individual_baseline <- rep(baseline_stress, each = n_times)
# Time effects (decreasing stress)
time_effects <- c(0, -3, -6, -10) # Progressive reduction
time_effect <- rep(time_effects, times = n_participants)
# Add random error
error <- rnorm(n_participants * n_times, mean = 0, sd = 5)
# Generate stress scores
stress <- individual_baseline + time_effect + error
# Create dataframe
stress_data <- data.frame(
participant = factor(participant_id),
time = factor(time, levels = time_points),
stress = stress
)
# Display first few rows
head(stress_data, 12)
## participant time stress
## 1 1 Week_0 82.77557
## 2 1 Week_2 87.84963
## 3 1 Week_4 85.71082
## 4 1 Week_6 81.67927
## 5 2 Week_0 65.45423
## 6 2 Week_2 60.16259
## 7 2 Week_4 56.74913
## 8 2 Week_6 56.22493
## 9 3 Week_0 72.34625
## 10 3 Week_2 58.74274
## 11 3 Week_4 56.11591
## 12 3 Week_6 52.21872
# Summary statistics by time
library(dplyr)
stress_data %>%
group_by(time) %>%
summarise(
N = n(),
Mean = mean(stress),
SD = sd(stress),
Min = min(stress),
Max = max(stress)
)
## # A tibble: 4 × 6
## time N Mean SD Min Max
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Week_0 40 70.8 13.0 33.9 103.
## 2 Week_2 40 67.2 10.8 40.4 89.6
## 3 Week_4 40 64.5 12.8 22.4 89.5
## 4 Week_6 40 61.2 12.3 30.7 101.
Let’s visualize the repeated measures data to see the pattern across time.
library(ggplot2)
# Individual trajectories (spaghetti plot)
ggplot(stress_data, aes(x = time, y = stress, group = participant)) +
geom_line(alpha = 0.3, color = "steelblue") +
geom_point(alpha = 0.3, size = 1) +
stat_summary(aes(group = 1), fun = mean, geom = "line",
color = "red", size = 1.5) +
stat_summary(aes(group = 1), fun = mean, geom = "point",
color = "red", size = 3) +
labs(
title = "Stress Levels Across Mindfulness Training",
x = "Time Point",
y = "Stress Score",
caption = "Blue lines = individual trajectories, Red line = mean trajectory"
) +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
axis.line.x.top = element_blank(),
axis.line.y.right = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Box plot by time
ggplot(stress_data, aes(x = time, y = stress, fill = time)) +
geom_boxplot(alpha = 0.7) +
geom_jitter(width = 0.2, alpha = 0.2) +
labs(
title = "Distribution of Stress Scores Over Time",
x = "Time Point",
y = "Stress Score"
) +
scale_fill_brewer(palette = "Blues") +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
axis.line.x.top = element_blank(),
axis.line.y.right = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "none"
)
The spaghetti plot shows individual variability (each person’s trajectory) while the red line shows the average trend. We can see a clear downward trend in stress across the intervention period.
We’ll use the aov() function with the Error() term to specify the repeated measures structure:
# Fit repeated measures ANOVA
# Error(participant/time) splits the residual variance into a
# between-participant stratum and a within-participant (time) stratum
rm_anova <- aov(stress ~ time + Error(participant/time), data = stress_data)
# Display results
summary(rm_anova)
##
## Error: participant
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 39 20848 534.6
##
## Error: participant:time
## Df Sum Sq Mean Sq F value Pr(>F)
## time 3 1999 666.2 32.02 3.49e-15 ***
## Residuals 117 2435 20.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s extract and interpret the key values:
# Extract ANOVA summary
anova_summary <- summary(rm_anova)
# The within-subjects effect is in the second element
within_effect <- anova_summary$`Error: participant:time`[[1]]
# Extract key statistics
f_value <- within_effect$`F value`[1]
p_value <- within_effect$`Pr(>F)`[1]
df_effect <- within_effect$Df[1]
df_error <- within_effect$Df[2]
cat("F-statistic:", round(f_value, 3), "\n")
## F-statistic: 32.016
cat("p-value:", format.pval(p_value, digits = 3), "\n")
## p-value: 3.49e-15
cat("Degrees of freedom:", df_effect, "and", df_error, "\n")
## Degrees of freedom: 3 and 117
# Calculate effect size (partial eta-squared)
ss_effect <- within_effect$`Sum Sq`[1]
ss_error <- within_effect$`Sum Sq`[2]
eta_squared <- ss_effect / (ss_effect + ss_error)
cat("Partial Eta-Squared (effect size):", round(eta_squared, 3), "\n")
## Partial Eta-Squared (effect size): 0.451
cat("This represents approximately", round(eta_squared * 100, 1), "% of variance explained\n")
## This represents approximately 45.1 % of variance explained
Interpretation:
The repeated measures ANOVA yielded an F-statistic of 32.02 with 3 and 117 degrees of freedom, resulting in a p-value of 3.49e-15. This indicates statistically significant changes in stress levels across the four time points.
The partial eta-squared (effect size) is 0.451, indicating that approximately 45.1% of the variance in stress scores is explained by time. This represents a large effect size.
Note on Sphericity:
The standard aov() function assumes sphericity. For a
more comprehensive analysis including Mauchly’s test for sphericity,
packages like ez or afex can be used.
Alternatively, we can use the mixed models approach shown later, which
doesn’t assume sphericity.
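As a sketch of that more comprehensive route, assuming the afex package is installed (aov_ez and its argument names are afex's API, not code from the analysis above), Mauchly's test and the corrected p-values can be obtained in one call on the same data:

```r
# Sketch: sphericity diagnostics via afex
# (assumes afex is installed and stress_data exists as built above)
library(afex)

sphericity_fit <- aov_ez(
  id     = "participant",  # subject identifier column
  dv     = "stress",       # dependent variable
  within = "time",         # within-subjects factor
  data   = stress_data
)

# summary() prints Mauchly's test for sphericity together with
# Greenhouse-Geisser and Huynh-Feldt corrected df and p-values
summary(sphericity_fit)
```

If Mauchly's test is significant, report the Greenhouse-Geisser corrected results rather than the uncorrected aov() output.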
Since the overall ANOVA is significant, we need to determine which specific time points differ.
# Pairwise t-tests with Bonferroni correction
pairwise_results <- pairwise.t.test(
stress_data$stress,
stress_data$time,
paired = TRUE,
p.adjust.method = "bonferroni"
)
print(pairwise_results)
##
## Pairwise comparisons using paired t tests
##
## data: stress_data$stress and stress_data$time
##
## Week_0 Week_2 Week_4
## Week_2 0.0075 - -
## Week_4 2.2e-05 0.0120 -
## Week_6 2.4e-11 1.5e-06 0.0302
##
## P value adjustment method: bonferroni
# Calculate mean differences
means_by_time <- stress_data %>%
group_by(time) %>%
summarise(Mean = mean(stress), SD = sd(stress))
print(means_by_time)
## # A tibble: 4 × 3
## time Mean SD
## <fct> <dbl> <dbl>
## 1 Week_0 70.8 13.0
## 2 Week_2 67.2 10.8
## 3 Week_4 64.5 12.8
## 4 Week_6 61.2 12.3
Interpreting Pairwise Comparisons:
The pairwise comparisons with Bonferroni correction show which time points significantly differ from each other. Lower p-values indicate stronger evidence of differences.
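The arithmetic behind the correction is worth making explicit: four time points give choose(4, 2) = 6 pairwise tests, and Bonferroni multiplies each raw p-value by 6 (capped at 1), equivalent to testing each comparison at alpha/6. A minimal sketch (the raw p-value here is made up for illustration):

```r
# Bonferroni bookkeeping for 4 time points
n_comparisons  <- choose(4, 2)          # 6 pairwise tests
alpha_adjusted <- 0.05 / n_comparisons  # per-test threshold, about 0.0083

# p.adjust() performs the same multiplication pairwise.t.test() applies
raw_p <- 0.005  # hypothetical raw p-value
p.adjust(raw_p, method = "bonferroni", n = n_comparisons)  # 0.03
```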
# Calculate Cohen's d for key comparisons
# Week 0 vs Week 6 (baseline to post-intervention)
week0 <- stress_data$stress[stress_data$time == "Week_0"]
week6 <- stress_data$stress[stress_data$time == "Week_6"]
mean_diff <- mean(week0) - mean(week6)
pooled_sd <- sqrt((sd(week0)^2 + sd(week6)^2) / 2)
cohens_d <- mean_diff / pooled_sd
cat("\nBaseline (Week 0) to Post-Intervention (Week 6):\n")
##
## Baseline (Week 0) to Post-Intervention (Week 6):
cat("Mean difference:", round(mean_diff, 2), "\n")
## Mean difference: 9.62
cat("Cohen's d:", round(cohens_d, 3), "\n")
## Cohen's d: 0.762
Cohen’s d of 0.76 indicates a large effect size for the change from baseline to post-intervention.
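An alternative paired-samples effect size, often written d_z, standardizes the mean change by the SD of the difference scores rather than the pooled SD; it is typically larger when measurements are positively correlated within person. A quick sketch, reusing the week0 and week6 vectors from above:

```r
# Paired effect size d_z: mean change divided by SD of the changes
# (assumes week0 and week6 exist as extracted above)
diffs <- week0 - week6
d_z <- mean(diffs) / sd(diffs)
cat("Cohen's d_z (difference-score metric):", round(d_z, 3), "\n")
```

Whichever metric is reported, state it explicitly, since d and d_z are not interchangeable.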
# Calculate mean and SE for error bars
summary_stats <- stress_data %>%
group_by(time) %>%
summarise(
Mean = mean(stress),
SE = sd(stress) / sqrt(n()),
CI_lower = Mean - 1.96 * SE,
CI_upper = Mean + 1.96 * SE
)
# Plot means with confidence intervals
ggplot(summary_stats, aes(x = time, y = Mean, group = 1)) +
geom_line(size = 1.2, color = "steelblue") +
geom_point(size = 4, color = "steelblue") +
geom_errorbar(aes(ymin = CI_lower, ymax = CI_upper),
width = 0.2, size = 0.8, color = "steelblue") +
labs(
title = "Mean Stress Levels Over Time with 95% CI",
x = "Time Point",
y = "Mean Stress Score",
caption = "Error bars represent 95% confidence intervals"
) +
theme_minimal() +
theme(
panel.grid.major = element_line(color = "gray90", linetype = "dashed"),
panel.grid.minor = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
panel.border = element_blank(),
axis.line.x.top = element_blank(),
axis.line.y.right = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold")
)
Repeated measures ANOVA can also be analyzed using linear mixed models, which handle missing data better and provide more flexibility:
library(lme4)
# Fit linear mixed model
lmm_model <- lmer(stress ~ time + (1 | participant), data = stress_data)
# Display summary
summary(lmm_model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: stress ~ time + (1 | participant)
## Data: stress_data
##
## REML criterion at convergence: 1057.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2977 -0.5573 -0.0004 0.4896 2.4017
##
## Random effects:
## Groups Name Variance Std.Dev.
## participant (Intercept) 128.44 11.333
## Residual 20.81 4.562
## Number of obs: 160, groups: participant, 40
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 70.797 1.932 36.651
## timeWeek_2 -3.572 1.020 -3.502
## timeWeek_4 -6.282 1.020 -6.159
## timeWeek_6 -9.621 1.020 -9.432
##
## Correlation of Fixed Effects:
## (Intr) tmWk_2 tmWk_4
## timeWeek_2 -0.264
## timeWeek_4 -0.264 0.500
## timeWeek_6 -0.264 0.500 0.500
# ANOVA table for fixed effects using car package
library(car)
Anova(lmm_model, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: stress
## Chisq Df Pr(>Chisq)
## (Intercept) 1343.307 1 < 2.2e-16 ***
## time 96.047 3 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The linear mixed model approach provides similar conclusions but offers additional flexibility for complex designs and missing data.
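Post-hoc contrasts for the mixed model can be obtained with the emmeans package (a sketch, assuming emmeans is installed; it is not used elsewhere in this analysis). This mirrors the paired comparisons above but uses the model-based standard errors:

```r
# Sketch: model-based pairwise comparisons
# (assumes emmeans is installed and lmm_model was fit above)
library(emmeans)

# Estimated marginal means for each time point, then all pairwise
# contrasts with Bonferroni adjustment
emmeans(lmm_model, pairwise ~ time, adjust = "bonferroni")
```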
Normality of Differences:
For repeated measures ANOVA, we check the normality of the differences between conditions.
# Reshape data to wide format
library(tidyr)
stress_wide <- stress_data %>%
pivot_wider(names_from = time, values_from = stress)
# Calculate differences
stress_wide <- stress_wide %>%
mutate(
diff_2_0 = Week_2 - Week_0,
diff_4_0 = Week_4 - Week_0,
diff_6_0 = Week_6 - Week_0
)
# Q-Q plot for one difference
qqnorm(stress_wide$diff_6_0, main = "Q-Q Plot: Week 6 - Week 0 Difference")
qqline(stress_wide$diff_6_0, col = "red")
# Shapiro-Wilk test
shapiro.test(stress_wide$diff_6_0)
##
## Shapiro-Wilk normality test
##
## data: stress_wide$diff_6_0
## W = 0.96809, p-value = 0.3125
The Q-Q plot and Shapiro-Wilk test help assess the normality assumption for the differences.
Sphericity:
The standard aov() output does not report Mauchly’s test; as noted earlier, it can be obtained from packages such as ez (via ezANOVA) or afex. If the test is significant, report the Greenhouse-Geisser or Huynh-Feldt corrected results.
Here’s how to report these results in an academic paper:
A repeated measures ANOVA was conducted to examine changes in stress levels across four time points during a 6-week mindfulness intervention (baseline, Week 2, Week 4, Week 6) in 40 healthcare workers.
The results showed a statistically significant effect of time on stress levels, F(3, 117) = 32.02, p < .001, \(\eta^2_p\) = 0.451. Stress scores decreased significantly from baseline (M = 70.8, SD = 12.96) to post-intervention (M = 61.18, SD = 12.27), representing a large effect (d = 0.76).
Post-hoc pairwise comparisons using Bonferroni correction revealed significant reductions in stress between all pairs of time points, indicating a steady decline across the intervention period.
When to Use Repeated Measures ANOVA:
- Same participants measured at multiple time points or conditions
- Interest in within-subject changes over time or across conditions
- Need to control for individual differences
Advantages:
- More statistical power than between-subjects designs
- Fewer participants required
- Controls for individual differences
Challenges:
- Sphericity assumption must be met (or corrected)
- Missing data can be problematic
- Carryover effects in experimental designs
- Requires complete data for all participants at all time points
Alternatives:
- Linear Mixed Models: more flexible, handle missing data better
- MANOVA: for multiple dependent variables
- Growth Curve Modeling: when time is continuous and trajectories are of interest
Common Mistakes:
- Using between-subjects ANOVA for repeated measures (loses power, violates independence)
- Ignoring sphericity violations
- Not correcting for multiple comparisons in post-hoc tests
- Confusing repeated measures with mixed (between-within) designs
Effect Size Reporting:
- Partial eta-squared (\(\eta^2_p\)): proportion of variance in the DV explained by the IV, excluding other factors
- Generalized eta-squared (\(\eta^2_G\)): more comparable across designs
- Cohen’s d: for pairwise comparisons
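To illustrate the missing-data point in favor of mixed models, here is a sketch using the simulated data from above: after dropping a handful of observations at random, lmer() still fits using every remaining row, whereas classical repeated measures ANOVA would discard the affected participants entirely.

```r
# Sketch: mixed models tolerate incomplete data
# (assumes lme4 is installed and stress_data exists as built above)
library(lme4)

set.seed(99)
# Remove 10 observations at random to mimic dropout / missed sessions
incomplete <- stress_data[-sample(nrow(stress_data), 10), ]

# lmer() uses all 150 remaining rows; participants with partial data
# still contribute to the fixed-effect estimates
lmm_incomplete <- lmer(stress ~ time + (1 | participant), data = incomplete)
fixef(lmm_incomplete)
```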
Repeated Measures ANOVA is a powerful tool for analyzing data where the same participants are measured multiple times. In our mindfulness intervention study, we found significant reductions in stress across the 6-week program, with particularly strong effects between baseline and post-intervention.
By accounting for individual differences (each person as their own control), repeated measures designs increase statistical power and reduce the number of participants needed. However, they require careful attention to assumptions—particularly sphericity—and appropriate corrections when assumptions are violated.
Understanding repeated measures ANOVA opens doors to analyzing longitudinal data, intervention studies, and within-subjects experimental designs common in psychology, public health, and behavioral research. When combined with modern approaches like linear mixed models, researchers gain powerful tools for understanding change over time while respecting the dependency structure in their data.