Warning

This chapter contains some code examples that require the PyMC package (imported below as pymc), which may not be installed in your environment. If you want to run these examples, you may need to install the package and uncomment the commented-out code blocks.

Chapter 19: Bayesian Statistics#

1. Introduction to Bayesian Statistics#

Bayesian statistics represents a fundamentally different approach to statistical inference compared to the frequentist methods we’ve explored in previous chapters. Rather than treating parameters as fixed but unknown quantities, Bayesian statistics views parameters as random variables with probability distributions.

The Bayesian approach is named after Thomas Bayes, whose theorem forms the mathematical foundation of this statistical paradigm. Bayesian methods allow us to:

  1. Incorporate prior knowledge into our analyses

  2. Update our beliefs as new data becomes available

  3. Make probabilistic statements directly about parameters

  4. Handle uncertainty in a more intuitive way

In this chapter, we’ll explore the mathematical foundations of Bayesian statistics, implement Bayesian analyses in Python, and apply these methods to psychological research questions.

1.1 Bayes’ Theorem: The Foundation#

At the heart of Bayesian statistics is Bayes’ theorem, which describes how to update our beliefs about a hypothesis given new evidence:

\[P(H|D) = \frac{P(D|H) \cdot P(H)}{P(D)}\]

Where:

  • \(P(H|D)\) is the posterior probability of hypothesis \(H\) given data \(D\)

  • \(P(D|H)\) is the likelihood of observing data \(D\) under hypothesis \(H\)

  • \(P(H)\) is the prior probability of hypothesis \(H\)

  • \(P(D)\) is the marginal likelihood or evidence (the probability of observing data \(D\) averaged over all possible hypotheses)

The denominator \(P(D)\) can be calculated using the law of total probability:

\[P(D) = \sum_{i} P(D|H_i) \cdot P(H_i)\]

For continuous hypotheses, this becomes an integral:

\[P(D) = \int P(D|H) \cdot P(H) \, dH\]
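To make the evidence integral concrete, here is a minimal sketch that evaluates \(P(D)\) numerically for a Beta prior and a binomial likelihood, then checks the result against the closed-form Beta-Binomial marginal likelihood. The Beta(2, 2) prior and the 7-successes-in-10-trials data are illustrative choices, not from any study:

import numpy as np
from scipy import integrate, stats
from scipy.special import comb, betaln

a, b = 2, 2   # Beta prior hyperparameters (illustrative)
n, k = 10, 7  # data: k successes in n trials (illustrative)

# P(D) = integral over theta of P(D|theta) * P(theta)
evidence, _ = integrate.quad(
    lambda theta: stats.binom.pmf(k, n, theta) * stats.beta.pdf(theta, a, b),
    0, 1)

# Closed-form Beta-Binomial marginal likelihood for comparison
closed_form = comb(n, k) * np.exp(betaln(a + k, b + n - k) - betaln(a, b))

print(f"Evidence (numerical integration): {evidence:.6f}")
print(f"Evidence (closed form):           {closed_form:.6f}")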

Let’s implement Bayes’ theorem for a simple example:

import warnings
warnings.filterwarnings('ignore')
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from scipy import stats

def bayes_theorem(prior, likelihood, data):
    """
    Apply Bayes' theorem to update prior beliefs given new data.
    
    Parameters:
    -----------
    prior : array-like
        Prior probabilities for each hypothesis
    likelihood : function
        Function that takes hypothesis and data and returns likelihood
    data : array-like
        Observed data
        
    Returns:
    --------
    posterior : array-like
        Posterior probabilities for each hypothesis
    """
    # Calculate likelihood for each hypothesis
    likelihoods = np.array([likelihood(h, data) for h in range(len(prior))])
    
    # Calculate marginal likelihood
    marginal_likelihood = np.sum(likelihoods * prior)
    
    # Calculate posterior
    posterior = (likelihoods * prior) / marginal_likelihood
    
    return posterior

# Example: Testing for ESP
# Hypothesis: Person has ESP (can guess cards better than chance)
# Data: Number of correct guesses out of 25 trials

# Prior: We believe ESP is unlikely (10% chance)
prior = np.array([0.9, 0.1])  # [No ESP, Has ESP]

# Likelihood function: Binomial probability
def likelihood(hypothesis, data):
    n_trials = 25
    n_correct = data
    
    if hypothesis == 0:  # No ESP (p = 0.2, chance level)
        return stats.binom.pmf(n_correct, n_trials, 0.2)
    else:  # Has ESP (p = 0.5)
        return stats.binom.pmf(n_correct, n_trials, 0.5)

# Observed data: 10 correct guesses
data = 10

# Calculate posterior
posterior = bayes_theorem(prior, likelihood, data)

# Display results
hypotheses = ['No ESP', 'Has ESP']
results = pd.DataFrame({
    'Hypothesis': hypotheses,
    'Prior': prior,
    'Posterior': posterior
})

print(results)

# Plot
plt.figure(figsize=(10, 6))
plt.bar(hypotheses, prior, alpha=0.5, label='Prior')
plt.bar(hypotheses, posterior, alpha=0.5, label='Posterior')
plt.ylabel('Probability')
plt.title('Bayesian Update: ESP Testing')
plt.legend()
plt.show()
  Hypothesis  Prior  Posterior
0     No ESP    0.9    0.52108
1    Has ESP    0.1    0.47892
[Figure: Bayesian Update: ESP Testing (prior and posterior bar chart)]

1.2 The Bayesian Framework in Psychology#

Bayesian statistics has gained popularity in psychology for several reasons:

  1. Incorporating Prior Knowledge : Psychological theories often provide expectations about effect sizes or relationships that can be formalized as priors.

  2. Intuitive Interpretation : Statements like “there is a 95% probability that the true effect is between X and Y” are direct interpretations of Bayesian credible intervals, unlike frequentist confidence intervals.

  3. Handling Small Samples : Bayesian methods can provide more reasonable estimates with small samples, which are common in psychological research.

  4. Accumulating Evidence : The sequential updating of beliefs aligns with how scientific knowledge accumulates over time.

  5. Model Comparison : Bayesian methods offer principled ways to compare competing psychological theories.

Let’s explore a psychological example: estimating the probability that a treatment for depression is effective.

# Example: Treatment effectiveness for depression
# We'll model the effectiveness as a binomial parameter (proportion of patients who improve)

def plot_beta_distribution(alpha, beta, label=None):
    """Plot a Beta distribution with given parameters."""
    x = np.linspace(0, 1, 1000)
    y = stats.beta.pdf(x, alpha, beta)
    plt.plot(x, y, label=label)
    return x, y

# Prior: Based on previous studies, we believe the treatment works for about 60% of patients
# with some uncertainty (equivalent to observing 12 successes out of 20 patients)
prior_alpha = 12
prior_beta = 8

# Data: In our new study, 15 out of 30 patients improved
data_success = 15
data_failure = 30 - 15

# Calculate posterior parameters (conjugate prior property of Beta-Binomial)
posterior_alpha = prior_alpha + data_success
posterior_beta = prior_beta + data_failure

# Plot prior and posterior
plt.figure(figsize=(12, 7))
x, prior_y = plot_beta_distribution(prior_alpha, prior_beta, label='Prior')
_, posterior_y = plot_beta_distribution(posterior_alpha, posterior_beta, label='Posterior')

# Add vertical lines for means
prior_mean = prior_alpha / (prior_alpha + prior_beta)
posterior_mean = posterior_alpha / (posterior_alpha + posterior_beta)

plt.axvline(prior_mean, color='blue', linestyle='--', alpha=0.5)
plt.axvline(posterior_mean, color='orange', linestyle='--', alpha=0.5)

# Calculate 95% credible interval for posterior
ci_low, ci_high = stats.beta.ppf([0.025, 0.975], posterior_alpha, posterior_beta)
plt.fill_between(x, 0, posterior_y, where=(x >= ci_low) & (x <= ci_high), 
                 color='orange', alpha=0.3, label='95% Credible Interval')

plt.xlabel('Effectiveness (proportion of patients who improve)')
plt.ylabel('Probability Density')
plt.title('Bayesian Analysis of Depression Treatment Effectiveness')
plt.legend()
plt.show()

# Print summary statistics
print(f"Prior mean: {prior_mean:.3f}")
print(f"Posterior mean: {posterior_mean:.3f}")
print(f"95% Credible Interval: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"Probability that treatment is better than placebo (0.3): {1 - stats.beta.cdf(0.3, posterior_alpha, posterior_beta):.3f}")
[Figure: Bayesian Analysis of Depression Treatment Effectiveness (prior, posterior, and 95% credible interval)]
Prior mean: 0.600
Posterior mean: 0.540
95% Credible Interval: [0.402, 0.675]
Probability that treatment is better than placebo (0.3): 1.000

2. Bayesian Parameter Estimation#

One of the primary applications of Bayesian statistics is parameter estimation. Instead of point estimates, Bayesian methods provide entire probability distributions for parameters.

2.1 Conjugate Priors#

Conjugate priors are mathematically convenient because they result in posterior distributions from the same family as the prior. Some common conjugate relationships include:

  1. Beta-Binomial : Beta prior for a binomial likelihood (proportion parameter)

  2. Normal-Normal : Normal prior for a normal likelihood with known variance

  3. Gamma-Poisson : Gamma prior for a Poisson likelihood (rate parameter); a short example follows the normal-mean case below

Let’s implement Bayesian estimation of a normal mean with a conjugate prior:

# Bayesian estimation of a normal mean with known variance
# Example: Estimating average reaction time in a psychological experiment

def normal_normal_update(prior_mean, prior_var, data, data_var):
    """
    Update a normal prior with normal data (known variance).
    
    Parameters:
    -----------
    prior_mean : float
        Mean of the prior distribution
    prior_var : float
        Variance of the prior distribution
    data : array-like
        Observed data points
    data_var : float
        Known variance of the data
        
    Returns:
    --------
    posterior_mean : float
        Mean of the posterior distribution
    posterior_var : float
        Variance of the posterior distribution
    """
    n = len(data)
    sample_mean = np.mean(data)
    
    # Calculate posterior parameters
    posterior_var = 1 / (1/prior_var + n/data_var)
    posterior_mean = posterior_var * (prior_mean/prior_var + n*sample_mean/data_var)
    
    return posterior_mean, posterior_var

# Prior: Based on previous studies, reaction times are around 300ms with standard deviation 50ms
prior_mean = 300
prior_var = 50**2

# Data: Observed reaction times from 20 participants (in milliseconds)
np.random.seed(42)
true_mean = 280  # True population mean (unknown in real scenario)
data_std = 60    # Known standard deviation of measurements
data = np.random.normal(true_mean, data_std, size=20)
data_var = data_std**2

# Calculate posterior
posterior_mean, posterior_var = normal_normal_update(prior_mean, prior_var, data, data_var)
posterior_std = np.sqrt(posterior_var)

# Plot prior, likelihood, and posterior
x = np.linspace(150, 450, 1000)
prior = stats.norm.pdf(x, prior_mean, np.sqrt(prior_var))
likelihood = stats.norm.pdf(x, np.mean(data), data_std/np.sqrt(len(data)))
posterior = stats.norm.pdf(x, posterior_mean, posterior_std)

plt.figure(figsize=(12, 7))
plt.plot(x, prior, label=f'Prior: N({prior_mean}, {np.sqrt(prior_var):.1f}²)')
plt.plot(x, likelihood, label=f'Likelihood: N({np.mean(data):.1f}, {data_std/np.sqrt(len(data)):.1f}²)')
plt.plot(x, posterior, label=f'Posterior: N({posterior_mean:.1f}, {posterior_std:.1f}²)')

# Add 95% credible interval
ci_low, ci_high = stats.norm.ppf([0.025, 0.975], posterior_mean, posterior_std)
plt.fill_between(x, 0, posterior, where=(x >= ci_low) & (x <= ci_high), 
                 color='green', alpha=0.3, label='95% Credible Interval')

plt.xlabel('Reaction Time (ms)')
plt.ylabel('Probability Density')
plt.title('Bayesian Estimation of Average Reaction Time')
plt.legend()
plt.grid(True)
plt.show()

# Print summary statistics
print(f"Prior mean: {prior_mean} ms, Prior SD: {np.sqrt(prior_var):.1f} ms")
print(f"Sample mean: {np.mean(data):.1f} ms, Sample SD: {np.std(data, ddof=1):.1f} ms")
print(f"Posterior mean: {posterior_mean:.1f} ms, Posterior SD: {posterior_std:.1f} ms")
print(f"95% Credible Interval: [{ci_low:.1f}, {ci_high:.1f}] ms")
[Figure: Bayesian Estimation of Average Reaction Time (prior, likelihood, and posterior with 95% credible interval)]
Prior mean: 300 ms, Prior SD: 50.0 ms
Sample mean: 269.7 ms, Sample SD: 57.6 ms
Posterior mean: 271.8 ms, Posterior SD: 13.0 ms
95% Credible Interval: [246.4, 297.2] ms
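The Gamma-Poisson pair from the list above works the same way: a Gamma(\(\alpha\), \(\beta\)) prior on a Poisson rate \(\lambda\), combined with \(n\) observed counts that sum to \(S\), yields a Gamma(\(\alpha + S\), \(\beta + n\)) posterior. Here is a minimal sketch; the prior values and the week of count data are illustrative:

# Gamma-Poisson conjugate update for a rate parameter
# Example: number of intrusive thoughts reported per day (illustrative data)
prior_alpha, prior_beta = 4, 2   # prior mean rate = alpha/beta = 2 events/day

counts = np.array([3, 1, 4, 2, 5, 3, 2])  # one week of observed counts
post_alpha = prior_alpha + counts.sum()   # alpha + S
post_beta = prior_beta + len(counts)      # beta + n

x = np.linspace(0, 8, 500)
plt.figure(figsize=(10, 6))
plt.plot(x, stats.gamma.pdf(x, prior_alpha, scale=1/prior_beta), label='Prior')
plt.plot(x, stats.gamma.pdf(x, post_alpha, scale=1/post_beta), label='Posterior')
plt.xlabel('Rate λ (events per day)')
plt.ylabel('Probability Density')
plt.title('Gamma-Poisson Conjugate Update')
plt.legend()
plt.show()

print(f"Posterior mean rate: {post_alpha / post_beta:.2f} events/day")
ci = stats.gamma.ppf([0.025, 0.975], post_alpha, scale=1/post_beta)
print(f"95% Credible Interval: [{ci[0]:.2f}, {ci[1]:.2f}]")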

2.2 Non-Conjugate Priors and Numerical Methods#

When conjugate priors aren’t available or appropriate, we need numerical methods to approximate the posterior distribution. Common approaches include:

  1. Grid Approximation : Evaluating the posterior on a grid of parameter values

  2. Markov Chain Monte Carlo (MCMC) : Generating samples from the posterior distribution

  3. Variational Inference : Approximating the posterior with a simpler distribution

Let’s implement grid approximation for a simple example:

def grid_approximation(prior, likelihood_func, data, grid_points):
    """
    Approximate posterior distribution using a grid of parameter values.
    
    Parameters:
    -----------
    prior : function
        Prior probability density function
    likelihood_func : function
        Likelihood function that takes parameter and data
    data : array-like
        Observed data
    grid_points : array-like
        Grid of parameter values to evaluate
        
    Returns:
    --------
    posterior : array-like
        Posterior probability density at each grid point
    """
    # Evaluate prior at each grid point
    prior_prob = prior(grid_points)
    
    # Evaluate likelihood at each grid point
    likelihood = np.array([likelihood_func(theta, data) for theta in grid_points])
    
    # Calculate unnormalized posterior
    posterior_unnorm = likelihood * prior_prob
    
    # Normalize posterior
    posterior = posterior_unnorm / np.sum(posterior_unnorm)
    
    # Scale probability masses to an approximate density
    # (assumes an evenly spaced grid over [0, 1], as used below)
    return posterior * len(grid_points)

# Example: Estimating the probability of a biased coin
# We'll use a non-conjugate prior (a mixture of two beta distributions)

def mixture_beta_prior(theta):
    """A mixture of two beta distributions as a non-conjugate prior."""
    beta1 = stats.beta.pdf(theta, 2, 2)  # Centered around 0.5
    beta2 = stats.beta.pdf(theta, 10, 2)  # Centered around 0.83
    return 0.5 * beta1 + 0.5 * beta2

def binomial_likelihood(theta, data):
    """Binomial likelihood function."""
    n_trials, n_success = data
    return stats.binom.pmf(n_success, n_trials, theta)

# Data: 8 heads out of 10 coin flips
data = (10, 8)

# Create grid of parameter values
grid_points = np.linspace(0, 1, 1000)

# Calculate posterior using grid approximation
posterior = grid_approximation(mixture_beta_prior, binomial_likelihood, data, grid_points)

# Plot prior, likelihood, and posterior
plt.figure(figsize=(12, 7))
plt.plot(grid_points, mixture_beta_prior(grid_points), label='Prior (Beta Mixture)')
plt.plot(grid_points, [binomial_likelihood(theta, data) for theta in grid_points], label='Likelihood')
plt.plot(grid_points, posterior, label='Posterior')

# Find maximum a posteriori (MAP) estimate
map_idx = np.argmax(posterior)
map_estimate = grid_points[map_idx]
plt.axvline(map_estimate, color='red', linestyle='--', label=f'MAP Estimate: {map_estimate:.3f}')

# Calculate 95% credible interval
# Sort grid points by posterior density
sorted_idx = np.argsort(posterior)[::-1]
sorted_grid = grid_points[sorted_idx]
sorted_posterior = posterior[sorted_idx]
cumulative_prob = np.cumsum(sorted_posterior) / np.sum(sorted_posterior)
credible_idx = cumulative_prob <= 0.95
credible_region = sorted_grid[credible_idx]
plt.fill_between(grid_points, 0, posterior, where=(grid_points >= min(credible_region)) & 
                 (grid_points <= max(credible_region)), color='green', alpha=0.3, 
                 label='95% Credible Region')

plt.xlabel('θ (Probability of Heads)')
plt.ylabel('Probability Density')
plt.title('Bayesian Estimation of Coin Bias with Non-Conjugate Prior')
plt.legend()
plt.grid(True)
plt.show()

# Print summary statistics
print(f"MAP Estimate: {map_estimate:.3f}")
print(f"95% Credible Region: [{min(credible_region):.3f}, {max(credible_region):.3f}]")
[Figure: Bayesian Estimation of Coin Bias with Non-Conjugate Prior (prior, likelihood, posterior, and 95% credible region)]
MAP Estimate: 0.838
95% Credible Region: [0.575, 0.959]
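As a check on the grid approximation: because this particular prior is a mixture of Beta distributions, the posterior is also available in closed form. Each component updates conjugately, and the mixture weights are reweighted by each component’s Beta-Binomial marginal likelihood. A sketch of the comparison, reusing data, grid_points, and posterior from above:

from scipy.special import comb, betaln

def beta_binomial_marginal(k, n, a, b):
    """Marginal likelihood of k successes in n trials under a Beta(a, b) prior."""
    return comb(n, k) * np.exp(betaln(a + k, b + n - k) - betaln(a, b))

n_trials, n_success = data  # (10, 8) from above

# Component-wise conjugate updates: Beta(2, 2) -> Beta(10, 4), Beta(10, 2) -> Beta(18, 4)
m1 = beta_binomial_marginal(n_success, n_trials, 2, 2)
m2 = beta_binomial_marginal(n_success, n_trials, 10, 2)
w1 = 0.5 * m1 / (0.5 * m1 + 0.5 * m2)  # updated mixture weights
w2 = 1 - w1

closed_posterior = (w1 * stats.beta.pdf(grid_points, 2 + n_success, 2 + n_trials - n_success)
                    + w2 * stats.beta.pdf(grid_points, 10 + n_success, 2 + n_trials - n_success))

plt.figure(figsize=(10, 6))
plt.plot(grid_points, posterior, label='Grid approximation')
plt.plot(grid_points, closed_posterior, '--', label='Closed-form mixture posterior')
plt.xlabel('θ (Probability of Heads)')
plt.ylabel('Probability Density')
plt.legend()
plt.show()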

3. Markov Chain Monte Carlo (MCMC)#

For more complex models, grid approximation becomes computationally infeasible. Markov Chain Monte Carlo (MCMC) methods allow us to sample from the posterior distribution without having to compute it explicitly.

3.1 Metropolis-Hastings Algorithm#

The Metropolis-Hastings algorithm is a classic MCMC method:

  1. Start with an initial parameter value

  2. Propose a new value from a proposal distribution

  3. Accept or reject the proposal based on the posterior ratio

  4. Repeat steps 2-3 many times

Let’s implement the Metropolis-Hastings algorithm:

def metropolis_hastings(log_posterior, initial_value, n_samples, proposal_width):
    """
    Generate samples from a posterior distribution using the Metropolis-Hastings algorithm.
    
    Parameters:
    -----------
    log_posterior : function
        Function that calculates the log posterior density
    initial_value : float or array
        Starting point for the Markov chain
    n_samples : int
        Number of samples to generate
    proposal_width : float
        Standard deviation of the normal proposal distribution
        
    Returns:
    --------
    samples : array
        Generated samples from the posterior
    acceptance_rate : float
        Proportion of proposals that were accepted
    """
    # Initialize samples array with correct dimensions for multivariate case
    n_params = len(initial_value) if isinstance(initial_value, (list, np.ndarray)) else 1
    samples = np.zeros((n_samples, n_params)) if n_params > 1 else np.zeros(n_samples)
    
    # Set initial value
    if n_params > 1:
        samples[0,:] = initial_value
    else:
        samples[0] = initial_value
    
    # Counter for accepted proposals
    n_accepted = 0
    
    for i in range(1, n_samples):
        # Current value
        current = samples[i-1]
        
        # Propose new value(s)
        if n_params > 1:
            proposal = current + np.random.normal(0, proposal_width, size=n_params)
        else:
            proposal = current + np.random.normal(0, proposal_width)
        
        # Calculate log acceptance ratio
        try:
            log_accept_ratio = log_posterior(proposal) - log_posterior(current)
        except (ValueError, TypeError):
            # If proposal leads to invalid values, reject it
            log_accept_ratio = -np.inf
        
        # Accept or reject
        if np.log(np.random.random()) < log_accept_ratio:
            samples[i] = proposal
            n_accepted += 1
        else:
            samples[i] = current
    
    acceptance_rate = n_accepted / (n_samples - 1)
    return samples, acceptance_rate

def log_posterior(params, data):
    """
    Calculate log posterior for normal mean and precision (1/variance).
    
    Parameters:
    -----------
    params : array-like
        [mean, log_precision]
    data : array-like
        Observed data
        
    Returns:
    --------
    log_post : float
        Log posterior density
    """
    mean, log_precision = params
    precision = np.exp(log_precision)
    variance = 1 / precision
    
    # Prior on mean: Normal(0, 10²)
    log_prior_mean = stats.norm.logpdf(mean, 0, 10)
    
    # Prior on precision: Gamma(1, 1)
    log_prior_precision = stats.gamma.logpdf(precision, 1, scale=1) + log_precision  # Jacobian adjustment
    
    # Likelihood: Normal(mean, variance)
    log_likelihood = np.sum(stats.norm.logpdf(data, mean, np.sqrt(variance)))
    
    return log_prior_mean + log_prior_precision + log_likelihood

# Generate some data
np.random.seed(42)
true_mean = 5
true_std = 2
data = np.random.normal(true_mean, true_std, size=30)

# Define log posterior function for our specific data
log_post = lambda params: log_posterior(params, data)

# Run Metropolis-Hastings
initial_params = [0, 0]  # [mean, log_precision]
n_samples = 10000
proposal_width = 0.5

samples, acceptance_rate = metropolis_hastings(log_post, initial_params, n_samples, proposal_width)

# Discard burn-in period and convert log_precision to standard deviation
burn_in = 1000
mean_samples = samples[burn_in:, 0]
std_samples = np.sqrt(1 / np.exp(samples[burn_in:, 1]))

# Plot traces and histograms
plt.figure(figsize=(12, 10))

# Trace plots
plt.subplot(2, 2, 1)
plt.plot(mean_samples)
plt.title('Trace: Mean')
plt.xlabel('Iteration')
plt.ylabel('Value')

plt.subplot(2, 2, 2)
plt.plot(std_samples)
plt.title('Trace: Standard Deviation')
plt.xlabel('Iteration')
plt.ylabel('Value')

# Histogram plots
plt.subplot(2, 2, 3)
plt.hist(mean_samples, bins=30, alpha=0.7, density=True)
plt.axvline(true_mean, color='red', linestyle='--', label=f'True Value: {true_mean}')
plt.axvline(np.mean(mean_samples), color='green', linestyle='-', label=f'Posterior Mean: {np.mean(mean_samples):.2f}')
plt.title('Posterior: Mean')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()

plt.subplot(2, 2, 4)
plt.hist(std_samples, bins=30, alpha=0.7, density=True)
plt.axvline(true_std, color='red', linestyle='--', label=f'True Value: {true_std}')
plt.axvline(np.mean(std_samples), color='green', linestyle='-', label=f'Posterior Mean: {np.mean(std_samples):.2f}')
plt.title('Posterior: Standard Deviation')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()

plt.tight_layout()
plt.show()

# Print summary statistics
print(f"Acceptance rate: {acceptance_rate:.2f}")
print(f"Posterior mean: {np.mean(mean_samples):.2f} (95% CI: [{np.percentile(mean_samples, 2.5):.2f}, {np.percentile(mean_samples, 97.5):.2f}])")
print(f"Posterior standard deviation: {np.mean(std_samples):.2f} (95% CI: [{np.percentile(std_samples, 2.5):.2f}, {np.percentile(std_samples, 97.5):.2f}])")
[Figure: Trace plots and posterior histograms for the mean and standard deviation]
Acceptance rate: 0.35
Posterior mean: 4.63 (95% CI: [4.01, 5.23])
Posterior standard deviation: 1.79 (95% CI: [1.42, 2.27])
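The acceptance rate alone says little about convergence. A standard check is to run several chains from dispersed starting points and compare within-chain to between-chain variance with the Gelman-Rubin statistic (R-hat), which should be close to 1 for converged chains. A minimal sketch for the mean parameter; the four starting points and chain count are arbitrary choices:

def gelman_rubin(chains):
    """Gelman-Rubin R-hat statistic from an (m, n) array of m chains of length n."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled posterior variance estimate
    return np.sqrt(var_hat / W)

# Run four chains from dispersed starting points (arbitrary choices)
starts = [[-5, -1], [0, 0], [5, 1], [10, 2]]
chains = []
for start in starts:
    s, _ = metropolis_hastings(log_post, start, 5000, proposal_width)
    chains.append(s[1000:, 0])  # mean parameter, after burn-in

rhat = gelman_rubin(np.array(chains))
print(f"R-hat for the mean parameter: {rhat:.3f}")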

4. Bayesian Hypothesis Testing#

Bayesian hypothesis testing offers an alternative to traditional null hypothesis significance testing (NHST). Instead of p-values, Bayesian approaches use Bayes factors or posterior probabilities.

4.1 Bayes Factors#

The Bayes factor is the ratio of the marginal likelihoods under two competing hypotheses:

\[BF_{10} = \frac{P(D|H_1)}{P(D|H_0)}\]

Where:

  • \(BF_{10}\) is the Bayes factor in favor of \(H_1\) over \(H_0\)

  • \(P(D|H_1)\) is the marginal likelihood under hypothesis \(H_1\)

  • \(P(D|H_0)\) is the marginal likelihood under hypothesis \(H_0\)

Let’s implement Bayes factor calculation for a simple t-test:

def bayes_factor_t_test(x, y):
    """
    Calculate an approximate Bayes factor for a two-sample t-test.
    
    Uses the BIC approximation of Wagenmakers (2007), which corresponds to
    a unit-information prior on the standardized effect size.
    
    Parameters:
    -----------
    x : array-like
        First sample
    y : array-like
        Second sample
        
    Returns:
    --------
    bf10 : float
        Bayes factor in favor of the alternative hypothesis
    """
    from scipy.stats import ttest_ind
    
    # Calculate t-statistic, degrees of freedom, and total sample size
    t_stat, _ = ttest_ind(x, y, equal_var=True)
    df = len(x) + len(y) - 2
    n = len(x) + len(y)
    
    # BIC approximation: BF01 ≈ sqrt(n) * (1 + t²/df)^(-n/2), hence
    bf10 = (1 + t_stat**2 / df)**(n / 2) / np.sqrt(n)
    
    return bf10

# Example: Testing if a cognitive training program improves memory scores
np.random.seed(42)

# Control group
control_scores = np.random.normal(100, 15, size=30)

# Treatment group (with a true effect)
treatment_scores = np.random.normal(110, 15, size=30)

# Calculate Bayes factor
bf10 = bayes_factor_t_test(treatment_scores, control_scores)

# Interpret Bayes factor
def interpret_bayes_factor(bf):
    if bf > 100:
        return "Extreme evidence for H1"
    elif bf > 30:
        return "Very strong evidence for H1"
    elif bf > 10:
        return "Strong evidence for H1"
    elif bf > 3:
        return "Moderate evidence for H1"
    elif bf > 1:
        return "Anecdotal evidence for H1"
    elif bf > 1/3:
        return "Anecdotal evidence for H0"
    elif bf > 1/10:
        return "Moderate evidence for H0"
    elif bf > 1/30:
        return "Strong evidence for H0"
    elif bf > 1/100:
        return "Very strong evidence for H0"
    else:
        return "Extreme evidence for H0"

# Print results
print(f"Bayes Factor (BF10): {bf10:.2f}")
print(f"Interpretation: {interpret_bayes_factor(bf10)}")

# Compare with traditional t-test
from scipy.stats import ttest_ind
t_stat, p_value = ttest_ind(treatment_scores, control_scores)
print(f"t-statistic: {t_stat:.2f}")
print(f"p-value: {p_value:.4f}")

# Plot data
plt.figure(figsize=(10, 6))
plt.boxplot([control_scores, treatment_scores], labels=['Control', 'Treatment'])
plt.ylabel('Memory Score')
plt.title('Memory Scores by Group')

# Add individual data points
for i, data in enumerate([control_scores, treatment_scores]):
    x = np.random.normal(i+1, 0.04, size=len(data))
    plt.scatter(x, data, alpha=0.4)

plt.annotate(f"BF10 = {bf10:.2f}\n{interpret_bayes_factor(bf10)}", 
             xy=(0.5, 0.9), xycoords='axes fraction', 
             bbox=dict(boxstyle="round,pad=0.3", fc="white", ec="gray", alpha=0.8))

plt.show()
Bayes Factor (BF10): 12.94
Interpretation: Strong evidence for H1
t-statistic: 3.10
p-value: 0.0030
[Figure: Memory Scores by Group (boxplots with individual points and Bayes factor annotation)]
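Note that a Bayes factor is not itself a posterior probability: it tells us how to update whatever prior odds we assign to the hypotheses, via posterior odds = BF₁₀ × prior odds. A short sketch; the 1:1 and 1:4 prior odds are illustrative:

def posterior_prob_h1(bf10, prior_odds=1.0):
    """Convert a Bayes factor and prior odds into P(H1 | data)."""
    posterior_odds = bf10 * prior_odds
    return posterior_odds / (1 + posterior_odds)

for prior_odds in [1.0, 0.25]:  # 1:1 and 1:4 in favor of H1 (illustrative)
    p_h1 = posterior_prob_h1(bf10, prior_odds)
    print(f"Prior odds {prior_odds}: P(H1 | data) = {p_h1:.3f}")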

4.2 Bayesian Estimation and Hypothesis Testing in Psychology#

Let’s apply Bayesian methods to a classic psychological research question: Does cognitive behavioral therapy (CBT) reduce depression symptoms?

def bayesian_t_test(x, y, prior_scale=0.707, n_samples=10000):
    """
    Perform a Bayesian t-test and return posterior samples of the effect size.
    
    Parameters:
    -----------
    x : array-like
        First sample (e.g., treatment group)
    y : array-like
        Second sample (e.g., control group)
    prior_scale : float
        Scale parameter for the Cauchy prior on effect size
    n_samples : int
        Number of posterior samples to generate
        
    Returns:
    --------
    effect_samples : array
        Posterior samples of the effect size (Cohen's d)
    bf10 : float
        Bayes factor in favor of the alternative hypothesis
    """
    # Calculate sample statistics
    nx, ny = len(x), len(y)
    mean_x, mean_y = np.mean(x), np.mean(y)
    var_x, var_y = np.var(x, ddof=1), np.var(y, ddof=1)
    
    # Pooled standard deviation
    pooled_sd = np.sqrt(((nx - 1) * var_x + (ny - 1) * var_y) / (nx + ny - 2))
    
    # Observed effect size (Cohen's d)
    observed_d = (mean_x - mean_y) / pooled_sd
    
    # Degrees of freedom
    df = nx + ny - 2
    
    # Standard error of d
    se_d = np.sqrt((nx + ny) / (nx * ny) + observed_d**2 / (2 * (nx + ny)))
    
    # Generate posterior samples using importance sampling
    # First, generate samples from a t-distribution (proposal distribution)
    proposal_samples = stats.t.rvs(df, loc=observed_d, scale=se_d, size=n_samples)
    
    # Calculate importance weights
    prior_density = stats.cauchy.pdf(proposal_samples, scale=prior_scale)
    proposal_density = stats.t.pdf(proposal_samples, df, loc=observed_d, scale=se_d)
    weights = prior_density / proposal_density
    
    # Normalize weights
    weights = weights / np.sum(weights)
    
    # Resample with replacement according to weights
    effect_samples = np.random.choice(proposal_samples, size=n_samples, p=weights)
    
    # Calculate Bayes factor using Savage-Dickey density ratio
    # Density at d=0 for posterior divided by density at d=0 for prior
    posterior_density_at_zero = stats.gaussian_kde(effect_samples)(0)[0]
    prior_density_at_zero = stats.cauchy.pdf(0, scale=prior_scale)
    bf01 = posterior_density_at_zero / prior_density_at_zero
    bf10 = 1 / bf01
    
    return effect_samples, bf10

Now let’s use this function with a realistic example from psychology:

# Simulated data: Effect of Cognitive Behavioral Therapy (CBT) on depression scores
# Lower scores indicate less depression
np.random.seed(42)

# Control group: Mean=20, SD=5
control_group = np.random.normal(20, 5, 30)

# Treatment group: Mean=15, SD=5 (expecting improvement)
treatment_group = np.random.normal(15, 5, 30)

# Run Bayesian t-test
effect_samples, bf10 = bayesian_t_test(treatment_group, control_group)

# Plot posterior distribution
plt.figure(figsize=(10, 6))
sns.histplot(effect_samples, kde=True, stat="density")
plt.axvline(x=0, color='red', linestyle='--')
plt.axvline(x=np.mean(effect_samples), color='blue', linestyle='-')

# Add 95% credible interval
ci_lower = np.percentile(effect_samples, 2.5)
ci_upper = np.percentile(effect_samples, 97.5)
plt.axvline(x=ci_lower, color='green', linestyle=':')
plt.axvline(x=ci_upper, color='green', linestyle=':')

plt.title(f'Posterior Distribution of Effect Size\nBF₁₀ = {bf10:.2f}')
plt.xlabel('Effect Size (Cohen\'s d)')
plt.ylabel('Density')

# Add annotations
plt.annotate(f'Mean = {np.mean(effect_samples):.2f}', 
             xy=(np.mean(effect_samples), 0.05),
             xytext=(np.mean(effect_samples) + 0.2, 0.2),
             arrowprops=dict(facecolor='blue', shrink=0.05))

plt.annotate(f'95% CI: [{ci_lower:.2f}, {ci_upper:.2f}]', 
             xy=((ci_lower + ci_upper)/2, 0.02),
             xytext=((ci_lower + ci_upper)/2, 0.1),
             ha='center',
             arrowprops=dict(facecolor='green', shrink=0.05))

plt.show()

# Print summary statistics
print(f"Effect size (Cohen's d): {np.mean(effect_samples):.2f}")
print(f"95% Credible Interval: [{ci_lower:.2f}, {ci_upper:.2f}]")
print(f"Probability of effect > 0: {np.mean(effect_samples > 0):.3f}")
print(f"Bayes Factor (BF₁₀): {bf10:.2f}")
[Figure: Posterior Distribution of Effect Size (with 95% credible interval)]
Effect size (Cohen's d): -0.74
95% Credible Interval: [-2.27, 0.02]
Probability of effect > 0: 0.099
Bayes Factor (BF₁₀): 0.65

4.3 Bayesian Estimation#

Bayesian estimation provides a complete probabilistic framework for parameter estimation. Unlike frequentist approaches that yield point estimates and confidence intervals, Bayesian methods produce entire posterior distributions that represent our updated beliefs about parameters after observing data.

The Bayesian Approach to Estimation#

The Bayesian approach to estimation follows these steps:

  1. Specify a prior distribution \(p(\theta)\) that represents our beliefs about the parameter \(\theta\) before seeing the data.

  2. Define a likelihood function \(p(D|\theta)\) that describes the probability of observing our data \(D\) given different values of \(\theta\).

  3. Calculate the posterior distribution \(p(\theta|D)\) using Bayes’ theorem:

\[p(\theta|D) = \frac{p(D|\theta) \cdot p(\theta)}{p(D)}\]

Where \(p(D)\) is the marginal likelihood (or evidence):

\[p(D) = \int p(D|\theta) \cdot p(\theta) \, d\theta\]
  4. Summarize the posterior distribution using measures like the mean, median, mode, and credible intervals (a short sketch of such summaries follows below).
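Here is a minimal sketch of step 4, computing a mean, a median, and a highest-density interval (HDI) directly from posterior samples. The Beta(16, 6) posterior is an illustrative choice (it reappears in the therapy example below):

def hdi(samples, mass=0.95):
    """Narrowest interval containing the given posterior mass (assumes unimodality)."""
    sorted_samples = np.sort(samples)
    n_included = int(np.ceil(mass * len(sorted_samples)))
    # Width of every contiguous window holding n_included samples
    widths = sorted_samples[n_included - 1:] - sorted_samples[:len(sorted_samples) - n_included + 1]
    start = np.argmin(widths)
    return sorted_samples[start], sorted_samples[start + n_included - 1]

# Illustrative posterior: Beta(16, 6), as in the therapy example below
samples = np.random.beta(16, 6, size=10000)
low, high = hdi(samples)
print(f"Posterior mean:   {np.mean(samples):.3f}")
print(f"Posterior median: {np.median(samples):.3f}")
print(f"95% HDI:          [{low:.3f}, {high:.3f}]")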

Bayesian Estimation vs. Frequentist Estimation#

Bayesian estimation differs from frequentist approaches in several important ways:

  1. Interpretation of probability : Bayesian statistics treats probability as a measure of belief, while frequentist statistics treats it as a long-run frequency.

  2. Parameter uncertainty : Bayesian methods treat parameters as random variables with distributions, while frequentist methods treat them as fixed but unknown constants.

  3. Prior information : Bayesian methods formally incorporate prior knowledge, while frequentist methods typically do not.

  4. Inference : Bayesian methods provide direct probability statements about parameters (e.g., “There is a 95% probability that the effect size is between 0.2 and 0.8”), while frequentist confidence intervals have a more complex interpretation (e.g., “If we repeated this experiment many times, 95% of the resulting confidence intervals would contain the true effect size”).

Choosing Priors#

The choice of prior distribution is a crucial step in Bayesian analysis. There are several approaches:

  1. Informative priors : Based on previous research, expert knowledge, or theoretical considerations.

  2. Weakly informative priors : Provide some constraints but allow the data to dominate.

  3. Non-informative priors : Attempt to have minimal influence on the posterior.

  4. Conjugate priors : Mathematically convenient priors that result in posterior distributions of the same family.

In psychological research, common prior choices include (a comparison sketch follows the list):

  • Normal distributions for means and effect sizes

  • Gamma or inverse-gamma distributions for variances

  • Beta distributions for proportions

  • Cauchy distributions for standardized effect sizes (as used in our Bayesian t-test)
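To see how much these choices matter, the following sketch overlays the posteriors produced by an informative, a weakly informative, and a flat prior for the same binomial data (the Beta parameters and the 7-of-10 data are illustrative):

# How the choice of prior shapes the posterior for the same data (Beta-Binomial)
successes, failures = 7, 3  # illustrative data

priors = {
    'Informative: Beta(20, 20)': (20, 20),
    'Weakly informative: Beta(2, 2)': (2, 2),
    'Flat: Beta(1, 1)': (1, 1),
}

x = np.linspace(0, 1, 1000)
plt.figure(figsize=(10, 6))
for label, (a, b) in priors.items():
    # Conjugate update: Beta(a, b) prior -> Beta(a + successes, b + failures)
    plt.plot(x, stats.beta.pdf(x, a + successes, b + failures), label=label)

plt.xlabel('θ')
plt.ylabel('Posterior Density')
plt.title('Effect of Prior Choice on the Posterior (same data)')
plt.legend()
plt.show()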

Example: Estimating the Success Rate of a Therapy#

Let’s implement a Bayesian approach to estimate the success rate of a new therapy:

def bayesian_proportion_estimation(successes, trials, prior_alpha=1, prior_beta=1, n_samples=10000):
    """
    Perform Bayesian estimation of a proportion parameter.
    
    Parameters:
    -----------
    successes : int
        Number of successes
    trials : int
        Number of trials
    prior_alpha : float
        Alpha parameter for Beta prior
    prior_beta : float
        Beta parameter for Beta prior
    n_samples : int
        Number of posterior samples to generate
        
    Returns:
    --------
    proportion_samples : array
        Posterior samples of the proportion
    """
    # Calculate posterior parameters
    # For Beta-Binomial model: posterior is Beta(alpha + successes, beta + failures)
    post_alpha = prior_alpha + successes
    post_beta = prior_beta + (trials - successes)
    
    # Generate samples from posterior
    proportion_samples = np.random.beta(post_alpha, post_beta, n_samples)
    
    return proportion_samples

# Example: A new therapy shows success in 15 out of 20 patients
# We'll use a Beta(1, 1) prior (uniform distribution)
success_samples = bayesian_proportion_estimation(15, 20)

# Plot prior, likelihood, and posterior
plt.figure(figsize=(12, 6))

# Plot prior
x = np.linspace(0, 1, 1000)
prior = stats.beta.pdf(x, 1, 1)
plt.plot(x, prior, 'b--', label='Prior: Beta(1, 1)')

# Plot likelihood (not normalized)
likelihood = stats.binom.pmf(15, 20, x) * 10  # Scaled for visibility
plt.plot(x, likelihood, 'r-.', label='Likelihood (scaled)')

# Plot posterior
posterior = stats.beta.pdf(x, 1 + 15, 1 + 5)
plt.plot(x, posterior, 'g-', label='Posterior: Beta(16, 6)')

# Add posterior samples histogram
plt.hist(success_samples, bins=30, density=True, alpha=0.3, color='g')

# Calculate 95% credible interval
ci_lower = np.percentile(success_samples, 2.5)
ci_upper = np.percentile(success_samples, 97.5)

# Add vertical lines for credible interval
plt.axvline(x=ci_lower, color='g', linestyle=':')
plt.axvline(x=ci_upper, color='g', linestyle=':')

plt.title('Bayesian Estimation of Therapy Success Rate')
plt.xlabel('Success Rate (θ)')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Print summary statistics
print(f"Estimated success rate: {np.mean(success_samples):.3f}")
print(f"95% Credible Interval: [{ci_lower:.3f}, {ci_upper:.3f}]")
print(f"Probability success rate > 0.5: {np.mean(success_samples > 0.5):.3f}")
print(f"Probability success rate > 0.7: {np.mean(success_samples > 0.7):.3f}")
[Figure: Bayesian Estimation of Therapy Success Rate (prior, scaled likelihood, posterior, and credible interval)]
Estimated success rate: 0.727
95% Credible Interval: [0.525, 0.884]
Probability success rate > 0.5: 0.985
Probability success rate > 0.7: 0.640

Hierarchical Bayesian Models#

Hierarchical (or multilevel) Bayesian models are particularly valuable in psychology, where data often have a nested structure (e.g., participants nested within groups, or repeated measures within participants).

In a hierarchical model, parameters at one level are modeled as coming from a distribution determined by higher-level parameters. This approach:

  1. Pools information across similar units, improving estimates for all units

  2. Handles small sample sizes better than non-hierarchical approaches

  3. Naturally models individual differences

  4. Reduces overfitting by providing regularization

Let’s implement a simple hierarchical model for estimating treatment effects across multiple studies:

import pymc as pm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

def hierarchical_effect_size_model(study_means, study_sems, n_samples=2000):
    """
    Fit a hierarchical Bayesian model to estimate the overall effect size
    across multiple studies.
    
    Parameters:
    -----------
    study_means : array
        Effect size estimates from each study
    study_sems : array
        Standard errors of the effect size estimates
    n_samples : int
        Number of posterior samples to generate
        
    Returns:
    --------
    trace : pandas.DataFrame
        Posterior samples for all parameters
    """
    n_studies = len(study_means)
    
    with pm.Model() as model:
        # Hyperpriors: overall mean effect and between-study standard deviation
        mu = pm.Normal('mu', mu=0, sigma=1)
        tau = pm.HalfCauchy('tau', beta=0.5)
        
        # Study-specific true effects drawn from the population distribution
        theta = pm.Normal('theta', mu=mu, sigma=tau, shape=n_studies)
        
        # Likelihood: each observed study mean has a known standard error
        y = pm.Normal('y', mu=theta, sigma=study_sems, observed=study_means)
        
        idata = pm.sample(n_samples, tune=1000)
        
        # Convert inference data to pandas DataFrame
        trace = {
            'mu': idata.posterior.mu.values.flatten(),
            'tau': idata.posterior.tau.values.flatten()
        }
        for i in range(n_studies):
            trace[f'theta__{i}'] = idata.posterior.theta[:, :, i].values.flatten()
        
        trace = pd.DataFrame(trace)
    
    return trace

# Effect size estimate and standard error for each of five (synthetic) studies
studies_data = np.array([
    [0.45, 0.15],
    [0.62, 0.12],
    [0.30, 0.20],
    [0.58, 0.14],
    [0.40, 0.10],
])

study_means = studies_data[:, 0]
study_sems = studies_data[:, 1]

trace = hierarchical_effect_size_model(study_means, study_sems)

plt.figure(figsize=(12, 8))

# Plot study estimates with error bars
for i in range(len(study_means)):
    plt.errorbar(x=study_means[i], y=i, xerr=1.96*study_sems[i], 
                 fmt='o', color='blue', label='Study estimate' if i==0 else "")

# Plot posterior distributions for each study
for i in range(len(study_means)):
    theta_col = f'theta__{i}'
    sns.kdeplot(x=trace[theta_col], y=np.ones(len(trace[theta_col]))*i, 
                fill=True, alpha=0.3, color='blue', 
                label='Posterior distribution' if i==0 else "")

# Plot overall effect distribution
sns.kdeplot(x=trace['mu'], y=np.ones(len(trace['mu']))*(len(study_means) + 1), 
            fill=True, alpha=0.5, color='red', label='Overall effect')

overall_mean = trace['mu'].mean()
overall_ci = np.percentile(trace['mu'], [2.5, 97.5])

plt.axvline(x=overall_mean, color='red', linestyle='-')
plt.axvline(x=overall_ci[0], color='red', linestyle=':')
plt.axvline(x=overall_ci[1], color='red', linestyle=':')

plt.yticks(list(range(len(study_means))) + [len(study_means) + 1], 
           [f'Study {i+1}' for i in range(len(study_means))] + ['Overall'])
plt.xlabel('Effect Size')
plt.title('Hierarchical Bayesian Meta-Analysis of CBT for Depression')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"Overall effect size: {overall_mean:.2f}")
print(f"95% Credible Interval: [{overall_ci[0]:.2f}, {overall_ci[1]:.2f}]")
print(f"Probability effect > 0: {np.mean(trace['mu'] > 0):.3f}")
print(f"Between-study standard deviation: {trace['tau'].mean():.2f}")
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu, tau, theta]

Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 18 seconds.
There were 461 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
[Figure: Hierarchical Bayesian Meta-Analysis of CBT for Depression (study estimates, posteriors, and overall effect)]
Overall effect size: 0.48
95% Credible Interval: [0.29, 0.66]
Probability effect > 0: 1.000
Between-study standard deviation: 0.12
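The sampler log above reports divergences and elevated R-hat values, which are common when a centered hierarchical model combines few studies with a weakly identified between-study scale. A standard remedy is the non-centered parameterization: sample standardized study effects and rescale them by tau. A sketch under the same model assumptions as above:

with pm.Model() as noncentered_model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    tau = pm.HalfCauchy('tau', beta=0.5)
    
    # Standardized effects; theta = mu + tau * theta_tilde avoids the
    # "funnel" geometry that causes divergences in the centered model
    theta_tilde = pm.Normal('theta_tilde', mu=0, sigma=1, shape=len(study_means))
    theta = pm.Deterministic('theta', mu + tau * theta_tilde)
    
    y = pm.Normal('y', mu=theta, sigma=study_sems, observed=study_means)
    
    idata_nc = pm.sample(2000, tune=1000, target_accept=0.9)

# Divergences should drop to (near) zero under this parameterization
print(f"Divergences: {int(idata_nc.sample_stats.diverging.sum())}")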

Bayesian Regression Models#

Bayesian regression extends the principles of Bayesian estimation to regression models. This approach provides full posterior distributions for all regression coefficients, allowing for more nuanced inference.

Let’s implement a Bayesian linear regression model to predict therapy outcomes:

def bayesian_linear_regression(X, y, n_samples=2000):
    """
    Fit a Bayesian linear regression model.
    
    Parameters:
    -----------
    X : array
        Predictor variables (design matrix)
    y : array
        Outcome variable
    n_samples : int
        Number of posterior samples to generate
        
    Returns:
    --------
    trace : PyMC MultiTrace object
        Contains posterior samples for all parameters
    """
    n_predictors = X.shape[1]
    
    with pm.Model() as model:
        # Priors for regression coefficients
        intercept = pm.Normal('intercept', mu=0, sigma=10)
        betas = pm.Normal('betas', mu=0, sigma=1, shape=n_predictors)
        
        # Prior for error term
        sigma = pm.HalfCauchy('sigma', beta=5)
        
        # Expected value of outcome
        mu = intercept + pm.math.dot(X, betas)
        
        # Likelihood (sampling distribution) of observations
        y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)
        
        # Sample from the posterior
        trace = pm.sample(n_samples, tune=1000, return_inferencedata=False)
    
    return trace

# Example: Predicting therapy outcomes from baseline measures
np.random.seed(42)
n_patients = 50

# Generate synthetic data
# Predictors: baseline severity, age, motivation
X = np.column_stack([
    np.random.normal(20, 5, n_patients),  # Baseline severity
    np.random.normal(35, 10, n_patients),  # Age
    np.random.normal(7, 2, n_patients)     # Motivation (1-10 scale)
])

# Standardize predictors
X_std = (X - np.mean(X, axis=0)) / np.std(X, axis=0)

# True coefficients
true_intercept = 15
true_betas = np.array([-2.5, 0.5, -1.5])  # Severity (-), Age (+), Motivation (-)

# Generate outcomes (post-therapy depression scores)
y = true_intercept + np.dot(X_std, true_betas) + np.random.normal(0, 3, n_patients)

# Fit Bayesian regression model
trace = bayesian_linear_regression(X_std, y)

# Plot posterior distributions of coefficients
plt.figure(figsize=(12, 8))

# Plot intercept
plt.subplot(2, 2, 1)
sns.histplot(trace['intercept'], kde=True)
plt.axvline(x=true_intercept, color='red', linestyle='--')
plt.title('Intercept')

# Plot coefficient for baseline severity
plt.subplot(2, 2, 2)
sns.histplot(trace['betas'][:, 0], kde=True)
plt.axvline(x=true_betas[0], color='red', linestyle='--')
plt.title('Baseline Severity Coefficient')

# Plot coefficient for age
plt.subplot(2, 2, 3)
sns.histplot(trace['betas'][:, 1], kde=True)
plt.axvline(x=true_betas[1], color='red', linestyle='--')
plt.title('Age Coefficient')

# Plot coefficient for motivation
plt.subplot(2, 2, 4)
sns.histplot(trace['betas'][:, 2], kde=True)
plt.axvline(x=true_betas[2], color='red', linestyle='--')
plt.title('Motivation Coefficient')

plt.tight_layout()
plt.show()

# Print summary statistics
print("Posterior means and 95% credible intervals:")
print(f"Intercept: {np.mean(trace['intercept']):.2f} [{np.percentile(trace['intercept'], 2.5):.2f}, {np.percentile(trace['intercept'], 97.5):.2f}]")

predictor_names = ['Baseline Severity', 'Age', 'Motivation']
for i in range(3):
    mean = np.mean(trace['betas'][:, i])
    ci_lower = np.percentile(trace['betas'][:, i], 2.5)
    ci_upper = np.percentile(trace['betas'][:, i], 97.5)
    prob_neg = np.mean(trace['betas'][:, i] < 0)
    prob_pos = np.mean(trace['betas'][:, i] > 0)
    
    print(f"{predictor_names[i]}: {mean:.2f} [{ci_lower:.2f}, {ci_upper:.2f}]")
    print(f"  Probability of negative effect: {prob_neg:.3f}")
    print(f"  Probability of positive effect: {prob_pos:.3f}")
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [intercept, betas, sigma]

Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 17 seconds.
[Figure: Posterior distributions of the intercept and regression coefficients, with true values marked]
Posterior means and 95% credible intervals:
Intercept: 15.23 [14.47, 15.99]
Baseline Severity: -1.84 [-2.56, -1.10]
  Probability of negative effect: 1.000
  Probability of positive effect: 0.000
Age: 0.03 [-0.71, 0.76]
  Probability of negative effect: 0.466
  Probability of positive effect: 0.534
Motivation: -1.20 [-1.95, -0.47]
  Probability of negative effect: 0.999
  Probability of positive effect: 0.001

Advantages of Bayesian Estimation in Psychology#

Bayesian estimation offers several advantages for psychological research:

  1. Intuitive interpretation : Credible intervals have the straightforward interpretation that most researchers incorrectly attribute to confidence intervals.

  2. Small sample inference : Bayesian methods can provide valid inference even with small samples, which are common in psychology.

  3. Incorporation of prior knowledge : Psychology has accumulated substantial knowledge that can inform priors.

  4. Less reliance on corrections for multiple comparisons : Hierarchical priors shrink estimates toward a common distribution, which mitigates (though does not remove) the concerns that motivate corrections for multiple comparisons.

  5. Handling complex models : Bayesian methods can handle complex models with many parameters, including hierarchical and nonlinear models.

  6. Missing data : Bayesian approaches provide principled ways to handle missing data through joint modeling.

  7. Uncertainty quantification : Bayesian methods provide a complete picture of uncertainty in parameter estimates.

4.4 Bayesian Hypothesis Testing#

While we’ve already introduced Bayesian hypothesis testing with the Bayes factor in our t-test example, let’s explore this topic more thoroughly.

Bayesian hypothesis testing provides an alternative to traditional null hypothesis significance testing (NHST). Instead of calculating p-values, Bayesian approaches quantify the relative evidence for competing hypotheses.

The Bayes Factor#

The Bayes factor (BF) is the primary tool for Bayesian hypothesis testing. It represents the ratio of the marginal likelihoods under two competing hypotheses:

\[\text{BF}_{10} = \frac{p(D|H_1)}{p(D|H_0)}\]

Where:

  • \(p(D|H_1)\) is the marginal likelihood of the data under the alternative hypothesis

  • \(p(D|H_0)\) is the marginal likelihood of the data under the null hypothesis

The Bayes factor can be interpreted as follows:

  • \(\text{BF}_{10} > 1\): Evidence favors \(H_1\)

  • \(\text{BF}_{10} < 1\): Evidence favors \(H_0\)

  • \(\text{BF}_{10} = 1\): Evidence is inconclusive

Common interpretations of Bayes factor strength:

  • 1-3: Anecdotal evidence

  • 3-10: Moderate evidence

  • 10-30: Strong evidence

  • 30-100: Very strong evidence

  • > 100: Extreme evidence

Let’s implement a function to calculate Bayes factors for correlation tests:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

def bayesian_correlation_test(x, y, n_samples=10000):
    """
    Perform a Bayesian test for correlation between two variables.
    
    Parameters:
    -----------
    x : array-like
        First variable
    y : array-like
        Second variable
    n_samples : int
        Number of posterior samples to generate
        
    Returns:
    --------
    rho_samples : array
        Posterior samples of the correlation coefficient
    bf10 : float
        Bayes factor in favor of the alternative hypothesis
    """
    # Calculate sample correlation
    r = np.corrcoef(x, y)[0, 1]
    n = len(x)
    
    # Use Fisher's z-transformation for more numerical stability
    z = np.arctanh(r) if abs(r) < 1 else np.sign(r) * 10  # Handle r = ±1
    sigma = 1.0 / np.sqrt(n - 3)
    
    # Generate posterior samples directly using z-transformation
    z_samples = np.random.normal(z, sigma, n_samples)
    rho_samples = np.tanh(z_samples)
    
    # Prior on rho: uniform over (-0.99, 0.99), represented by Monte Carlo samples
    prior_samples = np.random.uniform(-0.99, 0.99, n_samples)
    
    # Savage-Dickey-style Bayes factor: compare posterior and prior density
    # near rho = 0, approximated by the fraction of samples in (-0.05, 0.05)
    zero_region = (rho_samples > -0.05) & (rho_samples < 0.05)
    prior_zero_region = (prior_samples > -0.05) & (prior_samples < 0.05)
    
    if np.sum(zero_region) > 0 and np.sum(prior_zero_region) > 0:
        posterior_density_at_zero = np.mean(zero_region) / 0.1  # Normalize by region width (0.1)
        prior_density_at_zero = np.mean(prior_zero_region) / 0.1
        bf01 = posterior_density_at_zero / prior_density_at_zero
        bf10 = 1 / bf01
    else:
        # Fallback: rough large-sample approximation based on r alone
        bf10 = np.exp(n/2 * np.log(1 + r**2/(1-r**2)))
    
    return rho_samples, bf10

# Example: Testing correlation between therapy attendance and outcome improvement
np.random.seed(42)
n_patients = 40

# Generate correlated data
true_rho = 0.4  # Moderate positive correlation
cov_matrix = np.array([[1, true_rho], [true_rho, 1]])
data = np.random.multivariate_normal([0, 0], cov_matrix, n_patients)

attendance = data[:, 0]  # Number of therapy sessions attended
improvement = data[:, 1]  # Improvement in symptoms

# Run Bayesian correlation test
rho_samples, bf10 = bayesian_correlation_test(attendance, improvement)

# Plot results
plt.figure(figsize=(12, 6))

# Plot scatterplot of data
plt.subplot(1, 2, 1)
plt.scatter(attendance, improvement)
plt.xlabel('Therapy Attendance')
plt.ylabel('Symptom Improvement')
plt.title('Scatterplot of Data')
plt.grid(True, alpha=0.3)

# Add regression line
slope, intercept = np.polyfit(attendance, improvement, 1)
x_line = np.array([min(attendance), max(attendance)])
y_line = slope * x_line + intercept
plt.plot(x_line, y_line, 'r--')

# Plot posterior distribution
plt.subplot(1, 2, 2)
sns.histplot(rho_samples, kde=True)
plt.axvline(x=0, color='red', linestyle='--')
plt.axvline(x=np.mean(rho_samples), color='blue', linestyle='-')

# Add 95% credible interval
ci_lower = np.percentile(rho_samples, 2.5)
ci_upper = np.percentile(rho_samples, 97.5)
plt.axvline(x=ci_lower, color='green', linestyle=':')
plt.axvline(x=ci_upper, color='green', linestyle=':')

plt.title(f'Posterior Distribution of Correlation\nBF₁₀ = {bf10:.2f}')
plt.xlabel('Correlation Coefficient (ρ)')
plt.ylabel('Density')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print summary statistics
print(f"Estimated correlation: {np.mean(rho_samples):.3f}")
print(f"95% Credible Interval: [{ci_lower:.3f}, {ci_upper:.3f}]")
print(f"Probability correlation > 0: {np.mean(rho_samples > 0):.3f}")
print(f"Bayes Factor (BF₁₀): {bf10:.2f}")
[Figure: Scatterplot of the data and posterior distribution of the correlation]
Estimated correlation: 0.217
95% Credible Interval: [-0.096, 0.502]
Probability correlation > 0: 0.915
Bayes Factor (BF₁₀): 0.55

Bayesian Model Comparison#

Beyond testing simple hypotheses, Bayesian methods excel at comparing more complex models. This is done by comparing their marginal likelihoods or using information criteria.

Let’s implement a function to compare different regression models:

def bayesian_model_comparison(X, y, model_specs, n_samples=2000):
    """
    Compare different Bayesian regression models.
    
    Parameters:
    -----------
    X : array
        Full design matrix with all predictors
    y : array
        Outcome variable
    model_specs : list of lists
        Each inner list contains the indices of predictors to include in a model
    n_samples : int
        Number of posterior samples to generate
        
    Returns:
    --------
    model_comparison : dict
        Dictionary with model comparison metrics
    """
    n_models = len(model_specs)
    n_data = len(y)
    
    # Initialize results
    model_names = [f"Model {i+1}" for i in range(n_models)]
    waic_values = np.zeros(n_models)
    loo_values = np.zeros(n_models)
    traces = []
    
    # Fit each model
    for i, predictors in enumerate(model_specs):
        # Select predictors for this model
        if len(predictors) > 0:
            X_model = X[:, predictors]
        else:
            # Intercept-only model
            X_model = np.ones((n_data, 1))
        
        # Fit model
        with pm.Model() as model:
            # Priors
            intercept = pm.Normal('intercept', mu=0, sigma=10)
            
            if X_model.shape[1] > 1:  # If not intercept-only
                betas = pm.Normal('betas', mu=0, sigma=1, shape=X_model.shape[1] - 1)
                mu = intercept + pm.math.dot(X_model[:, 1:], betas)
            else:
                mu = intercept
            
            sigma = pm.HalfCauchy('sigma', beta=5)
            
                        # Likelihood
            y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)
            
            # Sample from posterior
            trace = pm.sample(n_samples, return_inferencedata=True)
            traces.append(trace)
            
            # Calculate WAIC and LOO
            waic = pm.waic(trace)
            loo = pm.loo(trace)
            
            waic_values[i] = waic.waic
            loo_values[i] = loo.loo
    
    # Calculate model weights
    waic_weights = np.exp(-0.5 * (waic_values - np.min(waic_values)))
    waic_weights = waic_weights / np.sum(waic_weights)
    
    loo_weights = np.exp(-0.5 * (loo_values - np.min(loo_values)))
    loo_weights = loo_weights / np.sum(loo_weights)
    
    # Compile results
    model_comparison = {
        'model_names': model_names,
        'waic': waic_values,
        'waic_weights': waic_weights,
        'loo': loo_values,
        'loo_weights': loo_weights,
        'traces': traces
    }
    
    return model_comparison

# # Example: Predicting well-being from different psychological factors
# # Generate some synthetic data
# np.random.seed(42)
# n_subjects = 100

# # Predictors: social support, stress, physical activity, sleep quality
# X_full = np.random.normal(0, 1, (n_subjects, 4))

# # True relationship: well-being = intercept + social support - stress + noise
# true_intercept = 5
# true_betas = [0.5, -0.7, 0.1, 0.0]  # Only first two have substantial effects
# y = true_intercept + np.dot(X_full, true_betas) + np.random.normal(0, 1, n_subjects)

# # Define models to compare
# model_specs = [
#     [],                  # Model 1: Intercept only
#     [0],                 # Model 2: Social support only
#     [1],                 # Model 3: Stress only
#     [0, 1],              # Model 4: Social support + stress
#     [0, 1, 2],           # Model 5: Social support + stress + physical activity
#     [0, 1, 2, 3]         # Model 6: All predictors
# ]

# # Compare models
# model_comparison = bayesian_model_comparison(X_full, y, model_specs)

# # Display results
# results_df = pd.DataFrame({
#     'Model': model_comparison['model_names'],
#     'WAIC': model_comparison['waic'],
#     'WAIC Weight': model_comparison['waic_weights'],
#     'LOO': model_comparison['loo'],
#     'LOO Weight': model_comparison['loo_weights']
# })

# print(results_df)

# # Plot model comparison
# plt.figure(figsize=(12, 6))
# plt.subplot(1, 2, 1)
# plt.bar(results_df['Model'], results_df['WAIC Weight'])
# plt.title('Model Comparison: WAIC Weights')
# plt.ylabel('Weight')
# plt.ylim(0, 1)

# plt.subplot(1, 2, 2)
# plt.bar(results_df['Model'], results_df['LOO Weight'])
# plt.title('Model Comparison: LOO Weights')
# plt.ylabel('Weight')
# plt.ylim(0, 1)

# plt.tight_layout()
# plt.show()
[Figure: scatterplot of therapy attendance vs. symptom improvement with regression line (left); posterior distribution of the correlation with mean and 95% credible interval (right)]
Estimated correlation: 0.217
95% Credible Interval: [-0.096, 0.502]
Probability correlation > 0: 0.915
Bayes Factor (BF₁₀): 0.55
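
Note the apparent tension in these results: most of the posterior mass for the correlation lies above zero, yet BF₁₀ ≈ 0.55 (equivalently BF₀₁ ≈ 1.8) mildly favors the null. This is not a contradiction. The posterior describes our uncertainty about ρ assuming the alternative model, while the Bayes factor compares the null and alternative models as wholes; with only 40 patients, both can remain equivocal at the same time.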

4.3 Bayesian Model Averaging#

Rather than selecting a single “best” model, Bayesian model averaging (BMA) combines predictions from multiple models, weighted by their posterior probabilities. This approach acknowledges model uncertainty and often leads to better predictive performance.

The posterior predictive distribution under BMA is:

\[p(y_{new} | y) = \sum_{k=1}^{K} p(y_{new} | M_k, y) \cdot p(M_k | y)\]

Where:

  • \(p(y_{new} | y)\) is the posterior predictive distribution for new data

  • \(p(y_{new} | M_k, y)\) is the posterior predictive distribution under model \(M_k\)

  • \(p(M_k | y)\) is the posterior probability of model \(M_k\)

In practice the marginal likelihoods needed for exact posterior model probabilities are rarely available in closed form, so the implementation below follows common practice and substitutes information-criterion weights (pseudo-BMA weights, \(w_k \propto \exp(-\frac{1}{2}\Delta_k)\), where \(\Delta_k\) is the deviance-scale WAIC or LOO difference from the best model) for \(p(M_k | y)\). Let’s implement Bayesian model averaging for our well-being prediction example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import pymc as pm
import arviz as az

def create_model(X, y, predictors):
    """Create a PyMC model with the specified predictors."""
    with pm.Model() as model:
        # Build the design matrix for this model
        if len(predictors) > 0:
            X_model = np.column_stack([np.ones(X.shape[0]), X[:, predictors]])
        else:
            X_model = np.ones((X.shape[0], 1))
        
        # Priors
        intercept = pm.Normal('intercept', mu=0, sigma=10)
        
        if X_model.shape[1] > 1:  # If not intercept-only
            betas = pm.Normal('betas', mu=0, sigma=1, shape=X_model.shape[1]-1)
            mu = intercept + pm.math.dot(X_model[:, 1:], betas)
        else:
            mu = intercept
        
        # Likelihood; the pointwise log-likelihood needed for WAIC/LOO is
        # requested at sampling time (idata_kwargs), not via a Deterministic
        sigma = pm.HalfNormal('sigma', sigma=1)
        pm.Normal('likelihood', mu=mu, sigma=sigma, observed=y)
    
    return model

def bayesian_model_comparison(X, y, model_specs, n_samples=1000):
    """
    Compare Bayesian linear regression models.
    
    Parameters:
    -----------
    X : array
        Design matrix
    y : array
        Outcome variable
    model_specs : list of lists
        Each inner list contains the indices of predictors to include in a model
    n_samples : int
        Number of posterior samples to generate
    
    Returns:
    --------
    dict
        Dictionary containing model comparison results
    """
    n_models = len(model_specs)
    
    # Initialize arrays for model comparison metrics
    model_names = []
    traces = []
    waics = np.zeros(n_models)
    loos = np.zeros(n_models)
    
    # Fit each model and calculate metrics
    for i, predictors in enumerate(model_specs):
        # Create model name
        if len(predictors) == 0:
            model_name = "Intercept only"
        else:
            predictors_str = ', '.join([f"X{j+1}" for j in predictors])
            model_name = f"Model with {predictors_str}"
        model_names.append(model_name)
        
        # Create and fit model, storing pointwise log-likelihood for WAIC/LOO
        with create_model(X, y, predictors):
            trace = pm.sample(n_samples, return_inferencedata=True, chains=2,
                              idata_kwargs={'log_likelihood': True})
        
        traces.append(trace)
        
        # Calculate WAIC and LOO, converted to the deviance scale
        # (lower is better) to match the weight formula below
        waic = az.waic(trace, pointwise=True)
        loo = az.loo(trace, pointwise=True)
        
        waics[i] = -2 * waic.elpd_waic
        loos[i] = -2 * loo.elpd_loo
    
    # Akaike-style (pseudo-BMA) weights from deviance-scale differences
    waic_weights = np.exp(-0.5 * (waics - np.min(waics)))
    waic_weights = waic_weights / np.sum(waic_weights)
    
    loo_weights = np.exp(-0.5 * (loos - np.min(loos)))
    loo_weights = loo_weights / np.sum(loo_weights)
    
    return {
        'model_names': model_names,
        'traces': traces,
        'waic': waics,
        'waic_weights': waic_weights,
        'loo': loos,
        'loo_weights': loo_weights
    }

def bayesian_model_averaging(X_train, y_train, X_test, model_specs, model_weights, traces):
    """
    Perform Bayesian model averaging for prediction.
    
    Parameters:
    -----------
    X_train : array
        Training data design matrix
    y_train : array
        Training data outcome variable
    X_test : array
        Test data design matrix
    model_specs : list of lists
        Each inner list contains the indices of predictors to include in a model
    model_weights : array
        Posterior model probabilities (e.g., from WAIC or LOO weights)
    traces : list
        List of PyMC traces for each model
        
    Returns:
    --------
    y_pred_mean : array
        BMA predicted means
    y_pred_std : array
        BMA predicted standard deviations
    """
    n_models = len(model_specs)
    n_test = X_test.shape[0]
    n_samples = 1000  # Number of posterior samples to use
    
    # Initialize arrays for predictions
    all_model_preds = np.zeros((n_models, n_test, n_samples))
    
    # Get predictions from each model
    for i, (predictors, trace) in enumerate(zip(model_specs, traces)):
        # Select predictors for this model
        if len(predictors) > 0:
            X_model_test = np.column_stack([np.ones(n_test), X_test[:, predictors]])
        else:
            # Intercept-only model
            X_model_test = np.ones((n_test, 1))
        
        # Extract posterior samples
        intercept_samples = trace.posterior['intercept'].values.flatten()[:n_samples]
        
        if X_model_test.shape[1] > 1:  # If not intercept-only
            beta_samples = trace.posterior['betas'].values.reshape(-1, X_model_test.shape[1] - 1)[:n_samples]
            
            # For each posterior sample, make predictions
            for j in range(len(intercept_samples)):
                all_model_preds[i, :, j] = intercept_samples[j] + np.dot(X_model_test[:, 1:], beta_samples[j])
        else:
            # Intercept-only model
            for j in range(len(intercept_samples)):
                all_model_preds[i, :, j] = intercept_samples[j]
    
    # Combine predictions using model weights. Note this is a weighted
    # average of sample-aligned draws; see the note after this example
    # for exact sampling from the BMA mixture.
    weighted_preds = np.zeros((n_test, n_samples))
    
    for i in range(n_models):
        weighted_preds += model_weights[i] * all_model_preds[i]
    
    # Calculate mean and standard deviation of predictions
    y_pred_mean = np.mean(weighted_preds, axis=1)
    y_pred_std = np.std(weighted_preds, axis=1)
    
    return y_pred_mean, y_pred_std

# Example: Predicting well-being from different psychological factors
# Generate some synthetic data
np.random.seed(42)
n_subjects = 100

# Predictors: social support, stress, physical activity, sleep quality
X_full = np.random.normal(0, 1, (n_subjects, 4))

# True relationship: well-being = intercept + social support - stress + noise
true_intercept = 5
true_betas = [0.5, -0.7, 0.1, 0.0]  # Only first two have substantial effects
y = true_intercept + np.dot(X_full, true_betas) + np.random.normal(0, 1, n_subjects)

# Define models to compare
model_specs = [
    [],                  # Model 1: Intercept only
    [0],                 # Model 2: Social support only
    [1],                 # Model 3: Stress only
    [0, 1],              # Model 4: Social support + stress
    [0, 1, 2],           # Model 5: Social support + stress + physical activity
    [0, 1, 2, 3]         # Model 6: All predictors
]

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_full, y, test_size=0.3, random_state=42)

# Perform model comparison on training data
train_comparison = bayesian_model_comparison(X_train, y_train, model_specs)

# Use LOO weights for model averaging
loo_weights = train_comparison['loo_weights']
traces = train_comparison['traces']

# Perform Bayesian model averaging
y_pred_mean, y_pred_std = bayesian_model_averaging(
    X_train, y_train, X_test, model_specs, loo_weights, traces
)

# Calculate prediction metrics
mse = mean_squared_error(y_test, y_pred_mean)
r2 = r2_score(y_test, y_pred_mean)

print(f"Mean Squared Error: {mse:.3f}")
print(f"R² Score: {r2:.3f}")

# Plot predictions vs. actual values
plt.figure(figsize=(10, 6))
plt.errorbar(y_test, y_pred_mean, yerr=y_pred_std, fmt='o', alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--')
plt.xlabel('Actual Well-being')
plt.ylabel('Predicted Well-being')
plt.title('Bayesian Model Averaging: Predictions with Uncertainty')
plt.grid(True)
plt.show()

# Display results
results_df = pd.DataFrame({
    'Model': train_comparison['model_names'],
    'WAIC': train_comparison['waic'],
    'WAIC Weight': train_comparison['waic_weights'],
    'LOO': train_comparison['loo'],
    'LOO Weight': train_comparison['loo_weights']
})

print(results_df)

# Plot model comparison
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.bar(results_df['Model'], results_df['WAIC Weight'])
plt.title('Model Comparison: WAIC Weights')
plt.ylabel('Weight')
plt.ylim(0, 1)

plt.subplot(1, 2, 2)
plt.bar(results_df['Model'], results_df['LOO Weight'])
plt.title('Model Comparison: LOO Weights')
plt.ylabel('Weight')
plt.ylim(0, 1)

plt.tight_layout()
plt.show()
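
A caveat about bayesian_model_averaging above: it averages sample-aligned draws across models, which recovers the BMA predictive mean but can understate the spread of the true mixture predictive distribution. Here is a minimal sketch of exact mixture sampling, assuming per-model predictive draws are already collected in an array shaped like all_model_preds in that function (the helper name bma_mixture_samples is introduced here for illustration):

import numpy as np

def bma_mixture_samples(model_preds, weights, rng=None):
    """Draw from the BMA predictive mixture.

    model_preds : array, shape (n_models, n_test, n_draws)
        Posterior predictive draws from each model.
    weights : array, shape (n_models,)
        Posterior model probabilities (e.g., pseudo-BMA weights).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_models, n_test, n_draws = model_preds.shape
    # For each retained draw, first pick a model according to its weight...
    model_idx = rng.choice(n_models, size=n_draws, p=weights)
    # ...then take that model's predictions for that draw index
    return model_preds[model_idx, :, np.arange(n_draws)].T  # (n_test, n_draws)

# Example with synthetic draws: two models, 5 test points, 1000 draws
rng = np.random.default_rng(0)
preds = np.stack([rng.normal(0, 1, (5, 1000)), rng.normal(2, 1, (5, 1000))])
mix = bma_mixture_samples(preds, np.array([0.7, 0.3]), rng)
print(mix.mean(axis=1))  # close to 0.7*0 + 0.3*2 = 0.6

Sampling a model index per draw reproduces the between-model component of predictive uncertainty that a pointwise weighted average smooths away.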

5.2 Bayesian Models of Learning#

Bayesian models can also describe how humans learn from experience. A simple account of associative learning uses Beta-Bernoulli conjugacy: with a \(\text{Beta}(\alpha, \beta)\) prior on the probability of reward, observing \(k\) rewards in \(n\) trials yields the posterior

\[p(\theta | \text{data}) = \text{Beta}(\alpha + k, \beta + n - k)\]

so learning reduces to incrementing two counts after each trial. Let’s implement this model:

def bayesian_associative_learning(outcomes, prior_alpha=1, prior_beta=1):
    """
    Simulate Bayesian associative learning with a Beta-Bernoulli model.
    
    Parameters:
    -----------
    outcomes : array
        Binary outcomes (0 or 1) observed over time
    prior_alpha : float
        Alpha parameter of the Beta prior
    prior_beta : float
        Beta parameter of the Beta prior
        
    Returns:
    --------
    posterior_means : array
        Posterior mean estimates after each observation
    posterior_stds : array
        Posterior standard deviations after each observation
    """
    n_trials = len(outcomes)
    posterior_means = np.zeros(n_trials)
    posterior_stds = np.zeros(n_trials)
    
    # Initialize posterior parameters with prior
    alpha = prior_alpha
    beta = prior_beta
    
    for t in range(n_trials):
        # Update posterior based on observed outcome
        if outcomes[t] == 1:
            alpha += 1
        else:
            beta += 1
        
        # Calculate posterior mean and standard deviation
        posterior_means[t] = alpha / (alpha + beta)
        posterior_stds[t] = np.sqrt((alpha * beta) / ((alpha + beta)**2 * (alpha + beta + 1)))
    
    return posterior_means, posterior_stds

# Example: Learning the probability of reward in a conditioning experiment
np.random.seed(42)
true_prob = 0.7  # True probability of reward
n_trials = 50
outcomes = np.random.binomial(1, true_prob, n_trials)

# Simulate Bayesian learning
posterior_means, posterior_stds = bayesian_associative_learning(outcomes)

# Plot learning curve
plt.figure(figsize=(10, 6))
plt.plot(range(1, n_trials+1), posterior_means, 'b-', label='Posterior mean')
plt.fill_between(
    range(1, n_trials+1),
    posterior_means - posterior_stds,
    posterior_means + posterior_stds,
    alpha=0.3, color='blue'
)
plt.axhline(y=true_prob, color='r', linestyle='--', label='True probability')
plt.xlabel('Trial')
plt.ylabel('Estimated Probability')
plt.title('Bayesian Associative Learning')
plt.legend()
plt.grid(True)
plt.show()
[Figure: Bayesian associative learning curve: posterior mean with ±1 SD band converging toward the true reward probability across trials]

6. Practical Considerations in Bayesian Analysis#

6.1 Prior Selection#

The choice of prior distributions is a critical aspect of Bayesian analysis. Let’s explore how different priors affect inference:

# def compare_priors(data, prior_specs, n_samples=2000):
#     """
#     Compare how different priors affect posterior inference.
    
#     Parameters:
#     -----------
#     data : array
#         Observed data
#     prior_specs : list of tuples
#         Each tuple contains (prior_name, prior_params), where prior_params
#         is a dict of Normal prior parameters (mu and sigma)
#     n_samples : int
#         Number of posterior samples
        
#     Returns:
#     --------
#     posterior_samples : dict
#         Dictionary with posterior samples for each prior
#     """
#     posterior_samples = {}
    
#     for prior_name, prior_params in prior_specs:
#         with pm.Model() as model:
#             # Set up the prior (must be constructed inside the model context)
#             theta = pm.Normal('theta', **prior_params)
            
#             # Likelihood
#             y = pm.Normal('y', mu=theta, sigma=1, observed=data)
            
#             # Sample from posterior
#             trace = pm.sample(n_samples, return_inferencedata=True)
            
#             # Store posterior samples
#             posterior_samples[prior_name] = trace.posterior['theta'].values.flatten()
    
#     return posterior_samples

# # Example: Effect of different priors on estimating the mean of a normal distribution
# np.random.seed(42)
# true_mean = 2.5
# data = np.random.normal(true_mean, 1, 20)  # Small sample size to highlight prior influence

# # Define different priors as (name, parameter dict); the distributions
# # themselves must be created inside the model context, as above
# prior_specs = [
#     ('Uninformative', {'mu': 0, 'sigma': 10}),
#     ('Weakly informative', {'mu': 0, 'sigma': 3}),
#     ('Informative (correct)', {'mu': 2, 'sigma': 1}),
#     ('Informative (incorrect)', {'mu': 5, 'sigma': 1})
# ]

# # Compare priors
# posterior_samples = compare_priors(data, prior_specs)

# # Plot posterior distributions
# plt.figure(figsize=(12, 6))
# for prior_name, samples in posterior_samples.items():
#     sns.kdeplot(samples, label=prior_name)

# plt.axvline(x=true_mean, color='r', linestyle='--', label='True mean')
# plt.axvline(x=np.mean(data), color='k', linestyle=':', label='Sample mean')
# plt.xlabel('Mean')
# plt.ylabel('Density')
# plt.title('Effect of Prior Choice on Posterior Distribution')
# plt.legend()
# plt.grid(True)
# plt.show()
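
Because the block above is left commented out, here is a runnable sketch of the same comparison using the conjugate normal-normal update, with the likelihood standard deviation assumed known (σ = 1, matching the model above), so the posteriors are available in closed form without MCMC:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
true_mean = 2.5
data = np.random.normal(true_mean, 1, 20)  # small sample to highlight prior influence
n, xbar, sigma = len(data), np.mean(data), 1.0

# (name, prior mean, prior sd) for a Normal prior on the unknown mean
priors = [('Uninformative', 0, 10), ('Weakly informative', 0, 3),
          ('Informative (correct)', 2, 1), ('Informative (incorrect)', 5, 1)]

theta = np.linspace(0, 6, 500)
for name, mu0, tau0 in priors:
    # Conjugate update: precision-weighted average of prior mean and sample mean
    post_prec = 1/tau0**2 + n/sigma**2
    post_mean = (mu0/tau0**2 + n*xbar/sigma**2) / post_prec
    post_sd = np.sqrt(1/post_prec)
    plt.plot(theta, stats.norm.pdf(theta, post_mean, post_sd), label=name)

plt.axvline(x=true_mean, color='r', linestyle='--', label='True mean')
plt.axvline(x=np.mean(data), color='k', linestyle=':', label='Sample mean')
plt.xlabel('Mean')
plt.ylabel('Density')
plt.title('Effect of Prior Choice on Posterior Distribution (conjugate form)')
plt.legend()
plt.grid(True)
plt.show()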

6.2 Convergence Diagnostics#

Ensuring MCMC convergence is essential for reliable Bayesian inference. Let’s implement some convergence diagnostics:

# def convergence_diagnostics(trace, parameter_name):
#     """
#     Calculate and visualize convergence diagnostics for MCMC samples.
    
#     Parameters:
#     -----------
#     trace : InferenceData
#         PyMC trace object
#     parameter_name : str
#         Name of the parameter to diagnose
        
#     Returns:
#     --------
#     diagnostics : dict
#         Dictionary with convergence metrics
#     """
#     # Extract parameter samples (shape: chains x draws for a scalar parameter)
#     param_samples = trace.posterior[parameter_name].values
    
#     # Calculate Gelman-Rubin statistic (R-hat)
#     gr_stat = az.rhat(trace, var_names=[parameter_name]).to_array().values.flatten()[0]
    
#     # Calculate effective sample size
#     ess = az.ess(trace, var_names=[parameter_name]).to_array().values.flatten()[0]
    
#     # Compile diagnostics
#     diagnostics = {
#         'rhat': gr_stat,
#         'ess': ess
#     }
    
#     # Plot trace and autocorrelation
#     plt.figure(figsize=(12, 8))
    
#     # Trace plot
#     plt.subplot(2, 1, 1)
#     for chain in range(param_samples.shape[0]):
#         plt.plot(param_samples[chain, :], alpha=0.7)
#     plt.title(f'Trace Plot for {parameter_name}')
#     plt.xlabel('Iteration')
#     plt.ylabel('Parameter Value')
    
#     # Autocorrelation plot (simple numpy estimate per chain)
#     plt.subplot(2, 1, 2)
#     max_lag = 100
#     for chain in range(param_samples.shape[0]):
#         x = param_samples[chain, :] - np.mean(param_samples[chain, :])
#         acf = np.correlate(x, x, mode='full')[len(x)-1:len(x)-1+max_lag]
#         plt.plot(acf / acf[0], alpha=0.7)
#     plt.title(f'Autocorrelation for {parameter_name}')
#     plt.xlabel('Lag')
    
#     plt.tight_layout()
#     plt.show()
    
#     return diagnostics

# # Example: Diagnose convergence for a simple model
# with pm.Model() as model:
#     mu = pm.Normal('mu', mu=0, sigma=1)
#     sigma = pm.HalfCauchy('sigma', beta=1)
#     y = pm.Normal('y', mu=mu, sigma=sigma, observed=np.random.normal(0, 1, 100))
    
#     # Sample with multiple chains
#     trace = pm.sample(1000, chains=4, return_inferencedata=True)

# # Check convergence for mu parameter
# diagnostics = convergence_diagnostics(trace, 'mu')
# print(f"R-hat: {diagnostics['rhat']:.3f}")
# print(f"Effective Sample Size: {diagnostics['ess']:.1f}")

7. Conclusion#

Bayesian statistics offers a powerful framework for psychological research, providing principled ways to incorporate prior knowledge, quantify uncertainty, and make probabilistic inferences. In this chapter, we’ve explored the fundamentals of Bayesian analysis and demonstrated its application to various psychological problems.

Key advantages of the Bayesian approach include:

  1. Incorporation of prior knowledge: Bayesian methods allow researchers to formally include existing knowledge in their analyses.

  2. Full uncertainty quantification: Rather than point estimates, Bayesian analysis provides entire probability distributions.

  3. Intuitive interpretation: Posterior probabilities are direct statements about our beliefs given the data.

  4. Flexibility in model specification: Complex hierarchical models can be specified and estimated.

  5. Natural handling of small samples: Bayesian methods can provide reasonable inferences even with limited data.

As computational tools for Bayesian analysis continue to improve, these methods are becoming increasingly accessible to psychological researchers. By understanding the principles and practices of Bayesian statistics, psychologists can enhance their analytical toolkit and gain deeper insights into psychological phenomena.