---
title: "Permutation tests for mixed-effects models"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Permutation tests for mixed-effects models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>"
)
library(ptestR)
```

## Why `lmer` doesn't give p-values

`lme4::lmer()` deliberately omits p-values from its output. The reason is
principled: in mixed-effects models, the degrees of freedom for fixed effects
are not well-defined. Approximations exist (Satterthwaite, Kenward-Roger), but
they rely on assumptions about the variance-covariance structure that may not
hold in small or unbalanced datasets.

`grouped_perm_glmm()` sidesteps this entirely by computing `p.perm` — a fully
nonparametric p-value based on permuting the outcome variable and refitting the
model. No degree-of-freedom approximation is needed.

---

## When to use `grouped_perm_glmm()`

Use `grouped_perm_glmm()` when your data have:

- **Repeated measures** — the same subject observed at multiple time points or
  conditions
- **Hierarchical structure** — participants nested within sites, scans nested
  within participants, and so on
- **Small samples** — fewer than ~50 subjects, where asymptotic approximations
  are unreliable

---

## A worked example

We simulate a repeated-measures dataset with 20 subjects, each observed 4
times. The outcome depends on a continuous fixed effect and a binary group
variable, with random intercepts per subject.

```{r simulate}
set.seed(42)
n_subj  <- 20
n_obs   <- 4

df <- data.frame(
  subject = rep(seq_len(n_subj), each = n_obs),
  time    = rep(seq_len(n_obs), times = n_subj),
  group   = rep(c("control", "case"), each = n_subj * n_obs / 2),
  y       = NA_real_
)

# Random intercepts + fixed effects
random_int  <- rep(rnorm(n_subj, sd = 1.2), each = n_obs)
df$y <- 0.8 * (df$group == "case") +
        0.3 * df$time +
        random_int +
        rnorm(nrow(df), sd = 0.8)

head(df, 8)
```

Fit the permutation GLMM with 999 permutations:

```{r glmm}
res <- grouped_perm_glmm(
  tbl         = df,
  formla      = y ~ group + time + (1 | subject),
  var_to_perm = "y",
  permNum     = 999,
  seed        = 42
)
res
```

The result contains only fixed-effect rows (random-effect parameters are
filtered out). Columns are the same as `grouped_perm_glm()` except `p.value`
is absent and `effect` is always `"fixed"`.

---

## Comparing to `lme4` output directly

It is instructive to compare the permutation p-value against `lmerTest`'s
Satterthwaite approximation:

```{r compare, eval = requireNamespace("lmerTest", quietly = TRUE)}
library(lmerTest)
fit <- lmer(y ~ group + time + (1 | subject), data = df)
summary(fit)$coefficients
```

The t-statistics will match exactly — ptestR fits the same model. Only the
p-values differ: `p.perm` uses the empirical null distribution, while
`lmerTest` uses an approximate t-distribution with estimated degrees of
freedom.

---

## Random effects structure

ptestR permutes only the outcome variable. The random-effects structure
(grouping factors, random slopes) is held fixed. This means:

- Permutations are exchangeable at the observation level
- The between-subject variance is preserved across permutations
- The test is valid under exchangeability of residuals, which is a weaker
  assumption than normality

For more complex designs (e.g. cross-classified random effects, or permutation
within clusters), manual permutation strategies may be needed — ptestR does not
currently support restricted permutation schemes.

---

## Random slopes

The formula interface is passed directly to `lme4::lmer()`, so random slopes
work as expected:

```{r random-slopes}
res_slopes <- grouped_perm_glmm(
  tbl         = df,
  formla      = y ~ group + time + (1 + time | subject),
  var_to_perm = "y",
  permNum     = 499,
  seed        = 42
)
res_slopes
```

Note that more complex random-effects structures increase computation time
proportionally with `permNum`.

---

## Convergence warnings

With small datasets and complex random-effects structures, some permuted fits
may produce convergence warnings from `lme4`. These are expected and do not
affect the validity of `p.perm` — the permuted t-statistics from borderline
fits still contribute to the null distribution. To suppress these warnings
during bulk permutation runs:

```{r suppress, eval = FALSE}
suppressWarnings(
  grouped_perm_glmm(df, y ~ group + (1 | subject), "y", permNum = 999)
)
```

---

## Computational cost

Each permutation refits the full model. For a dataset of 80 observations with
a random intercept, 1 000 permutations typically takes a few seconds. For
10 000 permutations or larger datasets, consider parallelising across features
rather than within a single call — see `vignette("grouped-analysis")`.

---

## Further reading

- `vignette("getting-started")` — `grouped_perm_glm()` and core concepts
- `vignette("grouped-analysis")` — applying permutation tests across many
  features using `dplyr::group_by()` and `purrr::map()`
- `?grouped_perm_glmm`