---
title: "Getting started with ptestR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with ptestR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>"
)
library(ptestR)
```

## Why permutation tests?

Standard regression relies on asymptotic theory: p-values from Wald tests or
likelihood-ratio tests are valid when sample sizes are large and residuals are
approximately normal. In neuroscience and biomedical research, these conditions
are often not met — datasets are small, outcomes are skewed, and repeated
measures introduce complex dependency structures.

Permutation tests sidestep these assumptions entirely. Instead of comparing an
observed statistic to a theoretical distribution, ptestR builds the null
distribution empirically: it randomly rearranges the outcome variable many
times, fits the model to each permuted dataset, and asks how often the
permuted statistic is at least as extreme as the one we actually observed.

$p_\text{perm} = \frac{\lvert\{|t_\text{perm}| \geq |t_\text{obs}|\}\rvert}{B}$

where $B$ is the number of permutations. This value requires no distributional
assumptions and is exact in the sense that it converges to the true p-value as
$B \to \infty$.

---

## The three functions

ptestR exports one function per model class, all sharing the same call
signature:

| Function | Model | Test statistic |
|---|---|---|
| `grouped_perm_glm()` | Generalised linear model (`glm`) | t |
| `grouped_perm_glmm()` | Linear mixed-effects model (`lmer`) | t |
| `grouped_perm_binoglm()` | Binomial logistic regression | z |

This vignette covers `grouped_perm_glm()`. See
`vignette("mixed-effects-models")` for `grouped_perm_glmm()` and
`vignette("grouped-analysis")` for the full nested workflow used in practice.

---

## A simple example

We simulate a small dataset with a continuous outcome, a continuous predictor,
and a three-level grouping factor, then compare the asymptotic and permutation
p-values.

```{r sim-data}
set.seed(42)
n  <- 30
df <- data.frame(
  outcome   = 2.5 * rnorm(n) + c(0, 1, 2)[rep(1:3, each = 10)],
  predictor = rnorm(n),
  group     = factor(rep(c("A", "B", "C"), each = 10))
)
head(df)
```

Fit a permutation GLM with 999 permutations:

```{r glm-basic}
res <- grouped_perm_glm(
  tbl         = df,
  formla      = outcome ~ predictor + group,
  var_to_perm = "outcome",
  permNum     = 999,
  seed        = 42
)
res
```

The result is a tidy tibble with one row per model term. The key columns are:

| Column | Description |
|---|---|
| `term` | Regression term name |
| `estimate` | Fitted coefficient |
| `statistic` | Observed t-statistic |
| `p.value` | Asymptotic p-value from `glm` |
| `p.perm` | Permutation p-value |

---

## Interpreting `p.perm`

A `p.perm` of `0` does not mean the true p-value is zero — it means none of
the `permNum` permuted statistics were as extreme as the observed one. Report
it as:

```
p < 1/permNum   (e.g. p < 0.001 for permNum = 1000)
```

The precision of `p.perm` depends on `permNum`. For exploratory work, 999 or
1 000 permutations are reasonable. For publication, 9 999 or 10 000 give
better resolution, especially for small p-values.

```{r precision}
# How many permutations do you need to resolve p = 0.05 with 10% relative error?
# Rule of thumb: permNum >= 10 / alpha
# For alpha = 0.05: permNum >= 200 (minimum); 1000+ recommended
# For alpha = 0.001: permNum >= 10000
```

---

## The `family` argument

`grouped_perm_glm()` wraps `stats::glm()`, so you can pass any family. The
default is `gaussian` (ordinary least squares). For count data you might use
`poisson`:

```{r poisson}
set.seed(1)
counts_df <- data.frame(
  y = rpois(20, lambda = exp(0.5 * rnorm(20))),
  x = rnorm(20)
)

grouped_perm_glm(
  tbl         = counts_df,
  formla      = y ~ x,
  var_to_perm = "y",
  family      = poisson,
  permNum     = 499,
  seed        = 1
)
```

---

## Binomial logistic regression

For a binary outcome, use `grouped_perm_binoglm()` instead of passing
`family = binomial` to `grouped_perm_glm()`. It handles the binary outcome
correctly and returns z-statistics:

```{r binomial}
set.seed(7)
bin_df <- data.frame(
  outcome = rbinom(40, 1, prob = plogis(-0.5 + 0.8 * rnorm(40))),
  x1      = rnorm(40),
  x2      = rnorm(40)
)

grouped_perm_binoglm(
  tbl         = bin_df,
  formla      = outcome ~ x1 + x2,
  var_to_perm = "outcome",
  permNum     = 499,
  seed        = 7
)
```

---

## Reproducibility

All three functions accept a `seed` argument passed to `base::set.seed()`.
Always set this in published analyses so results are exactly reproducible:

```{r reproducibility}
r1 <- grouped_perm_glm(df, outcome ~ predictor + group, "outcome",
                       permNum = 199, seed = 123)
r2 <- grouped_perm_glm(df, outcome ~ predictor + group, "outcome",
                       permNum = 199, seed = 123)

identical(r1$p.perm, r2$p.perm)
```

---

## Multiple comparisons

`p.perm` values are raw — no multiple testing correction is applied
automatically. When running models across many features or brain regions, apply
FDR correction after collecting all results:

```{r fdr, eval = FALSE}
# After collecting results from many models into a single data frame `all_res`:
all_res$p.fdr <- p.adjust(all_res$p.perm, method = "fdr")
```

This matches the approach used in França et al. (2024), where 10 000
permutations and FDR correction at α = 5% were applied across brain state
features in 390 neonates.

---

## Further reading

- `vignette("mixed-effects-models")` — using `grouped_perm_glmm()` with `lmer`
- `vignette("grouped-analysis")` — nesting models across features or states
- `?grouped_perm_glm`, `?grouped_perm_glmm`, `?grouped_perm_binoglm`