--- title: "Getting started with ptestR" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting started with ptestR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(ptestR) ``` ## Why permutation tests? Standard regression relies on asymptotic theory: p-values from Wald tests or likelihood-ratio tests are valid when sample sizes are large and residuals are approximately normal. In neuroscience and biomedical research, these conditions are often not met — datasets are small, outcomes are skewed, and repeated measures introduce complex dependency structures. Permutation tests sidestep these assumptions entirely. Instead of comparing an observed statistic to a theoretical distribution, ptestR builds the null distribution empirically: it randomly rearranges the outcome variable many times, fits the model to each permuted dataset, and asks how often the permuted statistic is at least as extreme as the one we actually observed. $p_\text{perm} = \frac{\lvert\{|t_\text{perm}| \geq |t_\text{obs}|\}\rvert}{B}$ where $B$ is the number of permutations. This value requires no distributional assumptions and is exact in the sense that it converges to the true p-value as $B \to \infty$. --- ## The three functions ptestR exports one function per model class, all sharing the same call signature: | Function | Model | Test statistic | |---|---|---| | `grouped_perm_glm()` | Generalised linear model (`glm`) | t | | `grouped_perm_glmm()` | Linear mixed-effects model (`lmer`) | t | | `grouped_perm_binoglm()` | Binomial logistic regression | z | This vignette covers `grouped_perm_glm()`. See `vignette("mixed-effects-models")` for `grouped_perm_glmm()` and `vignette("grouped-analysis")` for the full nested workflow used in practice. --- ## A simple example We simulate a small dataset with a continuous outcome, a continuous predictor, and a three-level grouping factor, then compare the asymptotic and permutation p-values. ```{r sim-data} set.seed(42) n <- 30 df <- data.frame( outcome = 2.5 * rnorm(n) + c(0, 1, 2)[rep(1:3, each = 10)], predictor = rnorm(n), group = factor(rep(c("A", "B", "C"), each = 10)) ) head(df) ``` Fit a permutation GLM with 999 permutations: ```{r glm-basic} res <- grouped_perm_glm( tbl = df, formla = outcome ~ predictor + group, var_to_perm = "outcome", permNum = 999, seed = 42 ) res ``` The result is a tidy tibble with one row per model term. The key columns are: | Column | Description | |---|---| | `term` | Regression term name | | `estimate` | Fitted coefficient | | `statistic` | Observed t-statistic | | `p.value` | Asymptotic p-value from `glm` | | `p.perm` | Permutation p-value | --- ## Interpreting `p.perm` A `p.perm` of `0` does not mean the true p-value is zero — it means none of the `permNum` permuted statistics were as extreme as the observed one. Report it as: ``` p < 1/permNum (e.g. p < 0.001 for permNum = 1000) ``` The precision of `p.perm` depends on `permNum`. For exploratory work, 999 or 1 000 permutations are reasonable. For publication, 9 999 or 10 000 give better resolution, especially for small p-values. ```{r precision} # How many permutations do you need to resolve p = 0.05 with 10% relative error? # Rule of thumb: permNum >= 10 / alpha # For alpha = 0.05: permNum >= 200 (minimum); 1000+ recommended # For alpha = 0.001: permNum >= 10000 ``` --- ## The `family` argument `grouped_perm_glm()` wraps `stats::glm()`, so you can pass any family. The default is `gaussian` (ordinary least squares). For count data you might use `poisson`: ```{r poisson} set.seed(1) counts_df <- data.frame( y = rpois(20, lambda = exp(0.5 * rnorm(20))), x = rnorm(20) ) grouped_perm_glm( tbl = counts_df, formla = y ~ x, var_to_perm = "y", family = poisson, permNum = 499, seed = 1 ) ``` --- ## Binomial logistic regression For a binary outcome, use `grouped_perm_binoglm()` instead of passing `family = binomial` to `grouped_perm_glm()`. It handles the binary outcome correctly and returns z-statistics: ```{r binomial} set.seed(7) bin_df <- data.frame( outcome = rbinom(40, 1, prob = plogis(-0.5 + 0.8 * rnorm(40))), x1 = rnorm(40), x2 = rnorm(40) ) grouped_perm_binoglm( tbl = bin_df, formla = outcome ~ x1 + x2, var_to_perm = "outcome", permNum = 499, seed = 7 ) ``` --- ## Reproducibility All three functions accept a `seed` argument passed to `base::set.seed()`. Always set this in published analyses so results are exactly reproducible: ```{r reproducibility} r1 <- grouped_perm_glm(df, outcome ~ predictor + group, "outcome", permNum = 199, seed = 123) r2 <- grouped_perm_glm(df, outcome ~ predictor + group, "outcome", permNum = 199, seed = 123) identical(r1$p.perm, r2$p.perm) ``` --- ## Multiple comparisons `p.perm` values are raw — no multiple testing correction is applied automatically. When running models across many features or brain regions, apply FDR correction after collecting all results: ```{r fdr, eval = FALSE} # After collecting results from many models into a single data frame `all_res`: all_res$p.fdr <- p.adjust(all_res$p.perm, method = "fdr") ``` This matches the approach used in França et al. (2024), where 10 000 permutations and FDR correction at α = 5% were applied across brain state features in 390 neonates. --- ## Further reading - `vignette("mixed-effects-models")` — using `grouped_perm_glmm()` with `lmer` - `vignette("grouped-analysis")` — nesting models across features or states - `?grouped_perm_glm`, `?grouped_perm_glmm`, `?grouped_perm_binoglm`