---
title: "Grouped analysis across features"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Grouped analysis across features}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>"
)
library(ptestR)
library(dplyr)
library(tidyr)
library(purrr)
```

## The real use case

In neuroimaging and multivariate biomedical research, it is common to run the
same model across many features simultaneously — brain regions, EEG channels,
metabolites, or questionnaire subscales. ptestR is designed for exactly this
pattern via the `nest()` + `map()` + `unnest()` idiom from the tidyverse.

---

## Simulated dataset

We simulate a dataset with 15 subjects, each measured at 3 time points, across
2 groups and 4 features. This gives 45 observations per feature — enough for
`lmer` to fit random intercepts per subject comfortably.

```{r simulate}
set.seed(42)

n_subj    <- 15
n_time    <- 3
features  <- c("feature_A", "feature_B", "feature_C", "feature_D")

# Subject-level covariates
subjects <- data.frame(
  subject = seq_len(n_subj),
  group   = rep(c("control", "case"), length.out = n_subj),
  age     = round(rnorm(n_subj, mean = 35, sd = 8), 1)
)

# Expand to subject × time × feature, then add values
df <- expand.grid(
  subject = seq_len(n_subj),
  time    = seq_len(n_time),
  feature = features,
  stringsAsFactors = FALSE
) |>
  left_join(subjects, by = "subject") |>
  mutate(
    # Random intercept per subject (shared across features)
    rand_int = rep(rnorm(n_subj, sd = 1), times = n_time * length(features)),
    value = case_when(
      feature == "feature_A" ~  0.6 * (group == "case") + 0.2 * time + rand_int + rnorm(n()),
      feature == "feature_B" ~  0.1 * (group == "case") + 0.1 * time + rand_int + rnorm(n()),
      feature == "feature_C" ~ -0.5 * (group == "case") + 0.3 * time + rand_int + rnorm(n()),
      feature == "feature_D" ~  0.0 * (group == "case") + 0.0 * time + rand_int + rnorm(n())
    )
  ) |>
  select(subject, time, group, age, feature, value)

glimpse(df)
```

Each nested sub-dataset (one per feature) will have `n_subj × n_time = 45`
rows with 15 distinct subjects — well within `lmer`'s requirements.

---

## Running across features with `grouped_perm_glmm()`

The pattern is: `group_by()` the feature → `nest()` → `map()` the permutation
function → `unnest()` the results.

```{r grouped-glmm}
results <- df |>
  group_by(feature) |>
  nest() |>
  mutate(
    perm = map(data, \(d) grouped_perm_glmm(
      tbl         = d,
      formla      = value ~ group + age + time + (1 | subject),
      var_to_perm = "value",
      permNum     = 499,
      seed        = 42
    ))
  ) |>
  unnest(perm) |>
  select(-data)

results
```

---

## Filtering to a term of interest

Filter to the `group` effect across all features:

```{r filter-group}
group_results <- results |>
  filter(term == "groupcontrol") |>
  arrange(p.perm)

group_results |>
  select(feature, estimate, statistic, p.perm)
```

Feature A and C should show the strongest effects (we simulated them that way),
while feature D should be near noise.

---

## FDR correction

With multiple features, apply FDR correction across all models for each term:

```{r fdr}
group_results <- group_results |>
  mutate(p.fdr = p.adjust(p.perm, method = "fdr"))

group_results |>
  select(feature, estimate, statistic, p.perm, p.fdr)
```

---

## Handling `p.perm = 0`

When no permuted statistic is as extreme as the observed one, `p.perm = 0`.
This is not a true zero — it means the effect is stronger than all `permNum`
permuted null statistics. Report as `p < 1/permNum`:

```{r zero-perm}
group_results |>
  mutate(
    p.label = ifelse(p.perm == 0,
                     paste0("p < ", round(1 / 499, 4)),
                     paste0("p = ", round(p.perm, 3)))
  ) |>
  select(feature, p.label, p.fdr)
```

With `permNum = 9999` (recommended for publication), the bound becomes
`p < 0.0001`.

---

## Plotting the results

A simple dot plot showing effect size and significance:

```{r dotplot, fig.width = 5.5, fig.height = 3.5}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  group_results |>
    mutate(sig = p.fdr < 0.05) |>
    ggplot(aes(x = estimate, y = feature, colour = sig)) +
    geom_vline(xintercept = 0, linetype = "dashed", colour = "#DCC3AA") +
    geom_point(size = 4) +
    scale_colour_manual(
      values = c("TRUE" = "#810B38", "FALSE" = "#DCC3AA"),
      labels = c("TRUE" = "FDR < 5%", "FALSE" = "ns"),
      name   = NULL
    ) +
    labs(
      title = "Group effect by feature",
      x     = "Coefficient estimate",
      y     = NULL
    ) +
    theme_minimal(base_size = 12)
}
```

---

## Using `grouped_perm_glm()` instead

If your data have no repeated measures, swap to `grouped_perm_glm()` and drop
the random-effects term:

```{r grouped-glm}
# Summarise to one observation per subject × feature (e.g. mean across time)
df_subj <- df |>
  group_by(subject, group, age, feature) |>
  summarise(value = mean(value), .groups = "drop")

glm_results <- df_subj |>
  group_by(feature) |>
  nest() |>
  mutate(
    perm = map(data, \(d) grouped_perm_glm(
      tbl         = d,
      formla      = value ~ group + age,
      var_to_perm = "value",
      permNum     = 499,
      seed        = 42
    ))
  ) |>
  unnest(perm) |>
  select(-data)

glm_results |>
  filter(term == "groupcontrol") |>
  select(feature, estimate, statistic, p.perm) |>
  arrange(p.perm)
```

---

## Tips for large-scale analyses

**Use more permutations for publication.** 499 is used here for speed; use
9 999 or more for final results.

**Parallelise with `furrr`.** Each `map()` call is independent — swap
`purrr::map()` for `furrr::future_map()` for free parallelisation:

```{r parallel, eval = FALSE}
library(furrr)
plan(multisession, workers = 4)

results <- df |>
  group_by(feature) |>
  nest() |>
  mutate(
    perm = future_map(data, \(d) grouped_perm_glmm(
      d,
      formla      = value ~ group + age + time + (1 | subject),
      var_to_perm = "value",
      permNum     = 9999,
      seed        = 42
    ), .options = furrr_options(seed = TRUE))
  ) |>
  unnest(perm) |>
  select(-data)
```

---

## Further reading

- `vignette("getting-started")` — core concepts and `grouped_perm_glm()`
- `vignette("mixed-effects-models")` — `grouped_perm_glmm()` in depth
- `?grouped_perm_glmm`, `?grouped_perm_glm`