--- title: "Grouped pipeline: testing across states" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Grouped pipeline: testing across states} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stateR) library(dplyr) library(tidyr) library(purrr) ``` ## Overview In practice, brain state analyses involve not just computing metrics but testing whether those metrics differ between groups or associate with continuous variables — and doing so across every state simultaneously. This vignette shows the full pipeline: 1. Compute `nest_fo()`, `nest_dwell()`, or `clusters_markov()` 2. Unnest and filter to one state (or transition) at a time 3. Apply a statistical model — demonstrated here with base R, and compatible with [`ptestR`](https://github.com/CoDe-Neuro/ptestR) for permutation tests --- ## Simulated data We simulate 20 subjects (10 term, 10 preterm), each with 80 time points and five possible states (0–4). States are drawn with unequal probabilities to create realistic differences: ```{r sim-data} set.seed(2024) n_sub <- 20 n_tp <- 80 # Preterm subjects spend more time in state 0, term in state 2 state_probs <- list( term = c(0.15, 0.20, 0.35, 0.15, 0.15), preterm = c(0.35, 0.20, 0.15, 0.15, 0.15) ) tbl <- purrr::map_dfr(seq_len(n_sub), function(i) { grp <- ifelse(i <= 10, "term", "preterm") probs <- state_probs[[grp]] tibble::tibble( subject = sprintf("sub-%02d", i), group = grp, pma = rnorm(1, mean = ifelse(grp == "term", 40, 34), sd = 2), time = seq_len(n_tp), state = sample(0:4, n_tp, replace = TRUE, prob = probs) ) }) dplyr::count(tbl, group) ``` --- ## Step 1 — Compute fractional occupancy ```{r fo} fo <- nest_fo( tbl = tbl, vars = c("subject", "group", "pma"), foVar = "state" ) fo ``` Each row is a state; the `data` list-column holds one row per subject with their `perc` value and any covariates passed via `vars`. --- ## Step 2 — Test each state Unnesting gives a flat data frame per state. We can map a model across all states using `purrr::map()`. Here we use a simple linear model (`lm`) to test whether fractional occupancy differs by group — swap in `ptestR::grouped_perm_glm()` for a permutation test: ```{r fo-test} fo_results <- fo %>% dplyr::mutate( model = purrr::map(data, ~ lm(perc ~ group + pma, data = .x)), tidy = purrr::map(model, broom::tidy) ) %>% dplyr::select(cluster, tidy) %>% tidyr::unnest(tidy) %>% dplyr::filter(term == "groupterm") %>% dplyr::select(cluster, estimate, statistic, p.value) %>% dplyr::arrange(p.value) fo_results ``` Apply FDR correction across all states: ```{r fdr} fo_results %>% dplyr::mutate(p.fdr = p.adjust(p.value, method = "fdr")) ``` --- ## Step 3 — Dwell time pipeline The same pattern works for dwell time: ```{r dwell-test} dwell <- nest_dwell( tbl = tbl, vars = c("subject", "group", "pma"), foVar = "state", sortBy = "time" ) dwell %>% dplyr::mutate( model = purrr::map(data, ~ lm(mean_dwell ~ group + pma, data = .x)), tidy = purrr::map(model, broom::tidy) ) %>% dplyr::select(cluster, tidy) %>% tidyr::unnest(tidy) %>% dplyr::filter(term == "groupterm") %>% dplyr::select(cluster, estimate, statistic, p.value) ``` --- ## Step 4 — Markov transition pipeline For transitions, the nesting key is `tag` rather than `cluster`: ```{r markov-test} trans <- clusters_markov( tbl = tbl, vars = c("subject", "group", "pma"), cVar = "state", sortBy = "time", groupBy = "subject", remIntra = FALSE ) # Unnest fully to flat frame, re-nest cleanly by tag for modelling trans_long <- tidyr::unnest(trans, data) trans_results <- trans_long %>% dplyr::select(tag, dplyr::any_of(c("subject", "group", "pma", "nCount"))) %>% dplyr::group_by(tag) %>% tidyr::nest() %>% dplyr::mutate( model = purrr::map(data, ~ lm(nCount ~ group + pma, data = .x)), tidy = purrr::map(model, broom::tidy) ) %>% dplyr::select(tag, tidy) %>% tidyr::unnest(tidy) %>% dplyr::filter(term == "groupterm") %>% dplyr::select(tag, estimate, statistic, p.value) %>% dplyr::arrange(p.value) head(trans_results, 10) ``` --- ## Using `ptestR` for permutation tests In the dHCP neonatal study, `stateR` outputs were passed directly into `ptestR` permutation tests with 10 000 permutations. The swap is one line: ```{r ptestR, eval = FALSE} library(ptestR) fo %>% dplyr::mutate( res = purrr::map(data, ~ grouped_perm_glm( tbl = .x, formla = perc ~ group + pma, var_to_perm = "perc", permNum = 10000, seed = 42 )) ) %>% dplyr::select(cluster, res) %>% tidyr::unnest(res) %>% dplyr::filter(term == "groupterm") %>% dplyr::mutate(p.fdr = p.adjust(p.perm, method = "fdr")) ``` The same pattern applies to `nest_dwell()` (replace `perc` with `mean_dwell`) and `clusters_markov()` (replace `perc` with `nCount` and `cluster` with `tag`). --- ## Notes on `vars` and covariate carriage Any column passed in `vars` is preserved in the `data` list-column and therefore available inside every `map()` call. Covariates like `pma`, `sex`, or `motion_outliers` should be included in `vars` when you plan to use them in the downstream model — they do not need any special handling. One caution: `vars` for `nest_fo()` and `nest_dwell()` defines the grouping for the metric computation. If you include a continuous covariate that varies *within* a subject (e.g. a time-varying measure), each unique combination of all `vars` values will become its own group. For subject-level covariates (one value per subject) this is not a problem. --- ## Further reading - `vignette("getting-started")` — introduction to the three metrics - `vignette("markov-transitions")` — transition matrix in depth - [`ptestR`](https://github.com/CoDe-Neuro/ptestR) — permutation tests