---
title: "Getting started with stateR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with stateR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>"
)
library(stateR)
library(dplyr)
library(tidyr)
```

## What is a brain state?

Dynamic functional connectivity analyses partition resting-state fMRI time
series into a sequence of discrete **brain states**: recurring patterns of
whole-brain co-activation identified by clustering methods such as k-means or
hidden Markov models. Each volume (or window) in the scan is assigned a state
label, producing a sequence like:

```
0 0 2 2 2 1 1 0 3 3 3 3 1 2 ...
```

Once you have this sequence, the natural questions are:

- **How often** is each state visited? (*fractional occupancy*)
- **How long** does each visit last? (*dwell time*)
- **How likely** is a transition from state A to state B? (*Markov transitions*)

`stateR` answers all three in a tidy, pipeable workflow.

---

## Input format

All three functions expect a **long-format tibble** with one row per subject
per time point, plus any grouping or covariate columns you want to carry
through to the output:

| Column | Role |
|--------|------|
| Subject / session ID | Grouping, passed via `vars` |
| Time index | Ordering, passed via `sortBy` |
| State label | The state sequence, passed via `foVar` (`nest_fo()`, `nest_dwell()`) or `cVar` (`clusters_markov()`) |
| Any covariates | Carried through unchanged |

---

## Simulated data

We simulate five subjects, each with 40 time points and four possible states
(0–3):

```{r sim-data}
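set.seed(42)  # make the random state labels reproducible

n_subjects   <- 5
n_timepoints <- 40

tbl <- tibble::tibble(
  subject = rep(paste0("sub-0", seq_len(n_subjects)), each = n_timepoints),
  # First three subjects are "term", last two "preterm" (assumes n_subjects = 5)
  group   = rep(c("term", "preterm"), times = c(3 * n_timepoints,
                                                 2 * n_timepoints)),
  time    = rep(seq_len(n_timepoints), n_subjects),
  # Labels are drawn independently, so they carry no real temporal structure
  state   = sample(0:3, n_subjects * n_timepoints, replace = TRUE)
)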

head(tbl, 8)
```

---

## Fractional occupancy with `nest_fo()`

`nest_fo()` computes, for every combination of the `vars` columns (here, each
subject), the proportion of time points spent in each state:

```{r nest-fo}
fo <- nest_fo(
  tbl   = tbl,
  vars  = c("subject", "group"),
  foVar = "state"
)

fo
```

The result is a **state-nested tibble** — one row per state, with a `data`
list-column holding each subject's fractional occupancy (`perc`):

```{r fo-unnest}
fo %>%
  tidyr::unnest(data) %>%
  head(12)
```

To work with a specific state, unnest and filter on `cluster`:

```{r fo-filter}
fo %>%
  tidyr::unnest(data) %>%
  dplyr::filter(cluster == "2")
```
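
As a cross-check on what `perc` means, fractional occupancy for a single
sequence can be reproduced in base R. This is a minimal sketch using the
example sequence from the introduction, not `nest_fo()`'s actual
implementation; `toy_states` and `toy_fo` are illustrative names.

```{r fo-by-hand}
# Example state sequence from the introduction (one hypothetical subject)
toy_states <- c(0, 0, 2, 2, 2, 1, 1, 0, 3, 3, 3, 3, 1, 2)

# Count time points per state, then divide by the sequence length
toy_fo <- prop.table(table(state = toy_states))
toy_fo
```

States 0 and 1 each occupy 3 of the 14 time points, states 2 and 3 each occupy
4 of 14, and the proportions sum to 1.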

---

## Dwell time with `nest_dwell()`

`nest_dwell()` computes the **mean continuous occupancy** per state — the
average number of consecutive time points spent in a single uninterrupted visit.
Single time-point visits (dwell = 1) are excluded, as they likely reflect noise
rather than genuine state occupation.

```{r nest-dwell}
dwell <- nest_dwell(
  tbl    = tbl,
  vars   = c("subject", "group"),
  foVar  = "state",
  sortBy = "time"
)

dwell %>%
  tidyr::unnest(data)
```
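
The run-length idea can be sketched in base R with `rle()`. This is an
illustration for one already-ordered sequence, assuming the exclusion of
single time-point visits described above. It is not necessarily how
`nest_dwell()` is implemented, and the `toy_*` names are made up for the
example.

```{r dwell-by-hand}
toy_states <- c(0, 0, 2, 2, 2, 1, 1, 0, 3, 3, 3, 3, 1, 2)

# Collapse the sequence into runs of identical consecutive labels
toy_runs <- rle(toy_states)

# Drop single time-point visits, then average the run length per state
toy_keep  <- toy_runs$lengths > 1
toy_dwell <- tapply(toy_runs$lengths[toy_keep], toy_runs$values[toy_keep], mean)
toy_dwell
```

Each state has exactly one surviving visit in this sequence, so the mean dwell
times are 2, 2, 3 and 4 time points for states 0 to 3.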

The `sortBy` argument is critical — it ensures observations are ordered
chronologically before the run-length encoding that underlies dwell time
estimation. Always pass the time index column here.
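
A small base R illustration of why ordering matters, using a hypothetical
eight-row data frame (`toy`, not part of `stateR`): the same observations give
different run lengths depending on row order.

```{r sortby-order}
toy <- data.frame(time = 1:8, state = c(0, 0, 0, 1, 1, 2, 2, 2))

# Rows arrive out of chronological order, which fragments the runs
toy_scrambled <- toy[c(1, 4, 2, 6, 3, 5, 7, 8), ]
rle(toy_scrambled$state)$lengths

# Restoring chronological order recovers the true runs of 3, 2 and 3
toy_sorted <- toy_scrambled[order(toy_scrambled$time), ]
rle(toy_sorted$state)$lengths
```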

---

## Markov transitions with `clusters_markov()`

`clusters_markov()` computes **transition probabilities** between states. For
each source state, it counts every observed transition and normalises by the
total number of transitions out of that source, which gives the empirical
transition probabilities of a first-order Markov chain.

```{r clusters-markov}
trans <- clusters_markov(
  tbl      = tbl,
  vars     = c("subject", "group"),
  cVar     = "state",
  sortBy   = "time",
  groupBy  = "subject",
  remIntra = FALSE
)

trans
```

Transitions are labelled by a `tag` in `"source_target"` format. To inspect a
specific transition:

```{r markov-filter}
trans %>%
  tidyr::unnest(data) %>%
  dplyr::filter(tag == "0_2")
```
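
For intuition, the counting-and-normalising step can be written in base R for
a single sequence. This sketch illustrates the general technique and is not a
copy of `clusters_markov()`; the `toy_*` names are illustrative, and a real
analysis would compute transitions within each subject so that no transition
spans two scans.

```{r markov-by-hand}
toy_states <- c(0, 0, 2, 2, 2, 1, 1, 0, 3, 3, 3, 3, 1, 2)
toy_levels <- sort(unique(toy_states))

# Pair each time point with the next one: source -> target
toy_from <- factor(head(toy_states, -1), levels = toy_levels)
toy_to   <- factor(tail(toy_states, -1), levels = toy_levels)

# Count transitions, then normalise each row (source state) to sum to 1
toy_counts <- table(source = toy_from, target = toy_to)
toy_probs  <- prop.table(toy_counts, margin = 1)
round(toy_probs, 2)
```

The entry in row `0`, column `2` plays the role of the `"0_2"` tag: there are
three transitions out of state 0 (to states 0, 2 and 3), one of which goes to
state 2, giving a probability of 1/3.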

Set `remIntra = TRUE` to exclude self-transitions (e.g. state 2 → state 2),
which is useful when you are interested only in genuine state changes:

```{r markov-no-intra}
clusters_markov(
  tbl      = tbl,
  vars     = c("subject", "group"),
  cVar     = "state",
  sortBy   = "time",
  groupBy  = "subject",
  remIntra = TRUE
) %>%
  tidyr::unnest(data) %>%
  dplyr::filter(tag == "0_2")
```
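
In terms of the counts, dropping self-transitions amounts to discarding pairs
whose source and target match. The base R sketch below assumes the remaining
transitions are renormalised per source state. Whether `clusters_markov()`
renormalises in the same way is not documented here, so treat this as an
illustration only (the `toy_*` names are hypothetical).

```{r markov-no-self-by-hand}
toy_states <- c(0, 0, 2, 2, 2, 1, 1, 0, 3, 3, 3, 3, 1, 2)
toy_levels <- sort(unique(toy_states))
toy_from   <- head(toy_states, -1)
toy_to     <- tail(toy_states, -1)

# Keep only genuine state changes (source differs from target)
toy_change <- toy_from != toy_to
toy_inter  <- table(
  source = factor(toy_from[toy_change], levels = toy_levels),
  target = factor(toy_to[toy_change],   levels = toy_levels)
)

# Assumed renormalisation: each row sums to 1 over the remaining targets
toy_probs_inter <- prop.table(toy_inter, margin = 1)
toy_probs_inter
```

Under this assumption the 0 → 2 probability is 1/2, because the 0 → 0
self-transition no longer counts towards the total for state 0.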

---

## Output structure at a glance

All three functions return the same **nested tibble** shape, with one row per
nesting key (a state or a transition) and a `data` list-column:

| Function | Nesting key | Key output column | Unit |
|----------|-------------|-------------------|------|
| `nest_fo()` | `cluster` | `perc` | Proportion (0–1) |
| `nest_dwell()` | `cluster` | `mean_dwell` | Time points |
| `clusters_markov()` | `tag` (e.g. `"0_2"`) | `nCount` | Probability (0–1) |

This shared shape makes it straightforward to apply the same downstream
analysis (e.g. permutation tests with
[`ptestR`](https://github.com/CoDe-Neuro/ptestR)) across all three metrics
without changing your pipeline.

---

## Further reading

- `vignette("markov-transitions")` — a deeper look at the transition matrix
- `vignette("grouped-pipeline")` — running statistical tests across all states
- `?nest_fo`, `?nest_dwell`, `?clusters_markov`