Longitudinal.dev

Linear Mixed Models (LMM)

Time-Invariant Covariates (LMM)

Incorporate time-invariant covariates within mixed models to test whether baseline characteristics explain differences in intercepts or slopes across ABCD participants.

LMM โ€บ Continuous
lme4Updated 2025-11-05abcd-studylinear-mixed-modeltime-invariant-covariate
Work in ProgressExamples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified. If you spot gaps, errors, or have suggestions, we'd love your feedbackโ€”use the "Suggest changes" button to help us improve!

Overview

Linear mixed models with random intercepts and slopes extended with time-invariant covariates allow examination of how stable individual characteristics predict both baseline levels and rates of change over time. By modeling covariate effects on latent intercept and slope parameters, this approach reveals whether factors like demographics or socioeconomic status influence starting points, developmental trajectories, or both. This tutorial analyzes cognition scores from ABCD youth across four annual assessments, incorporating parental education as a time-invariant covariate to test whether it predicts baseline cognition levels and rates of cognitive change.

When to Use:
Use when you suspect static characteristics (e.g., parental education, sex) explain differences in both initial levels and longitudinal change in ABCD outcomes.
Key Advantage:
Simultaneously estimates the population trajectory and covariate effects, making it easy to see how subject-level factors shift intercepts and slopes.
What You'll Learn:
How to add time-invariant covariates to lme4 models, interpret their effects on intercepts/slopes, and visualize stratified trajectories.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

  • Automatic data joining - Merges variables from multiple tables automatically
  • Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
  • Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Key Parameters
  • vars - Vector of variable names to load
  • release - ABCD data release version (e.g., "6.0")
  • format - File format, typically "parquet" for efficiency
  • categ_to_factor - Automatically converts categorical variables to factors
  • value_to_na - Converts ABCD missing value codes to R's NA
  • add_labels - Adds descriptive labels to variables and values
Additional NBDCtools Resources

For more details on using NBDCtools:

Data Preparation

NBDCtools Setup and Data Loading
R32 lines
### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(arrow)       # For reading Parquet files
library(tidyverse)   # For data manipulation & visualization
library(gtsummary)   # For generating publication-quality summary tables
library(rstatix)     # Provides tidy-format statistical tests
library(lme4)        # Linear mixed-effects models (LMMs)
library(sjPlot)      # Visualization of mixed models
library(kableExtra)  # Formatting & styling in HTML/Markdown reports
library(performance) # Useful functions for model diagnostics & comparisons
library(ggeffects)   # Adjusted regression predictions

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "ab_g_dyn__cohort_edu__cgs",  # Parent education
    "nc_y_nihtb__comp__cryst__fullcorr_tscore"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Filter and Process Time-Invariant Covariates
R14 lines
# Filter sessions and forward-fill time-invariant covariates
df_long <- abcd_data %>%
  filter(session_id %in% c("ses-00A", "ses-02A", "ses-04A", "ses-06A")) %>%
  group_by(participant_id) %>%
  fill(ab_g_dyn__cohort_edu__cgs, .direction = "down") %>%
  ungroup() %>%
  arrange(participant_id, session_id) %>%
  mutate(
    session_id = factor(session_id,
      levels = c("ses-00A", "ses-02A", "ses-04A", "ses-06A"),
      labels = c("Baseline", "Year_2", "Year_4", "Year_6")),
    time = as.numeric(session_id) - 1,
    parent_education = as.numeric(ab_g_dyn__cohort_edu__cgs)
  )
Categorize Education and Finalize Dataset
R22 lines
# Categorize parent education and prepare final dataset
df_long <- df_long %>%
  mutate(
    parent_education_cat = factor(case_when(
      parent_education <= 12 ~ "Less than HS",
      parent_education %in% 13:14 ~ "HS Diploma/GED",
      parent_education %in% 15:17 ~ "Some College/Associate",
      parent_education == 18 ~ "Bachelor's Degree",
      parent_education %in% 19:21 ~ "Graduate Degree",
      TRUE ~ NA_character_
    ), levels = c("Less than HS", "HS Diploma/GED",
                  "Some College/Associate", "Bachelor's Degree", "Graduate Degree"))
  ) %>%
  rename(
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam,
    cognition = nc_y_nihtb__comp__cryst__fullcorr_tscore
  ) %>%
  group_by(participant_id) %>%
  filter(sum(!is.na(cognition)) >= 2) %>%
  ungroup() %>%
  drop_na(site, family_id, participant_id, cognition)
Descriptive Statistics
R27 lines
# Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, parent_education_cat, cognition) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      parent_education_cat ~ "Parent Education",
      cognition ~ "Cognition"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "**{level}**<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "**Assessment Wave**") %>%
  bold_labels() %>%
  italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table
Characteristic
Assessment Wave
Baseline
N = 8310
1
Year_2
N = 7046
1
Year_4
N = 3244
1
Year_6
N = 2656
1
Parent Education



ย ย ย ย Less than HS 8,300 (100%) 7,046 (100%) 3,244 (100%) 2,656 (100%)
ย ย ย ย HS Diploma/GED 0 (0%) 0 (0%) 0 (0%) 0 (0%)
ย ย ย ย Some College/Associate 0 (0%) 0 (0%) 0 (0%) 0 (0%)
ย ย ย ย Bachelor's Degree 0 (0%) 0 (0%) 0 (0%) 0 (0%)
ย ย ย ย Graduate Degree 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Cognition 51 (11) 49 (11) 50 (11) 51 (10)
1 n (%); Mean (SD)

Statistical Analysis

Fit LMM and Generate Summary Tables
R33 lines
# Fit a Linear Mixed Model with random intercepts and slopes
model <- lmerTest::lmer(
  cognition ~ time + parent_education + (1 + time | site:family_id:participant_id),
  data = df_long
)

# Generate summary table for the LMM model
model_summary <- gtsummary::tbl_regression(model,
  digits = 3,
  intercept = TRUE
) %>%
  gtsummary::as_gt()

# Display model summary
model_summary

# Save the gt table
gt::gtsave(
  data = model_summary,
  filename = "model_summary_table.html",
  inline_css = FALSE
)

# Generate comprehensive diagnostics with sjPlot
sjPlot::tab_model(model,
  show.se = TRUE, show.df = FALSE, show.ci = FALSE,
  digits = 3,
  pred.labels = c("Intercept", "Time", "Parent Education"),
  dv.labels = c("LMM Model (lme4)"),
  string.se = "SE",
  string.p = "P-Value",
  file = "lmm_model_results.html"
)
Test Education ร— Time Interaction
R32 lines
# Fit interaction model
model_interaction <- lmer(
  cognition ~ time * parent_education + (1 + time | participant_id),
  data = df_long
)

# Compare models with ANOVA
anova_output <- anova(model, model_interaction)

# Convert to gt table
anova_gt_table <- gt::gt(
  data = anova_output,
  rownames_to_stub = TRUE
) %>%
  gt::fmt_number(
    columns = everything(),
    decimals = 3
  ) %>%
  gt::fmt_scientific(
    columns = tidyselect::all_of(c("Chisq", "Pr(>Chisq)")),
    decimals = 4
  ) %>%
  gt::tab_header(
    title = "ANOVA Model Comparison"
  )

# Save the gt table as HTML
gt::gtsave(
  data = anova_gt_table,
  filename = "anova_model_comparison.html",
  inline_css = FALSE
)
Characteristic Beta 95% CI p-value
(Intercept) 46 45, 47 <0.001
time -0.35 -0.44, -0.26 <0.001
parent_education 1.2 1.0, 1.3 <0.001
Abbreviation: CI = Confidence Interval
  LMM Model (lme4)
Predictors Estimates SE P-Value
Intercept 46.123 0.323 <0.001
Time -0.347 0.047 <0.001
Parent Education 1.172 0.079 <0.001
Random Effects
σ2 34.58
τ00 site:family_id:participant_id 87.27
τ11 site:family_id:participant_id.time 2.30
ρ01 site:family_id:participant_id -0.31
ICC 0.71
N site 22
N family_id 7222
N participant_id 8666
Observations 21246
Marginal R2 / Conditional R2 0.016 / 0.711
ANOVA Model Comparison
npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
model 7.000 153,182.242 153,237.990 โˆ’76,584.121 153,168.242 NA NA NA
model_interaction 8.000 153,083.151 153,146.862 โˆ’76,533.575 153,067.151 1.0109 ร— 102 1.000 8.7827 ร— 10โˆ’24
Interpretation
Interpretation

The fixed effects estimates indicate that the average cognition score at baseline (time = 0) is 46.1 (SE = 0.32, p < 0.001). Over time, cognition declines by approximately 0.35 points per biannual assessment (SE = 0.05, p < 0.001), suggesting a gradual decline in cognitive performance. Additionally, higher parental education levels are associated with significantly higher cognition scores (ฮฒ = 1.17, SE = 0.08, p < 0.001), indicating a positive relationship between parental education and cognitive performance.The random effects (see Model Summary Output-2) reveal substantial variability in both baseline cognition scores (ฯ„โ‚€โ‚€ = 87.27) and individual rates of cognitive decline (ฯ„โ‚โ‚ = 2.30). The negative correlation (ฯโ‚€โ‚ = -0.31) between the intercept and slope suggests that individuals with higher initial cognition scores tend to experience steeper declines over time. The intraclass correlation (ICC = 0.71) indicates that 71% of the total variance in cognition scores is attributable to between-individual differences rather than within-person fluctuations over time.

Visualization
R22 lines
# Get predicted neurocognition scores by parental education
preds_edu <- ggpredict(model, terms = "parent_education")  # Correct function

# Plot
visualization <- ggplot(preds_edu, aes(x = x, y = predicted)) +
  geom_point(size = 3, color = "darkred") +
  geom_line(group = 1, color = "darkred") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) +
  labs(title = "Effect of Parental Education on Neurocognition",
       x = "Parental Education Level",
       y = "Predicted Neurocognition Score") +
  theme_minimal()

# Display the plot
visualization

# Save the plot
ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)
Plot
Interpretation
Visualization Notes

The plot illustrates the effect of parental education on predicted neurocognition scores. Each point represents the model-estimated neurocognition score for a given level of parental education, with error bars showing the 95% confidence intervals. The visualization demonstrates a clear positive relationship: as parental education increases, predicted neurocognition scores also increase, controlling for time and individual variability.This pattern confirms the significant positive effect of parental education (ฮฒ = 1.17, p < 0.001) observed in the fixed effects, showing that higher parental education levels are associated with better cognitive performance across all time points.

Discussion

Key Findings

The longitudinal analysis revealed significant insights into the influence of parental education on cognitive performance. Participants with higher parental education demonstrated significantly higher baseline cognition scores (ฮฒ = 1.17, p < 0.001), indicating that parental education is a strong predictor of initial cognitive ability.

Implications

The ANOVA model comparison (Model Summary Output-3) tested whether parental education also moderates the rate of cognitive change by comparing the main effects model to an interaction model (time ร— parent_education). The significant chi-square test (ฯ‡ยฒ = 101.09, p < 0.001) indicates that the interaction model provides a significantly better fit, suggesting that the effect of parental education on cognition may vary across time. This finding warrants further investigation into how parental education influences not only baseline cognitive performance but also developmental trajectories.

By including parental education as a time-invariant covariate in the linear mixed model, we were better able to capture and account for individual differences in cognitive development, demonstrating that stable background characteristics significantly influence longitudinal cognitive outcomes.

Additional Resources

4

lme4 Package Documentation

DOCS

Official CRAN documentation for the lme4 package, with examples of incorporating time-invariant covariates into linear mixed models. Covers model specification with covariates predicting both intercepts and slopes, including interaction terms with time.

Visit Resource

Fitting Linear Mixed-Effects Models Using lme4

VIGNETTE

Comprehensive lme4 vignette with Section 2.4 covering conditional models with time-invariant predictors. Demonstrates centering strategies for covariates, interpretation of cross-level interactions, and testing whether covariates predict individual differences in change rates.

Visit Resource

Longitudinal Data Analysis

BOOK

Singer & Willett (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Chapters 5-6 provide detailed coverage of time-invariant covariates in growth models, including centering decisions, interpretation of covariate-by-time interactions, and distinguishing between effects on initial status versus change. Note: access may require institutional or paid subscription.

Visit Resource

Centering Predictor Variables in Mixed Models

PAPER

Enders & Tofighi (2007). Methodological paper addressing the critical question of when and how to center time-invariant covariates in multilevel models. Explains grand-mean versus group-mean centering and how centering choices affect the interpretation of fixed and random effects. Note: access may require institutional or paid subscription.

Visit Resource