Overview

Generalized Linear Mixed Models with interaction terms test whether predictor effects on non-normal outcomes vary across levels of other variables, combining fixed and random effects to model moderation while accounting for individual differences. Interaction terms reveal whether relationships between covariates and outcomes change over time or differ by group membership. This tutorial examines alcohol use in ABCD youth across four annual assessments using a Negative Binomial GLMM with a family conflict × time interaction to test whether adolescents from high-conflict families show different drinking trajectories compared to those from low-conflict families.

When to Use:

Use when you want to test moderator effects (e.g., family conflict × time) inside a GLMM framework.

Key Advantage:

Captures how predictor effects differ across levels of another variable while still honoring within-subject correlation via random effects.

What You'll Learn:

How to include and interpret interaction terms in GLMMs, evaluate predicted counts, and visualize moderation effects.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

Automatic data joining - Merges variables from multiple tables automatically
Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)

Key Parameters

vars - Vector of variable names to load
release - ABCD data release version (e.g., "6.0")
format - File format, typically "parquet" for efficiency
categ_to_factor - Automatically converts categorical variables to factors
value_to_na - Converts ABCD missing value codes to R's NA
add_labels - Adds descriptive labels to variables and values

Additional NBDCtools Resources

For more details on using NBDCtools:

NBDCtools Getting Started Guide - Complete package overview
Joining Data - Advanced data merging strategies
Filtering Events - Selecting specific assessment waves
Data Transformations - Preprocessing and cleaning

Data Preparation

NBDCtools Setup and Data Loading


### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(tidyverse)  # Data wrangling & visualization
library(gtsummary)  # Summary tables
library(gt)         # Tables
library(rstatix)    # Tidy-format statistical tests
library(lme4)       # Generalized Linear Mixed Models (GLMMs)
library(glmmTMB)    # Negative Binomial GLMM
library(ggeffects)  # Model-based predictions & visualization
library(broom)      # Organizing model outputs
library(broom.mixed)  # Organizing mixed model outputs

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "su_y_lowuse__isip_001__l",
    "fc_p_fes__confl_mean"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)


Data Transformation


# Data wrangling: clean, restructure, and filter data
df_long <- abcd_data %>%
  filter(session_id %in% c("ses-01A", "ses-02A", "ses-03A", "ses-04A")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    id = factor(participant_id),  # Convert participant ID to factor
    session_id = factor(session_id,
                        levels = c("ses-01A", "ses-02A", "ses-03A", "ses-04A"),
                        labels = c("Year_1", "Year_2", "Year_3", "Year_4")),  # Rename sessions for clarity
    time = as.numeric(session_id) - 1,  # Factor codes 1-4 become 0, 1, 2, 3
    alcohol_use = as.numeric(su_y_lowuse__isip_001__l),  # Ensure alcohol use is numeric
    family_conflict = as.numeric(fc_p_fes__confl_mean)   # Ensure family conflict is numeric
  ) %>%
  # Keep only valid alcohol use values (rows with missing alcohol use are dropped here too)
  filter(alcohol_use >= 0 & alcohol_use <= 10) %>%
  rename(  # Rename for simplicity
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam
  ) %>%
  # Keep only participants with at least 2 non-missing alcohol use scores across waves
  group_by(participant_id) %>%
  filter(sum(!is.na(alcohol_use)) >= 2) %>%
  ungroup() %>%
  # Keep only rows that are complete on every model variable
  drop_na(site, family_id, participant_id, alcohol_use, family_conflict)
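
To make the time coding and the two-observation rule concrete, here is a minimal sketch on made-up data. The toy data frame and its values are assumptions for illustration only, and the per-participant count is written in base R with ave() rather than dplyr so the snippet runs without extra packages.

```r
# Toy long-format data (hypothetical values, not ABCD data)
toy <- data.frame(
  participant_id = c("A", "A", "A", "B", "C", "C"),
  session_id     = c("ses-01A", "ses-02A", "ses-04A", "ses-03A", "ses-02A", "ses-03A"),
  alcohol_use    = c(1, 2, 4, 3, NA, 2)
)

# Recode sessions with the same levels and labels used in the tutorial,
# then derive numeric time (Year_1 = 0, ..., Year_4 = 3)
wave_levels <- c("ses-01A", "ses-02A", "ses-03A", "ses-04A")
toy$session_id <- factor(toy$session_id,
                         levels = wave_levels,
                         labels = c("Year_1", "Year_2", "Year_3", "Year_4"))
toy$time <- as.numeric(toy$session_id) - 1

# Count non-missing alcohol use scores per participant
toy$n_obs <- ave(as.numeric(!is.na(toy$alcohol_use)), toy$participant_id, FUN = sum)

# Keep participants with at least 2 non-missing scores
toy_kept <- toy[toy$n_obs >= 2, ]
toy_kept
```

Because time is derived from the session label rather than from row order, a participant who skips a wave keeps the correct spacing between the waves that were observed.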


Descriptive Statistics


# Summary table
descriptives_table <- df_long %>%
  select(session_id, alcohol_use, family_conflict) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    statistic = list(
      alcohol_use ~ "{mean} ({sd})",
      family_conflict ~ "{mean} ({sd})"
    )
  ) %>%
  bold_labels() %>%
  italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table


Characteristic	Year_1 N = 581¹	Year_2 N = 692¹	Year_3 N = 731¹	Year_4 N = 759¹
alcohol_use	2.17 (1.82)	2.41 (1.95)	2.64 (2.01)	3.51 (2.51)
family_conflict	0.31 (0.22)	0.30 (0.22)	0.30 (0.22)	0.28 (0.22)
¹ Mean (SD)

Statistical Analysis

Fit GLMM with Interaction


# Fit Negative Binomial GLMM with Time × Family Conflict Interaction
model <- glmmTMB(
  alcohol_use ~ time * family_conflict + (1 + time | site:family_id:participant_id),
  family = nbinom2,
  data = df_long
)

# Extract and format fixed effects
coef_summary <- broom.mixed::tidy(model, effects = "fixed") %>%
  dplyr::select(
    Term = term,
    Estimate = estimate,
    SE = std.error,
    p_value = p.value
  ) %>%
  dplyr::mutate(
    Term = case_when(
      Term == "(Intercept)" ~ "Intercept",
      Term == "time" ~ "Time",
      Term == "family_conflict" ~ "Family Conflict",
      Term == "time:family_conflict" ~ "Time × Family Conflict",
      TRUE ~ Term
    )
  )

# Create summary table
model_summary_table <- coef_summary %>%
  gt::gt() %>%
  gt::tab_header(title = "GLMM Model Summary") %>%
  gt::fmt_number(columns = c(Estimate, SE, p_value), decimals = 3)

model_summary_table

# Save as standalone HTML
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)


GLMM Model Summary
Term	Estimate	SE	p_value
Intercept	0.593	0.049	0.000
Time	0.165	0.023	0.000
Family Conflict	0.055	0.124	0.659
Time × Family Conflict	0.026	0.058	0.657
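
For reference, the model behind these estimates can be written out as follows. This is a sketch in notation chosen here (it does not appear in the original output): i indexes participants, j indexes waves, the β terms are the four fixed effects in the summary, u are the participant-level random intercept and slope, and θ is the dispersion parameter of the nbinom2 family.

```latex
\begin{aligned}
y_{ij} \mid u_{0i}, u_{1i} &\sim \text{NegBin}(\mu_{ij}, \theta),
  \qquad \operatorname{Var}(y_{ij} \mid u_{0i}, u_{1i}) = \mu_{ij} + \frac{\mu_{ij}^{2}}{\theta} \\
\log \mu_{ij} &= \beta_0 + \beta_1\,\text{time}_{ij} + \beta_2\,\text{conflict}_{ij}
  + \beta_3\,(\text{time}_{ij} \times \text{conflict}_{ij}) + u_{0i} + u_{1i}\,\text{time}_{ij} \\
(u_{0i}, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}) \\
\frac{\partial \log \mu_{ij}}{\partial\,\text{time}_{ij}} &= \beta_1 + \beta_3\,\text{conflict}_{ij} + u_{1i}
\end{aligned}
```

The last line shows why β₃ carries the moderation test: it is the amount by which the log-scale time slope changes for each one-unit increase in family conflict.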

Extract Model Diagnostics


# Extract random effects variance
re_var <- VarCorr(model)
re_var_cond <- re_var$cond$`site:family_id:participant_id`
var_intercept <- re_var_cond[1, 1]
var_slope <- re_var_cond[2, 2]

# Create diagnostics data
diagnostics_data <- data.frame(
  Diagnostic = c(
    "Family",
    "Link Function",
    "Random Intercept Variance (σ²)",
    "Random Slope Variance (time)",
    "Number of Participants",
    "Total Observations"
  ),
  Value = c(
    "Negative Binomial",
    "log",
    sprintf("%.4f", var_intercept),
    sprintf("%.4f", var_slope),
    sprintf("%d", length(unique(df_long$participant_id))),
    sprintf("%d", nrow(df_long))
  )
)

# Format diagnostics table
diagnostics_table <- diagnostics_data %>%
  gt::gt() %>%
  gt::tab_header(title = "GLMM Model Diagnostics") %>%
  gt::cols_label(
    Diagnostic = gt::md("**Diagnostic**"),
    Value = gt::md("**Value**")
  ) %>%
  gt::tab_options(table.font.size = 12)

diagnostics_table

# Save diagnostics table
gt::gtsave(diagnostics_table, filename = "model_diagnostics.html")


GLMM Model Diagnostics
Diagnostic	Value
Family	Negative Binomial
Link Function	log
Random Intercept Variance (σ²)	0.1982
Random Slope Variance (time)	0.0268
Number of Participants	1124
Total Observations	2763
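
The diagnostics list a negative binomial family with a log link. As a rough illustration of what that family choice buys, the sketch below simulates counts in base R and compares a Poisson draw with a negative binomial draw that has the same mean. The mean and dispersion values are hypothetical and are not estimates from the fitted model.

```r
set.seed(123)

mu    <- 3   # hypothetical mean count (assumption, not a model estimate)
theta <- 2   # hypothetical dispersion parameter (assumption, not a model estimate)

# Same mean, different variance structure
y_pois <- rpois(1e5, lambda = mu)
y_nb   <- rnbinom(1e5, mu = mu, size = theta)

# Poisson: variance equals the mean
# Negative binomial (quadratic parameterization): variance = mu + mu^2 / theta
moments <- data.frame(
  distribution    = c("Poisson", "Negative binomial"),
  sample_mean     = c(mean(y_pois), mean(y_nb)),
  sample_var      = c(var(y_pois), var(y_nb)),
  theoretical_var = c(mu, mu + mu^2 / theta)
)
moments
```

When observed counts vary more than their mean would allow under a Poisson model, the extra dispersion parameter lets the negative binomial absorb that spread instead of understating standard errors.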

Interpretation

The negative binomial GLMM indicates a statistically significant increase in alcohol use over time. The time coefficient of 0.165 (p < 0.001) is on the log scale, so the expected count rises by a factor of exp(0.165) ≈ 1.18, or about 18% per assessment wave, for an adolescent with a family conflict score of 0 (the reference point for the time effect when the interaction is in the model). Family conflict does not significantly predict alcohol use at the first wave (estimate = 0.055, p = 0.659), and the family conflict × time interaction is also not statistically significant (estimate = 0.026, p = 0.657), so there is no evidence that adolescents from high-conflict families show a steeper increase in alcohol use than their peers. Taken together, alcohol use tends to increase over time, but these data do not show that family conflict shapes either the level or the trajectory of use.
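
Since the coefficients are on the log scale, exponentiating them often makes interpretation easier. The sketch below does this by hand in base R using the estimates and standard errors from the summary table, with approximate 95% Wald intervals. The data frame and column names are made up for this illustration, and the intervals are a normal approximation rather than output from the fitted model.

```r
# Estimates and standard errors copied from the model summary table
coefs <- data.frame(
  term     = c("Intercept", "Time", "Family Conflict", "Time x Family Conflict"),
  estimate = c(0.593, 0.165, 0.055, 0.026),
  se       = c(0.049, 0.023, 0.124, 0.058)
)

z <- qnorm(0.975)  # critical value for a two-sided 95% Wald interval

# Exponentiate: the intercept becomes an expected count at time = 0 and
# conflict = 0; the other terms become multiplicative rate ratios
coefs$exp_estimate <- exp(coefs$estimate)
coefs$ci_low       <- exp(coefs$estimate - z * coefs$se)
coefs$ci_high      <- exp(coefs$estimate + z * coefs$se)
coefs$pct_change   <- 100 * (coefs$exp_estimate - 1)

print(coefs, digits = 3)
```

A rate ratio whose interval contains 1 corresponds to a log-scale estimate whose interval contains 0, which is the case for both family conflict terms here.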

Create Interaction Plot


# Generate predicted values for interaction visualization
preds <- ggpredict(model, terms = c("time", "family_conflict"))

# Plot interaction effect
visualization <- ggplot(preds, aes(x = x, y = predicted, color = group, group = group)) +
  geom_point(size = 3) +
  geom_line() +
  labs(
    title = "Interaction: Family Conflict & Alcohol Use Over Time",
    x = "Time",
    y = "Predicted Alcohol Use",
    color = "Family Conflict Level"
  ) +
  theme_minimal()

visualization

# Save interaction plot
ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)


Create Predicted vs Observed Plot


# Generate predictions and create diagnostic plot
df_long$predicted <- predict(model, type = "response")

visualization2 <- ggplot(df_long, aes(x = predicted, y = alcohol_use)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(
    title = "Predicted vs. Observed Alcohol Use",
    x = "Predicted Alcohol Use",
    y = "Observed Alcohol Use"
  ) +
  theme_minimal()

visualization2

# Save predicted vs observed plot
ggsave(
  filename = "visualization2.png",
  plot = visualization2,
  width = 8, height = 6, dpi = 300
)


Visualization Interpretation

Visualization 1 (Interaction Effect): Shows the predicted alcohol use trajectories over time at representative levels of family conflict chosen by ggpredict(). The lines represent how alcohol use changes across assessment waves, with a separate trajectory for each family conflict level. The non-significant interaction (p = 0.657) is reflected in the near-parallel lines, indicating that the rate of change in alcohol use over time is similar across family conflict levels. Because predictions are plotted on the count scale rather than the log scale, the lines curve slightly upward and need not be exactly parallel even when the interaction is small.

Visualization 2 (Predicted vs Observed): Displays the relationship between model-predicted values and observed alcohol use. Points represent individual observations, and the red line is a linear fit of observed on predicted values. Points that track this line suggest the model captures the general pattern of alcohol use reasonably well, though individual variability remains evident.
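
The near-parallel pattern can also be checked by hand from the fixed effects. The sketch below rebuilds population-level trajectories in base R from the summary table, setting the random effects to zero. The three conflict levels (roughly the mean of 0.30 plus or minus one SD of 0.22 from the descriptives) are an assumption made here and may not match the values ggpredict() selects.

```r
# Fixed-effect estimates copied from the model summary table
b <- c(intercept = 0.593, time = 0.165, conflict = 0.055, interaction = 0.026)

# Assumed conflict levels: approximately mean - 1 SD, mean, mean + 1 SD
conflict_levels <- c(low = 0.08, mean = 0.30, high = 0.52)

# All combinations of wave (0-3) and conflict level
grid <- expand.grid(time = 0:3, conflict = unname(conflict_levels))

# Linear predictor on the log scale, then back-transform to expected counts
grid$log_mu <- b[["intercept"]] +
  b[["time"]] * grid$time +
  b[["conflict"]] * grid$conflict +
  b[["interaction"]] * grid$time * grid$conflict
grid$predicted <- exp(grid$log_mu)

# Per-wave rate ratio for time at each conflict level: exp(b_time + b_int * conflict)
time_rate_ratio <- exp(b[["time"]] + b[["interaction"]] * conflict_levels)

grid
round(time_rate_ratio, 3)
```

The per-wave rate ratios differ only in the second decimal place across conflict levels, which is the numerical counterpart of the near-parallel lines in the plot.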

Discussion

The GLMM showed that alcohol use increased across waves, but neither the family conflict main effect (p = 0.659) nor the conflict-by-time interaction (p = 0.657) was significant. The near-parallel fitted lines therefore indicate that, in this sample, family conflict was not reliably associated with either the overall level of use or its rate of change. Even a null interaction is informative: it offers no support for the idea that adolescents in high-conflict households follow accelerating drinking trajectories, although a non-significant result does not establish that no such effect exists.
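
A likelihood ratio test that compares models with and without the interaction is a common complement to the Wald p-value. The sketch below illustrates the idea on simulated data with no true interaction. Every parameter value, the sample size, and the object names are hypothetical, and the model uses a random intercept only to keep the simulation simple, unlike the tutorial model, which also includes a random slope.

```r
library(glmmTMB)

set.seed(2024)

# Simulated design: 200 hypothetical participants, 4 waves each
n_id <- 200
sim  <- expand.grid(time = 0:3, id = factor(seq_len(n_id)))

# Person-level conflict score and random intercept (hypothetical values)
conflict_i <- runif(n_id, 0, 1)
u0_i       <- rnorm(n_id, mean = 0, sd = 0.4)
sim$family_conflict <- conflict_i[as.integer(sim$id)]

# Linear predictor with NO interaction term in the data-generating model
eta <- 0.6 + 0.17 * sim$time + 0.05 * sim$family_conflict + u0_i[as.integer(sim$id)]
sim$alcohol_use <- rnbinom(nrow(sim), mu = exp(eta), size = 5)

# Nested models: main effects only vs. main effects plus interaction
m_main <- glmmTMB(alcohol_use ~ time + family_conflict + (1 | id),
                  family = nbinom2, data = sim)
m_int  <- glmmTMB(alcohol_use ~ time * family_conflict + (1 | id),
                  family = nbinom2, data = sim)

# Likelihood ratio test for the interaction term
lrt <- anova(m_main, m_int)
lrt
```

With real data, the same comparison would be made between the fitted model and a version that drops the time × family conflict term while keeping the same random-effects structure.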

The random intercept variance (0.198) and the random slope variance for time (0.027) point to between-person heterogeneity in both baseline use and rate of change after accounting for conflict. Computing marginal and conditional R² would further quantify how much variance the fixed effects explain relative to the full model. The predicted-versus-observed plot offers an informal check on the log link and the negative binomial variance assumption rather than a formal test. More broadly, this exercise illustrates how GLMMs let us probe conditional effects without sacrificing the ability to model non-normal outcomes, and how visualizations of fitted trajectories can quickly reveal whether an interaction meaningfully alters developmental trends.

Additional Resources

lme4 Package Documentation

DOCS

Official CRAN documentation for the lme4 package, with detailed examples of interaction terms in generalized linear mixed models and interpretation of fixed effects.


Interactions in GLMMs

VIGNETTE

Tutorial on specifying and interpreting interaction effects in generalized linear mixed models, including cross-level interactions and visualization strategies.


Interpreting Interactions in Mixed Models

PAPER

Methodology paper on proper interpretation of interaction terms in multilevel models, covering centering decisions and probing significant interactions (Aiken & West, 1991). Note: access may require institutional or paid subscription.


interactions Package for Probing Effects

TOOL

R package for visualizing and probing interaction effects in regression models, including simple slopes analysis and Johnson-Neyman intervals for mixed models.
