Build generalized linear mixed models for clustered count data, specify random effects, handle overdispersion, and interpret conditional estimates for ABCD longitudinal outcomes.
Work in Progress
Examples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified.
Overview
Generalized Linear Mixed Models (GLMMs) extend linear mixed models to handle non-normally distributed outcomes such as counts or binary responses while modeling random effects to account for individual differences and hierarchical structures. By combining generalized linear model distributions with random intercepts and slopes, GLMMs capture both population-level trends and person-specific variability in longitudinal data. This tutorial examines alcohol use in ABCD youth across four annual assessments using a Poisson GLMM to model drinking frequency, estimating fixed effects for population trends and random effects for individual variability.
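As a sketch of the model fitted in this tutorial, the random-intercept Poisson GLMM can be written as follows. The notation is an assumption introduced here for illustration: $y_{ij}$ is the alcohol use count for participant $i$ at assessment $j$, and $u_i$ is that participant's random intercept.

```latex
\begin{aligned}
y_{ij} \mid u_i &\sim \mathrm{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= \beta_0 + \beta_1\,\mathrm{time}_{ij} + u_i, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2).
\end{aligned}
```

Here $\beta_0$ and $\beta_1$ are the fixed effects, $\sigma_u^2$ is the random-intercept variance, and $\exp(\beta_1)$ is the incidence rate ratio per unit of time for a given participant.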
When to Use:
Ideal when you need subject-specific inference for non-Gaussian outcomes collected repeatedly in ABCD.
Key Advantage:
GLMMs combine fixed effects with random intercepts/slopes, delivering both population and subject-level insight for generalized outcomes.
What You'll Learn:
How to fit GLMMs in R, interpret fixed and random effects, and evaluate fit and diagnostics for binary and count data.
Data Access
Data Download
ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.
Loading Data with NBDCtools
Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:
Automatic data joining - Merges variables from multiple tables automatically
Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
Event filtering - Easily selects specific assessment waves
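# Load lme4, which provides glmer() for generalized linear mixed models
library(lme4)

# Fit a Poisson GLMM with a random intercept for each participant.
# The grouping term site:family_id:participant_id defines ONE grouping factor
# (each unique site-by-family-by-participant combination), so the model has a
# single random-intercept variance. It does not add separate site-level or
# family-level intercepts.
model <- glmer(
  alcohol_use ~ time + (1 | site:family_id:participant_id),
  data = df_long,
  family = poisson(link = "log"),
  control = glmerControl(optimizer = "bobyqa")
)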
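In lme4 formula syntax, a colon-joined grouping term such as site:family_id:participant_id defines a single grouping factor whose levels are the unique combinations of the three variables. The slash shorthand (1 | site/family_id/participant_id) instead expands to separate intercepts for site, family within site, and participant within family. The base-R sketch below uses a small made-up data frame (all values are hypothetical, not ABCD data) to illustrate the first point: when participant IDs are unique, the combined factor has exactly one level per participant.

```r
# Hypothetical toy data: 2 sites, 3 families, 4 participants, 2 waves each
toy <- data.frame(
  site           = rep(c("s1", "s1", "s2", "s2"), each = 2),
  family_id      = rep(c("f1", "f1", "f2", "f3"), each = 2),
  participant_id = rep(c("p1", "p2", "p3", "p4"), each = 2),
  time           = rep(0:1, times = 4)
)

# The grouping factor implied by site:family_id:participant_id
combined <- interaction(toy$site, toy$family_id, toy$participant_id, drop = TRUE)

n_combined     <- nlevels(combined)
n_participants <- length(unique(toy$participant_id))
n_families     <- length(unique(toy$family_id))
n_sites        <- length(unique(toy$site))

# TRUE here means the term behaves as a participant-level random intercept
n_combined == n_participants
```

To estimate separate variance components for site and family as well, the nested form (1 | site/family_id/participant_id) would be the usual choice, though it is a different model and would change the estimates.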
# Generate a summary table of the fixed effects (reported on the log scale)
model_summary_table <- gtsummary::tbl_regression(model,
  digits = 3,
  intercept = TRUE
) %>%
  gtsummary::as_gt()

# Display model summary (optional)
model_summary_table

# Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)
Characteristic | log(IRR) | 95% CI | p-value
(Intercept) | 0.62 | 0.57, 0.67 | <0.001
time | 0.17 | 0.15, 0.20 | <0.001
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio
The Poisson GLMM results indicate a significant increase in alcohol use over time: the time coefficient is 0.17 on the log scale (95% CI: 0.15, 0.20; p < 0.001). Exponentiating gives an incidence rate ratio (IRR) of exp(0.17) ≈ 1.19, meaning that, for a given participant (that is, conditional on the random intercept), the expected alcohol use count increases by approximately 19% per assessment wave.
The random intercept variance (σ² = 0.1868 on the log scale, a standard deviation of about 0.43) indicates moderate individual differences in baseline alcohol use, reinforcing the importance of accounting for between-person variability.
Model fit metrics such as the log-likelihood are most informative when compared across candidate models. Taken together, the results suggest that the GLMM is a suitable framework for capturing both population-wide trends and subject-specific differences in alcohol consumption over time.
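The conversion from the log scale to an incidence rate ratio can be reproduced with a few lines of base R. This is a minimal sketch that copies the reported estimates by hand; the object names are illustrative, and the cumulative calculation assumes that time increases by 1 per wave.

```r
# Estimate and 95% CI for time, copied from the summary table (log scale)
time_log <- c(estimate = 0.17, lower = 0.15, upper = 0.20)

# Exponentiate to obtain the incidence rate ratio (IRR) and its CI
time_irr <- exp(time_log)

# Percent change in the expected count per one-unit increase in time
pct_change <- (time_irr - 1) * 100

round(time_irr, 2)
round(pct_change, 1)

# Cumulative rate ratio across three one-unit steps (first to fourth wave),
# ASSUMING time increases by 1 per wave
irr_three_steps <- exp(3 * time_log[["estimate"]])
round(irr_three_steps, 2)
```

With the fitted object available, exp(lme4::fixef(model)) applies the same transformation to all fixed effects at once.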
Visualize
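# Load ggplot2 for plotting
library(ggplot2)

# Generate model predictions on the count scale. By default, predict() for a
# glmer fit includes the random effects, so these are participant-specific.
# Note: this assignment assumes no rows were dropped for missing values.
df_long$predicted <- predict(model, type = "response")

# Plot observed counts against predicted counts with a linear trend line
visualization <- ggplot(df_long, aes(x = predicted, y = alcohol_use)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(
    title = "Predicted vs. Observed Alcohol Use",
    x = "Predicted Alcohol Use",
    y = "Observed Alcohol Use"
  ) +
  theme_minimal()

# Display the plot
visualization

# Save the plot
ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)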
Interpretation
Visualization Notes
The model's predictions indicate a steady increase in alcohol use across assessment waves, reflecting an overall upward trend in consumption over time. Because the plot shows predicted against observed counts rather than trajectories by wave, the trend itself is read from the time coefficient, while the scatter around the fitted line shows how closely the predictions track the data. While the population-level pattern suggests consistent growth, individual trajectories vary, highlighting differences in baseline alcohol use. This variability underscores the importance of accounting for both fixed effects (overall trends) and random effects (subject-specific deviations) in modeling alcohol use trajectories.
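To make the idea of varying individual trajectories concrete, the sketch below computes the expected counts implied by the reported estimates for a participant with a random intercept of zero and for participants one standard deviation below and above it. It assumes that time is coded 0, 1, 2, 3 across the four assessments, which is not stated in the text; the object names are illustrative.

```r
# Reported estimates (log scale) from the model summary
b0   <- 0.62          # intercept
b1   <- 0.17          # slope for time
sd_u <- sqrt(0.1868)  # random-intercept standard deviation

# ASSUMPTION: time is coded 0..3 for the four annual assessments
time <- 0:3

# Expected counts conditional on a random intercept of -1 SD, 0, and +1 SD
trajectories <- data.frame(
  time    = time,
  low     = exp(b0 + b1 * time - sd_u),
  typical = exp(b0 + b1 * time),
  high    = exp(b0 + b1 * time + sd_u)
)

round(trajectories, 2)
```

All three curves rise by the same factor per wave because the model includes a random intercept but no random slope, so the curves differ only in level.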
Discussion
The Poisson GLMM indicated a clear upward shift in alcohol use: the fixed effect for time was 0.17 on the log scale (p < .001), which translates to an incidence-rate ratio of roughly 1.19 per wave. In practical terms, self-reported drinking frequency increased about 19% each assessment, even after adjusting for repeated measures. Visualizations of the fitted trajectories mirrored this monotonic rise.
Random intercept variance (σ² = 0.187) remained sizable, indicating that youth entered the study with very different baseline propensities that persisted after conditioning on time. Inspecting predicted versus observed counts showed no systematic bias, which is consistent with the Poisson mean-variance assumption being adequate for these data, although a dispersion statistic is the more direct check. Together, the fixed and random effects illustrate how GLMMs can capture both the population trend and the heterogeneity around it, offering a richer story than either a simple Poisson regression or subject-specific regressions could provide.
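One common way to check the Poisson mean-variance assumption is the Pearson dispersion ratio, the sum of squared Pearson residuals divided by the residual degrees of freedom. The sketch below is an illustration, not the tutorial's own diagnostic: the helper name overdispersion_ratio is hypothetical, and the demonstration uses simulated data with base-R glm() so that it runs without ABCD data.

```r
# Pearson dispersion ratio: sum of squared Pearson residuals / residual df.
# Values near 1 are consistent with the Poisson assumption; values well
# above 1 suggest overdispersion.
overdispersion_ratio <- function(fit) {
  pearson_resid <- residuals(fit, type = "pearson")
  resid_df      <- df.residual(fit)
  chisq         <- sum(pearson_resid^2)
  c(
    chisq   = chisq,
    ratio   = chisq / resid_df,
    df      = resid_df,
    p_value = pchisq(chisq, df = resid_df, lower.tail = FALSE)
  )
}

# Demonstration on SIMULATED (not ABCD) counts
set.seed(123)
n    <- 2000
time <- rep(0:3, length.out = n)
mu   <- exp(0.6 + 0.2 * time)

y_pois <- rpois(n, lambda = mu)          # variance equals the mean
y_nb   <- rnbinom(n, size = 1, mu = mu)  # variance exceeds the mean

fit_pois <- glm(y_pois ~ time, family = poisson)
fit_nb   <- glm(y_nb ~ time, family = poisson)

res_pois <- overdispersion_ratio(fit_pois)
res_nb   <- overdispersion_ratio(fit_nb)

round(rbind(poisson = res_pois, overdispersed = res_nb), 3)
```

The same helper can be applied to the fitted GLMM as overdispersion_ratio(model), since lme4 provides residuals() and df.residual() methods for glmer fits. If the ratio is well above 1, options include an observation-level random effect or a negative binomial model such as lme4::glmer.nb().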
Additional Resources
lme4 Package Documentation
DOCS
Official CRAN documentation for the lme4 package, covering the glmer() function for fitting generalized linear mixed models with detailed specifications for family distributions and link functions.
Comprehensive vignette on implementing generalized linear mixed models using lme4, including binary, count, and proportion outcomes with random effects specifications.
Data Analysis Using Regression and Multilevel Models
BOOK
Foundational textbook by Gelman & Hill covering hierarchical models for non-normal outcomes. Chapters 14-15 (multilevel logistic regression and multilevel generalized linear models) focus on GLMMs with practical examples and interpretation guidance.
R package for creating publication-quality tables and plots from mixed models, including predicted probabilities, marginal effects, and random effects visualization.