Longitudinal.dev

Latent Growth Curve Models (LGCM)

Time-Invariant Covariates (LGCM)

Add time-invariant covariates to latent growth curve models to evaluate how stable characteristics shift baseline levels and slopes of ABCD emotional suppression.

lavaan | Updated 2025-11-05 | abcd-study | latent-growth-curve-model | time-invariant-covariate
Work in Progress: Examples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified. If you spot gaps, errors, or have suggestions, we'd love your feedback; use the "Suggest changes" button to help us improve!

Overview

Latent Growth Curve Modeling with time-invariant covariates extends basic growth modeling by explaining why individuals differ in initial levels and rates of change. By incorporating predictors like demographics or socioeconomic factors as covariates of latent intercept and slope factors, this approach reveals how stable characteristics shape developmental trajectories. This tutorial examines emotional suppression in ABCD youth across four annual assessments, modeling how demographic and socioeconomic factors predict both baseline suppression levels and individual rates of change over time.
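As a compact reference, the conditional growth model described above can be written out as follows. The symbols are an assumed, conventional notation rather than something taken from this tutorial; the time scores 0 to 3 and the four covariates mirror the lavaan syntax used later in the tutorial.

```latex
\begin{align}
y_{ti} &= \eta_{0i} + \lambda_t \, \eta_{1i} + \varepsilon_{ti},
  \qquad \lambda_t \in \{0, 1, 2, 3\} \\
\eta_{0i} &= \alpha_0 + \gamma_{01}\,\text{age}_i + \gamma_{02}\,\text{female}_i
  + \gamma_{03}\,\text{education}_i + \gamma_{04}\,\text{income}_i + \zeta_{0i} \\
\eta_{1i} &= \alpha_1 + \gamma_{11}\,\text{age}_i + \gamma_{12}\,\text{female}_i
  + \gamma_{13}\,\text{education}_i + \gamma_{14}\,\text{income}_i + \zeta_{1i}
\end{align}
```

Here y_{ti} is suppression for youth i at wave t, the intercept factor \eta_{0i} is the expected score at Year 3, and the slope factor \eta_{1i} is the expected change per year. The \gamma coefficients are the covariate effects on each growth factor, and the residuals \zeta_{0i} and \zeta_{1i} carry the between-person variance left unexplained by the covariates.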

When to Use:
Use when baseline demographics or socioeconomic factors may predict both initial levels and individual growth trajectories.
Key Advantage:
Integrates covariates directly into the latent intercept and slope, revealing how each covariate shifts starting points and trajectories simultaneously.
What You'll Learn:
How to add covariates to the LGCM, interpret their effects on intercept/slope factors, and evaluate whether covariates improve overall model fit.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

  • Automatic data joining - Merges variables from multiple tables automatically
  • Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
  • Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Key Parameters
  • vars - Vector of variable names to load
  • release - ABCD data release version (e.g., "6.0")
  • format - File format, typically "parquet" for efficiency
  • categ_to_factor - Automatically converts categorical variables to factors
  • value_to_na - Converts ABCD missing value codes to R's NA
  • add_labels - Adds descriptive labels to variables and values

Data Preparation

NBDCtools Setup and Data Loading
# Load required libraries
library(NBDCtools)    # ABCD data access helper
library(arrow)        # For reading Parquet files
library(tidyverse)    # Data manipulation
library(lavaan)       # Latent growth curve modeling
library(gtsummary)    # Publication-ready tables
library(gt)           # Table formatting
library(corrplot)     # Correlation visualization

# Set random seed so the random participant subset drawn for the plot is reproducible
set.seed(123)

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "ab_g_dyn__visit_age",
    "ab_g_stc__cohort_sex",
    "ab_g_stc__cohort_race__nih",
    "ab_g_dyn__cohort_edu__cgs",
    "ab_g_dyn__cohort_income__hhold__3lvl",
    "mh_y_erq__suppr_mean"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Data Transformation
# Create longitudinal dataset
df_long <- abcd_data %>%
  # Filter to ERQ assessment waves (Years 3-6)
  filter(session_id %in% c("ses-03A", "ses-04A", "ses-05A", "ses-06A")) %>%
  arrange(participant_id, session_id)

# Clean and transform variables
df_long <- df_long %>%
  mutate(
    participant_id = factor(participant_id),
    session_id = factor(session_id,
                       levels = c("ses-03A", "ses-04A", "ses-05A", "ses-06A"),
                       labels = c("Year_3", "Year_4", "Year_5", "Year_6")),
    site = factor(ab_g_dyn__design_site),
    family_id = factor(ab_g_stc__design_id__fam),
    age = as.numeric(ab_g_dyn__visit_age),
    sex = factor(ab_g_stc__cohort_sex,
                 levels = c("1", "2"),
                 labels = c("Male", "Female")),
    race = factor(ab_g_stc__cohort_race__nih,
                  levels = c("2", "3", "4", "5", "6", "7", "8"),
                  labels = c("White", "Black", "Asian", "AI/AN", "NH/PI", "Multi-Race", "Other")),
    education = as.numeric(ab_g_dyn__cohort_edu__cgs),
    income = as.numeric(ab_g_dyn__cohort_income__hhold__3lvl),
    suppression = round(as.numeric(mh_y_erq__suppr_mean), 2)
  ) %>%
  # Select analysis variables
  select(participant_id, session_id, site, family_id, age, sex, race, education, income, suppression) %>%
  drop_na()

# Get baseline covariates (Year 3)
baseline_covariates <- df_long %>%
  filter(session_id == "Year_3") %>%
  select(participant_id, age, sex, education, income) %>%
  mutate(
    age_c = age - mean(age, na.rm = TRUE),
    female = ifelse(sex == "Female", 1, 0),
    education_c = education - mean(education, na.rm = TRUE),
    income_c = income - mean(income, na.rm = TRUE)
  ) %>%
  select(participant_id, age_c, female, education_c, income_c)
Reshape to Wide Format
# Reshape suppression to wide format
df_wide <- df_long %>%
  select(participant_id, session_id, suppression, site) %>%
  pivot_wider(
    names_from = session_id,
    values_from = suppression,
    names_prefix = "Suppression_"
  ) %>%
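  # Merge with baseline predictors
  left_join(baseline_covariates, by = "participant_id") %>%
  # Keep complete cases only: participants missing any wave or covariate are
  # dropped here, so missing = "fiml" in the model has no missing outcomes to handle
  drop_na()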
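The reshape above keeps only participants with all four waves. A minimal sketch of an alternative, assuming one would rather let FIML use partial outcome data: require complete covariates only and leave missing waves as NA. The toy data and the names toy_long, toy_cov, and toy_wide are hypothetical and are not part of the tutorial's pipeline.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: participant "b" misses Year_5, "c" has only Year_3
toy_long <- tibble(
  participant_id = c("a", "a", "a", "b", "b", "c"),
  session_id     = c("Year_3", "Year_4", "Year_5", "Year_3", "Year_4", "Year_3"),
  suppression    = c(3.0, 3.2, 3.5, 2.5, 2.75, 4.0)
)

# Hypothetical baseline covariate; participant "c" has a missing value
toy_cov <- tibble(
  participant_id = c("a", "b", "c"),
  age_c          = c(-0.5, 0.1, NA)
)

toy_wide <- toy_long %>%
  pivot_wider(
    names_from = session_id,
    values_from = suppression,
    names_prefix = "Suppression_"
  ) %>%
  left_join(toy_cov, by = "participant_id") %>%
  # Drop rows with missing covariates only; missing outcome waves stay as NA
  drop_na(age_c)

print(toy_wide)
```

Under lavaan's default fixed.x = TRUE, cases with missing exogenous covariates are removed even when missing = "fiml" is set, which is why the sketch still filters on the covariate. Switching the tutorial to this approach would change the analysis sample, so the reported estimates would not be expected to reproduce exactly.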
Descriptive Statistics
# Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, suppression) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      suppression ~ "Suppression"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "**{level}**<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "**Assessment Wave**") %>%
  bold_labels() %>%
  italicize_levels()

# Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table
Assessment Wave
Characteristic    Year_3 (N = 8953)    Year_4 (N = 8376)    Year_5 (N = 7685)    Year_6 (N = 4496)
Suppression       3.09 (0.86)          3.33 (0.86)          3.34 (0.87)          3.42 (0.88)
Values are Mean (SD).

Statistical Analysis

Define and Fit LGCM with Covariates
# Specify LGCM with time-invariant covariates
model <- "
  # Define growth factors
  i =~ 1*Suppression_Year_3 + 1*Suppression_Year_4 +
       1*Suppression_Year_5 + 1*Suppression_Year_6
  s =~ 0*Suppression_Year_3 + 1*Suppression_Year_4 +
       2*Suppression_Year_5 + 3*Suppression_Year_6

  # Estimate factor means (conditional on covariates)
  i ~ 1
  s ~ 1

  # Estimate factor (co)variances (residual after covariates)
  i ~~ i
  s ~~ s
  i ~~ s

  # Equal residual variances (can be relaxed if needed)
  Suppression_Year_3 ~~ res_var*Suppression_Year_3
  Suppression_Year_4 ~~ res_var*Suppression_Year_4
  Suppression_Year_5 ~~ res_var*Suppression_Year_5
  Suppression_Year_6 ~~ res_var*Suppression_Year_6

  # Covariate predictions of growth factors
  i ~ age_c + female + education_c + income_c
  s ~ age_c + female + education_c + income_c
"

# Fit model with cluster-robust standard errors
fit <- lavaan(
  model,
  data = df_wide,
  missing = "fiml",
  cluster = "site"
)

# Display model summary
summary(fit)
Format Model Summary Table
# Extract model summary
model_summary <- summary(fit)

model_summary

# Convert lavaan output to a tidy dataframe and then to gt table
model_summary_table <- broom::tidy(fit) %>%
  gt() %>%
  tab_header(title = "Latent Growth Curve Model Results") %>%
  fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3)

# Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)
Format Model Fit Indices Table
# Extract and save model fit indices
fit_indices <- fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr", "aic", "bic"))

fit_indices_table <- data.frame(
  Metric = names(fit_indices),
  Value = as.numeric(fit_indices)
) %>%
  gt() %>%
  tab_header(title = "Model Fit Indices") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

# Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)
Latent Growth Curve Model Results
term op label estimate std.error statistic p.value std.lv std.all std.nox
i =~ Suppression_Year_3 =~ 1.000 0.000 NA NA 0.586930259 0.687442433 0.687442433
i =~ Suppression_Year_4 =~ 1.000 0.000 NA NA 0.586930259 0.708851774 0.708851774
i =~ Suppression_Year_5 =~ 1.000 0.000 NA NA 0.586930259 0.689500237 0.689500237
i =~ Suppression_Year_6 =~ 1.000 0.000 NA NA 0.586930259 0.638109201 0.638109201
s =~ Suppression_Year_3 =~ 0.000 0.000 NA NA 0.000000000 0.000000000 0.000000000
s =~ Suppression_Year_4 =~ 1.000 0.000 NA NA 0.202967296 0.245129172 0.245129172
s =~ Suppression_Year_5 =~ 2.000 0.000 NA NA 0.405934593 0.476874371 0.476874371
s =~ Suppression_Year_6 =~ 3.000 0.000 NA NA 0.608901889 0.661996707 0.661996707
i ~1 ~1 3.074 0.019 162.990 0.000 5.237598723 5.237598723 5.237598723
s ~1 ~1 0.134 0.010 13.543 0.000 0.661679209 0.661679209 0.661679209
i ~~ i ~~ 0.336 0.018 18.758 0.000 0.976769592 0.976769592 0.976769592
s ~~ s ~~ 0.040 0.003 12.681 0.000 0.979978430 0.979978430 0.979978430
i ~~ s ~~ -0.041 0.005 -7.539 0.000 -0.350237831 -0.350237831 -0.350237831
Suppression_Year_3 ~~ Suppression_Year_3 ~~ res_var 0.384 0.015 26.221 0.000 0.384467215 0.527422901 0.527422901
Suppression_Year_4 ~~ Suppression_Year_4 ~~ res_var 0.384 0.015 26.221 0.000 0.384467215 0.560786014 0.560786014
Suppression_Year_5 ~~ Suppression_Year_5 ~~ res_var 0.384 0.015 26.221 0.000 0.384467215 0.530585225 0.530585225
Suppression_Year_6 ~~ Suppression_Year_6 ~~ res_var 0.384 0.015 26.221 0.000 0.384467215 0.454439764 0.454439764
i ~ age_c ~ 0.071 0.015 4.824 0.000 0.121575555 0.077904214 0.121575555
i ~ female ~ 0.037 0.028 1.312 0.190 0.063179057 0.031566521 0.063179057
i ~ education_c ~ -0.041 0.015 -2.698 0.007 -0.069946069 -0.073313301 -0.069946069
i ~ income_c ~ -0.058 0.026 -2.256 0.024 -0.099138515 -0.075284800 -0.099138515
s ~ age_c ~ -0.027 0.008 -3.250 0.001 -0.131101324 -0.084008217 -0.131101324
s ~ female ~ -0.042 0.011 -3.931 0.000 -0.208179483 -0.104013931 -0.208179483
s ~ education_c ~ -0.007 0.007 -0.997 0.319 -0.033307379 -0.034910810 -0.033307379
s ~ income_c ~ 0.016 0.009 1.723 0.085 0.079541724 0.060403193 0.079541724
age_c ~~ age_c ~~ 0.411 0.000 NA NA 0.410609866 1.000000000 0.410609866
age_c ~~ female ~~ -0.002 0.000 NA NA -0.001951269 -0.006094651 -0.001951269
age_c ~~ education_c ~~ 0.015 0.000 NA NA 0.014924270 0.022220762 0.014924270
age_c ~~ income_c ~~ 0.032 0.000 NA NA 0.032148117 0.066065687 0.032148117
female ~~ female ~~ 0.250 0.000 NA NA 0.249635972 1.000000000 0.249635972
female ~~ education_c ~~ -0.008 0.000 NA NA -0.007662134 -0.014631091 -0.007662134
female ~~ income_c ~~ -0.010 0.000 NA NA -0.009669858 -0.025485995 -0.009669858
education_c ~~ education_c ~~ 1.099 0.000 NA NA 1.098598285 1.000000000 1.098598285
education_c ~~ income_c ~~ 0.432 0.000 NA NA 0.432063023 0.542828631 0.432063023
income_c ~~ income_c ~~ 0.577 0.000 NA NA 0.576673229 1.000000000 0.576673229
Suppression_Year_3 ~1 ~1 0.000 0.000 NA NA 0.000000000 0.000000000 0.000000000
Suppression_Year_4 ~1 ~1 0.000 0.000 NA NA 0.000000000 0.000000000 0.000000000
Suppression_Year_5 ~1 ~1 0.000 0.000 NA NA 0.000000000 0.000000000 0.000000000
Suppression_Year_6 ~1 ~1 0.000 0.000 NA NA 0.000000000 0.000000000 0.000000000
age_c ~1 ~1 -0.016 0.000 NA NA -0.015797313 -0.024652929 -0.015797313
female ~1 ~1 0.481 0.000 NA NA 0.480920478 0.962541996 0.480920478
education_c ~1 ~1 0.091 0.000 NA NA 0.091365730 0.087169362 0.091365730
income_c ~1 ~1 0.076 0.000 NA NA 0.076209095 0.100355669 0.076209095
Model Fit Indices
Fit Measure Value
chisq 231.094
df 16.000
pvalue 0.000
cfi 0.927
tli 0.899
rmsea 0.063
srmr 0.032
aic 32,195.024
bic 32,281.000
Interpretation

At the covariate reference point (males at approximately the sample-average age, parental education, and household income), average suppression at Year 3 was 3.07 (SE = 0.019, p < .001) and climbed by 0.134 points per year (SE = 0.010, p < .001). Residual intercept and slope variances (0.336 and 0.040, both p < .001) confirmed sizable between-person differences beyond the covariates, while the negative intercept-slope covariance (-0.041, p < .001) implied that adolescents starting high tended to rise more slowly. Fit was mixed: SRMR = 0.032 and RMSEA = 0.063 fall in conventionally acceptable ranges, but CFI = 0.927 and TLI = 0.899 sit below the common 0.95 benchmark, so the linear specification with equal residual variances is best read as an adequate approximation rather than a close fit.

Time-invariant covariates added interpretable structure (coefficients below are unstandardized). Older youth entered with higher suppression (b = 0.071, p < .001) yet showed flatter growth (b = -0.027, p = .001). Females increased more slowly than males (b = -0.042, p < .001), with no reliable sex difference at Year 3 (b = 0.037, p = .190). Higher household income related to slightly lower intercepts (b = -0.058, p = .024), and its positive association with the slope was not significant (b = 0.016, p = .085). Higher parental education predicted lower starting suppression (b = -0.041, p = .007) but was unrelated to the slope (b = -0.007, p = .319). Collectively, the results highlight heterogeneous developmental courses shaped by demographic context as well as idiosyncratic factors.
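To make the covariate effects concrete, the sketch below turns the unstandardized estimates from the results table into model-implied mean trajectories for males and females, holding the centered covariates at zero. The helper implied_mean() and the object names are hypothetical; only the four numeric estimates come from the table.

```r
# Unstandardized estimates copied from the results table
# (i ~1, s ~1, i ~ female, s ~ female)
est <- c(i_mean = 3.074, s_mean = 0.134, i_female = 0.037, s_female = -0.042)

# Slope loadings used in the model: 0, 1, 2, 3 (Year 3 through Year 6)
time <- 0:3

# Hypothetical helper: implied mean at each wave for a given value of `female`,
# with age_c, education_c, and income_c held at 0
implied_mean <- function(female) {
  intercept <- est[["i_mean"]] + est[["i_female"]] * female
  slope     <- est[["s_mean"]] + est[["s_female"]] * female
  intercept + slope * time
}

trajectories <- data.frame(
  wave   = paste0("Year_", 3:6),
  male   = implied_mean(0),
  female = implied_mean(1)
)
print(trajectories)
```

Under these estimates the implied female mean starts 0.037 above the male mean at Year 3 and ends 0.089 below it at Year 6. Because the intercept difference is not statistically significant, the early ordering of the two lines should be read cautiously; the slope difference is the better-supported part of the pattern.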

Visualization
# Plotting the Suppression data over time from the df_long dataframe
# Select a subset of participants
selected_ids <- sample(unique(df_long$participant_id), 150)
df_long_selected <- df_long %>% filter(participant_id %in% selected_ids)

# Plot Suppression Growth
visualization <- ggplot(df_long_selected, aes(x = session_id, y = suppression, group = participant_id)) +
    geom_line(alpha = 0.3, color = "gray") +
    geom_point(size = 1.5, color = "blue") +
    geom_smooth(aes(group = 1), method = "lm", color = "red", linewidth = 1.2, se = TRUE, fill = "lightpink") +
    labs(
        title = "Suppression Growth with Confidence Intervals",
        x = "Assessment Wave",
        y = "Suppression Score"
    ) +
    theme_minimal()

print(visualization)

# Save the plot
ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)
[Figure: Overall Growth Trajectory]
Visualization Notes

This plot visualizes individual and overall trends in suppression across four annual assessments. Each gray line traces one participant from a randomly selected subset of 150, illustrating individual variability in suppression over time.

The blue points denote observed suppression scores at each timepoint. The red line is a linear fit (geom_smooth with method = "lm") to the plotted subset, and the shaded band is its 95% confidence interval. It summarizes the observed data in the subset and is not the trajectory estimated by the latent growth model.

The visualization highlights two patterns. The first is the general trend: on average, suppression increases over time, as indicated by the upward slope of the red line. The second is individual differences: while many participants follow the overall trend, individual trajectories vary, with some showing stability or even decreases in suppression. This plot underscores the importance of modeling both within- and between-person variability.

Discussion

Key Findings

Suppression generally increased across assessments, yet youth followed noticeably different paths. Significant residual variance in intercepts and slopes showed that some participants increased quickly while others barely changed, and this is the variation the covariates are meant to explain.

Implications

Age, sex, and socioeconomic markers each showed a unique association with at least one growth factor. Youth who were older at Year 3 entered the model with higher suppression but progressed more slowly thereafter, a pattern consistent with a ceiling effect, although the model does not test that mechanism. Females exhibited flatter slopes than males, and higher parental education and household income each predicted lower initial suppression, even after accounting for the other demographics. These patterns highlight meaningful stratification in both starting points and growth rates.

Allowing random slopes was critical for capturing that heterogeneity. The final model, with both random intercepts and slopes plus time-invariant predictors, separated average trends from participant-specific deviations. This specification yielded interpretable covariate effects; because no comparison against a covariate-free model is reported, the evidence here concerns interpretability rather than improved fit. More broadly, longitudinal analyses benefit from simultaneously modeling covariate influences and the unexplained variability that remains at the individual level.
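One way to check whether the covariate paths matter as a set is a nested comparison: fit the same growth model once with the paths estimated and once with them fixed to zero, then run a likelihood ratio test. The sketch below does this on simulated data with a single hypothetical covariate x, using lavaan's growth() wrapper for brevity instead of the tutorial's lavaan() call. All variable names and simulation values are assumptions chosen for illustration.

```r
library(lavaan)

# Simulate four waves with a covariate that shifts both intercept and slope
set.seed(2024)
n <- 500
x <- rnorm(n)                                   # hypothetical centered covariate
i_true <- 3.0 + 0.30 * x + rnorm(n, sd = 0.6)   # person-specific intercepts
s_true <- 0.1 - 0.05 * x + rnorm(n, sd = 0.2)   # person-specific slopes

sim <- data.frame(x = x)
for (t in 0:3) {
  sim[[paste0("y", t)]] <- i_true + t * s_true + rnorm(n, sd = 0.6)
}

# Shared growth structure: intercept loadings of 1, slope loadings 0 to 3
growth_part <- "
  i =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  s =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
"

# Covariate paths estimated versus fixed to zero (covariate stays in the model)
m_free <- paste(growth_part, "i ~ x", "s ~ x", sep = "\n")
m_null <- paste(growth_part, "i ~ 0*x", "s ~ 0*x", sep = "\n")

fit_free <- growth(m_free, data = sim)
fit_null <- growth(m_null, data = sim)

# Likelihood ratio test with 2 degrees of freedom (one per constrained path)
cmp <- lavTestLRT(fit_null, fit_free)
print(cmp)
```

Keeping the covariate in both models, with its paths fixed to zero in one of them, keeps the two fits nested and based on the same variables. With the tutorial's cluster-robust estimation the test statistic would be a scaled version, so the output would look somewhat different from this plain maximum likelihood sketch.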

Additional Resources


lavaan Regression Tutorial

DOCS

Official lavaan guide on including predictors of latent growth factors, demonstrating how time-invariant covariates can predict individual differences in intercepts and slopes.


Growth Models with Covariates

VIGNETTE

Detailed tutorial with practical examples of time-invariant covariate effects on growth parameters, including interpretation of covariate-by-intercept and covariate-by-slope associations.


Centering in Growth Models

PAPER

Best practices for centering time-invariant predictors in latent growth curve models. Discusses grand-mean centering versus group-mean centering and their interpretational implications (Enders & Tofighi, 2007). Note: access may require institutional or paid subscription.


broom for Tidy SEM Output

TOOL

R package for extracting and formatting lavaan results with predictors into clean, publication-ready tables. Particularly useful for models with multiple covariates.
