Longitudinal.dev

Latent Growth Curve Models (LGCM)

Nesting (LGCM)

Incorporate nesting within families and sites in latent growth curve models to obtain robust emotional suppression trajectories for clustered ABCD youth.

LGCM › Continuous
lavaan · Updated 2025-11-04 · Tags: abcd-study, trajectory, nesting
Work in Progress: Examples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified.

Overview

Latent Growth Curve Modeling with clustering addresses dependencies where observations within the same family or study site are more similar than observations from different clusters. Ignoring this structure can lead to biased standard errors, incorrect p-values, and inappropriate confidence intervals. This tutorial demonstrates LGCM with nested clustering to account for site-level and family-level dependencies using robust standard errors in ABCD data, examining emotional suppression trajectories across four annual assessments.

When to Use:
Apply when participants are nested in families or sites and you need to adjust standard errors for clustering while modeling growth.
Key Advantage:
Robust standard errors correct for dependency without requiring full multilevel specification, maintaining simpler model interpretation.
What You'll Learn:
How to apply robust clustering in lavaan, interpret corrected standard errors, and assess nesting impact on inference.
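The problem the robust correction solves can be sketched in a few lines of base R on simulated toy data (not ABCD): when family members share an outcome-level effect, the naive standard error of a mean understates uncertainty, while a cluster-aggregated sandwich estimator does not. The CR0-style formula below is a minimal illustration, not the estimator lavaan uses internally.

```r
# Toy illustration (simulated, base R): naive vs cluster-robust SE of a mean
set.seed(1)
n_fam <- 50; m <- 4                            # 50 families, 4 members each
fam   <- rep(seq_len(n_fam), each = m)
u     <- rep(rnorm(n_fam, sd = 1), each = m)   # shared family effect (ICC ~ .5)
y     <- 3 + u + rnorm(n_fam * m, sd = 1)

naive_se <- sd(y) / sqrt(length(y))

# CR0 sandwich for an intercept-only model: square the summed residuals
# within each family, add them up, and scale by n^2
res       <- y - mean(y)
fam_sums  <- tapply(res, fam, sum)
robust_se <- sqrt(sum(fam_sums^2)) / length(y)

c(naive = naive_se, robust = robust_se)  # robust SE is noticeably larger
```

With an intraclass correlation near .5 and four members per family, the design effect is roughly 1 + (m - 1) × ICC ≈ 2.5, so the robust SE is about 1.5 times the naive one.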

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

  • Automatic data joining - Merges variables from multiple tables automatically
  • Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
  • Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Key Parameters
  • vars - Vector of variable names to load
  • release - ABCD data release version (e.g., "6.0")
  • format - File format, typically "parquet" for efficiency
  • categ_to_factor - Automatically converts categorical variables to factors
  • value_to_na - Converts ABCD missing value codes to R's NA
  • add_labels - Adds descriptive labels to variables and values
Additional Resources

For more details on using NBDCtools, see the NBDCtools documentation and package vignettes linked above.

Data Preparation

NBDCtools Setup and Data Loading
### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(arrow)        # For reading Parquet files
library(tidyverse)    # Collection of R packages for data science
library(gtsummary)    # Creating publication-quality tables
library(lavaan)       # Structural Equation Modeling in R
library(broom)        # For tidying model outputs
library(gt)           # For creating formatted tables

# Set random seed for reproducible family member selection
set.seed(123)

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "mh_y_erq__suppr_mean"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Data Transformation
# Create longitudinal dataset with cleaned variables
df_long <- abcd_data %>%
  # Filter to ERQ assessment waves (Years 3-6)
  filter(session_id %in% c("ses-03A", "ses-04A", "ses-05A", "ses-06A")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    # Relabel sessions
    session_id = factor(session_id,
                        levels = c("ses-03A", "ses-04A", "ses-05A", "ses-06A"),
                        labels = c("Year_3", "Year_4", "Year_5", "Year_6")),
    # Clean outcome variable
    suppression = round(as.numeric(mh_y_erq__suppr_mean), 2)
  ) %>%
  # Rename clustering variables
  rename(
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam
  ) %>%

  # Keep only analysis variables
  select(participant_id, session_id, site, family_id, suppression) %>%

  # Remove cases with missing outcome data
  drop_na(suppression) %>%
  droplevels()

df_wide <- df_long %>%
  select(participant_id, site, family_id, session_id, suppression) %>%
  pivot_wider(
    names_from = session_id,
    values_from = suppression,
    names_prefix = "Suppression_"
  ) %>%
  # Remove cases with incomplete data across waves
  drop_na(starts_with("Suppression_"))
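Before fitting, it is worth checking the nesting structure itself: participants per site, distinct families per site, and family sizes, since the family-level correction only matters where families contribute more than one child. A base-R sketch on a toy stand-in for df_wide's design columns (hypothetical IDs, not ABCD values):

```r
# Toy stand-in for df_wide's design columns (hypothetical IDs)
df_ids <- data.frame(
  participant_id = sprintf("sub-%02d", 1:8),
  site           = rep(c("site01", "site02"), each = 4),
  family_id      = c("f1", "f1", "f2", "f3", "f4", "f5", "f5", "f6")
)

# Participants and distinct families per site
participants_per_site <- table(df_ids$site)
families_per_site     <- tapply(df_ids$family_id, df_ids$site,
                                function(f) length(unique(f)))

# Family sizes: singleton families contribute no within-family dependency
family_sizes <- table(df_ids$family_id)

participants_per_site  # site01: 4, site02: 4
families_per_site      # site01: 3, site02: 3
table(family_sizes)    # four singleton families, two sibling pairs
```

On the real data the same three summaries (swapping `df_ids` for `df_wide`) show how much of the sample sits in multi-child families versus singletons.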
Descriptive Statistics
# Create comprehensive descriptive table
descriptives_table <- df_long %>%
  select(session_id, suppression) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(suppression ~ "Emotional Suppression"),
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    digits = all_continuous() ~ 2
  ) %>%
  modify_header(all_stat_cols() ~ "**{level}**<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "**Assessment Wave**") %>%
  add_overall(last = TRUE) %>%
  bold_labels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table
| Characteristic | Year_3 (N = 10,318) | Year_4 (N = 9,586) | Year_5 (N = 8,784) | Year_6 (N = 5,001) | Overall (N = 33,689) |
|---|---|---|---|---|---|
| Emotional Suppression, Mean (SD) | 3.10 (0.86) | 3.35 (0.86) | 3.35 (0.87) | 3.42 (0.88) | 3.28 (0.88) |

Statistical Analysis

Define and Fit Nested LGCM
# Define LGCM specification with explicit residual variance labels
model <- "
  # Define growth factors with time-specific loadings
  i =~ 1*Suppression_Year_3 + 1*Suppression_Year_4 + 1*Suppression_Year_5 + 1*Suppression_Year_6
  s =~ 0*Suppression_Year_3 + 1*Suppression_Year_4 + 2*Suppression_Year_5 + 3*Suppression_Year_6

  # Allow growth factors to have variance (individual differences)
  i ~~ i
  s ~~ s

  # Allow correlation between intercept and slope
  i ~~ s

  # Specify residual variances for each time point
  Suppression_Year_3 ~~ var_y3*Suppression_Year_3
  Suppression_Year_4 ~~ var_y4*Suppression_Year_4
  Suppression_Year_5 ~~ var_y5*Suppression_Year_5
  Suppression_Year_6 ~~ var_y6*Suppression_Year_6
"

# Fit the model with nested clustering
fit <- growth(
  model,
  data = df_wide,
  cluster = c("site", "family_id"),
  missing = "fiml",
  se = "robust"
)

# Display model summary with fit indices
summary(fit, fit.measures = TRUE, standardized = TRUE)
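To see what the cluster correction does in practice, one can fit the same growth model with and without a cluster argument and compare standard errors. The sketch below uses simulated wide data (not ABCD) and a single hypothetical clustering variable, assuming lavaan's cluster-robust SE support for single-level models via the `cluster` argument:

```r
library(lavaan)

# Simulate toy wide data with a shared family-level intercept component
set.seed(42)
n_fam <- 150; m <- 2                           # 150 families, 2 members each
fam <- rep(seq_len(n_fam), each = m)
u   <- rep(rnorm(n_fam, sd = 0.4), each = m)   # shared family component
i0  <- 3.1 + u + rnorm(n_fam * m, sd = 0.5)    # person-specific intercepts
s0  <- 0.11 + rnorm(n_fam * m, sd = 0.2)       # person-specific slopes
dat <- data.frame(sapply(0:3, function(t) i0 + s0 * t + rnorm(n_fam * m, sd = 0.6)))
names(dat) <- paste0("y", 1:4)
dat$family_id <- fam

lgcm <- "i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
         s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4"

fit_naive  <- growth(lgcm, data = dat)
fit_robust <- growth(lgcm, data = dat, cluster = "family_id")

# Compare SEs for the growth-factor means: the clustered fit should inflate
# the intercept-mean SE because families share intercept variance
subset(parameterEstimates(fit_naive),  op == "~1" & lhs %in% c("i", "s"),
       select = c(lhs, est, se))
subset(parameterEstimates(fit_robust), op == "~1" & lhs %in% c("i", "s"),
       select = c(lhs, est, se))
```

Point estimates are identical across the two fits; only the uncertainty changes, which is the defining feature of the robust-SE approach relative to a full multilevel specification.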
Format Model Summary Table
# Extract model summary
model_summary <- summary(fit)

model_summary

# Convert lavaan output to a tidy dataframe and then to gt table
model_summary_table <- broom::tidy(fit) %>%
  gt() %>%
  tab_header(title = "Latent Growth Curve Model Results") %>%
  fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3)

# Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)
Format Model Fit Indices Table
# Extract and save model fit indices
fit_indices <- fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr", "aic", "bic"))

fit_indices_table <- data.frame(
  Metric = names(fit_indices),
  Value = as.numeric(fit_indices)
) %>%
  gt() %>%
  tab_header(title = "Model Fit Indices") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

# Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)
Latent Growth Curve Model Results

| term | op | label | estimate | std.error | statistic | p.value | std.lv | std.all |
|---|---|---|---|---|---|---|---|---|
| i =~ Suppression_Year_3 | =~ |  | 1.000 | 0.000 | NA | NA | 0.5675782 | 0.6503569 |
| i =~ Suppression_Year_4 | =~ |  | 1.000 | 0.000 | NA | NA | 0.5675782 | 0.6715888 |
| i =~ Suppression_Year_5 | =~ |  | 1.000 | 0.000 | NA | NA | 0.5675782 | 0.6663466 |
| i =~ Suppression_Year_6 | =~ |  | 1.000 | 0.000 | NA | NA | 0.5675782 | 0.6371306 |
| s =~ Suppression_Year_3 | =~ |  | 0.000 | 0.000 | NA | NA | 0.0000000 | 0.0000000 |
| s =~ Suppression_Year_4 | =~ |  | 1.000 | 0.000 | NA | NA | 0.2141099 | 0.2533463 |
| s =~ Suppression_Year_5 | =~ |  | 2.000 | 0.000 | NA | NA | 0.4282199 | 0.5027376 |
| s =~ Suppression_Year_6 | =~ |  | 3.000 | 0.000 | NA | NA | 0.6423298 | 0.7210424 |
| i ~~ i | ~~ |  | 0.322 | 0.016 | 19.864 | 0.000 | 1.0000000 | 1.0000000 |
| s ~~ s | ~~ |  | 0.046 | 0.003 | 16.609 | 0.000 | 1.0000000 | 1.0000000 |
| i ~~ s | ~~ |  | -0.039 | 0.004 | -9.093 | 0.000 | -0.3199569 | -0.3199569 |
| Suppression_Year_3 ~~ Suppression_Year_3 | ~~ | var_y3 | 0.439 | 0.027 | 16.584 | 0.000 | 0.4394917 | 0.5770359 |
| Suppression_Year_4 ~~ Suppression_Year_4 | ~~ | var_y4 | 0.424 | 0.020 | 21.739 | 0.000 | 0.4240174 | 0.5936619 |
| Suppression_Year_5 ~~ Suppression_Year_5 | ~~ | var_y5 | 0.376 | 0.017 | 22.351 | 0.000 | 0.3755354 | 0.5176067 |
| Suppression_Year_6 ~~ Suppression_Year_6 | ~~ | var_y6 | 0.292 | 0.017 | 16.984 | 0.000 | 0.2921493 | 0.3681377 |
| Suppression_Year_3 ~1 | ~1 |  | 0.000 | 0.000 | NA | NA | 0.0000000 | 0.0000000 |
| Suppression_Year_4 ~1 | ~1 |  | 0.000 | 0.000 | NA | NA | 0.0000000 | 0.0000000 |
| Suppression_Year_5 ~1 | ~1 |  | 0.000 | 0.000 | NA | NA | 0.0000000 | 0.0000000 |
| Suppression_Year_6 ~1 | ~1 |  | 0.000 | 0.000 | NA | NA | 0.0000000 | 0.0000000 |
| i ~1 | ~1 |  | 3.109 | 0.021 | 150.180 | 0.000 | 5.4778866 | 5.4778866 |
| s ~1 | ~1 |  | 0.110 | 0.007 | 16.030 | 0.000 | 0.5128067 | 0.5128067 |
Model Fit Indices

| Fit Measure | Value |
|---|---|
| chisq | 180.868 |
| df | 5.000 |
| pvalue | 0.000 |
| cfi | 0.949 |
| tli | 0.938 |
| rmsea | 0.092 |
| srmr | 0.045 |
| aic | 39,503.178 |
| bic | 39,560.255 |
Interpretation

The LGCM estimated a mean suppression score of 3.11 at Year 3 (SE = 0.021, p < .001) and a gradual increase of 0.11 points per year (SE = 0.007, p < .001). Intercept and slope variances (0.322 and 0.046, both p < .001) were sizable, confirming that youth differed widely in both starting levels and subsequent change. The negative intercept-slope covariance (-0.039, p < .001) indicates a leveling effect: adolescents who began with higher suppression tended to grow more slowly, whereas those starting low often caught up.

Model fit was generally solid (CFI = 0.949, TLI = 0.938, SRMR = 0.045), though the RMSEA of 0.092 hints at minor misfit that could reflect omitted time-specific covariances. Cluster-robust standard errors were computed for both sites and families; adding the family level barely shifted estimates, suggesting that most dependency operates at the site level, but the dual adjustment still guards against underestimated SEs. Finally, residual variances shrank from 0.439 at Year 3 to 0.292 at Year 6, implying that suppression measurements became more stable as participants aged.
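These estimates can be turned into model-implied means and variances as an arithmetic check: the implied mean at time t is intercept + t * slope, and the implied variance is var(i) + t^2 * var(s) + 2t * cov(i, s) plus that wave's residual variance. A base-R sketch using the rounded estimates reported above:

```r
# Model-implied trajectory from the reported (rounded) estimates
i_mean <- 3.109; s_mean <- 0.110
i_var  <- 0.322; s_var  <- 0.046; is_cov <- -0.039
resid  <- c(0.439, 0.424, 0.376, 0.292)   # Year 3..6 residual variances
t      <- 0:3                              # time scores from the slope loadings

implied_mean <- i_mean + s_mean * t
implied_var  <- i_var + t^2 * s_var + 2 * t * is_cov + resid

round(implied_mean, 3)       # 3.109 3.219 3.329 3.439
round(sqrt(implied_var), 2)  # implied SDs, close to the observed SDs above
```

The implied means track the observed wave means reasonably well, and the implied SDs (roughly 0.84 to 0.89) sit near the observed values in the descriptives table, a quick sanity check on the linear growth specification.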

Visualization
# Plotting the Suppression data over time from the df_long dataframe
# Select a subset of participants
selected_ids <- sample(unique(df_long$participant_id), 150)
df_long_selected <- df_long %>% filter(participant_id %in% selected_ids)

# Plot Suppression Growth
visualization <- ggplot(df_long_selected, aes(x = session_id, y = suppression, group = participant_id)) +
    geom_line(alpha = 0.3, color = "gray") +
    geom_point(size = 1.5, color = "blue") +
    geom_smooth(aes(group = 1), method = "lm", color = "red", linewidth = 1.2, se = TRUE, fill = "lightpink") +
    labs(
        title = "Suppression Growth with Confidence Intervals",
        x = "Time (Years from Baseline)",
        y = "Suppression Score"
    ) +
    theme_minimal()

ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)
Suppression Trajectory Plot
Interpretation

This plot visualizes individual trajectories and the overall trend in suppression scores across four annual assessments.

  • Gray lines connect each participant's observations for the randomly selected subset, illustrating within-person variability in suppression change.
  • Blue points represent observed suppression scores at each time point, providing a clear depiction of the data distribution.
  • The red smoothed curve represents the overall trend, estimated using a linear regression model, capturing the general pattern of suppression growth with a confidence interval (light pink shading).

While many participants exhibit a general increase in suppression, others show stable or declining trajectories, emphasizing heterogeneity in individual change patterns. Between-person differences in initial suppression levels and rates of change reinforce the need for latent growth models to capture both within- and between-person variability in suppression development.

Discussion

Key Findings

Suppression tended to rise across the study, yet individual trajectories varied widely. Some youth showed steep gains, others remained flat, and a few declined. That heterogeneity was captured by significant variance in both intercepts (0.322, SE = 0.016, p < .001) and slopes (0.046, SE = 0.003, p < .001), underscoring why a latent growth framework is preferable to a single average trend.

Implications

Cluster-robust standard errors were computed for sites and families. Adding the family level changed estimates only trivially relative to site-only clustering, suggesting that most dependency was already absorbed at the site level, but keeping both levels still provides conservative uncertainty estimates for downstream reporting.

The negative covariance between intercepts and slopes (-0.039, SE = 0.004, p < .001) points to a leveling-off pattern: participants starting with high suppression gained more slowly, whereas those entering low caught up quickly. Residual variances also shrank from Year 3 (0.439) to Year 6 (0.292), indicating that scores stabilized with age. Together these results describe a cohort that becomes more homogeneous over time even while retaining meaningful between-person differences in both starting points and change rates.

Additional Resources


lavaan Complex Survey Features

DOCS

Official lavaan documentation on cluster-robust standard errors and handling nested data structures in growth curve models, including the cluster argument for adjusting standard errors.


Multilevel Growth Models in lavaan

VIGNETTE

Step-by-step lavaan tutorial that demonstrates how to specify two-level growth models using the cluster argument, adjust standard errors for nested designs, and interpret between-cluster versus within-cluster effects.


Nested Data Structures in Growth Modeling

PAPER

Methodology paper discussing how to properly account for clustering in longitudinal structural equation models, including strategies for modeling nested dependencies.


lavaan.survey Package

TOOL

CRAN package that extends lavaan to complex survey and clustered data via sandwich estimators, letting you fit the same growth models while properly accounting for design weights and multi-level clustering.
