Latent Growth Curve Models (LGCM)

Basic (LGCM)

Introduce latent growth curve modeling to estimate average emotional suppression trajectories, growth rates, and individual variability across repeated ABCD assessments.

lavaan · Updated 2025-11-04 · abcd-study · trajectory · growth
Work in Progress: Examples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified. If you spot gaps, errors, or have suggestions, we'd love your feedback.

Overview

Latent Growth Curve Modeling (LGCM) analyzes longitudinal change by estimating growth trajectories as latent factors while distinguishing systematic development from measurement error. Using intercept and slope parameters, LGCM captures both population-average patterns and individual differences in developmental processes. Because time-specific residual variance is estimated separately from the latent trajectory, the growth parameters are not conflated with occasion-specific noise, as they can be in traditional repeated-measures approaches that analyze observed scores directly. This tutorial applies LGCM to examine emotional suppression in ABCD youth across four annual assessments (Years 3 through 6), estimating the average trajectory and individual variation in initial levels and rates of change.
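For readers who prefer notation, the linear growth model described above can be written as follows. This is a standard textbook formulation rather than something stated in the tutorial itself; the symbols (eta for the latent intercept and slope, lambda for the time scores, psi and theta for the variance components) are assumptions introduced here, and the time scores 0 to 3 mirror the slope loadings used in the model specification later on.

```latex
\begin{aligned}
y_{ti} &= \eta_{0i} + \lambda_t\,\eta_{1i} + \varepsilon_{ti}, \qquad \lambda_t \in \{0, 1, 2, 3\} \\
\eta_{0i} &= \alpha_0 + \zeta_{0i} \\
\eta_{1i} &= \alpha_1 + \zeta_{1i} \\
\begin{pmatrix}\zeta_{0i}\\ \zeta_{1i}\end{pmatrix}
  &\sim N\!\left(\mathbf{0},\
  \begin{pmatrix}\psi_{00} & \psi_{01}\\ \psi_{01} & \psi_{11}\end{pmatrix}\right),
  \qquad \varepsilon_{ti} \sim N(0, \theta_t)
\end{aligned}
```

Here alpha_0 and alpha_1 are the mean intercept and mean slope, psi_00 and psi_11 are the intercept and slope variances, psi_01 is their covariance, and theta_t is the residual variance at wave t.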

When to Use:
Ideal when you have repeated ABCD measures and want to model the average growth trajectory plus individual deviations.
Key Advantage:
LGCM provides latent intercept and slope factors, so you can quantify both initial status and change over time with measurement error accounted for.
What You'll Learn:
How to specify a basic LGCM in lavaan, interpret intercept/slope estimates, and assess overall model fit and residual structure.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

  • Automatic data joining - Merges variables from multiple tables automatically
  • Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
  • Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Key Parameters
  • vars - Vector of variable names to load
  • release - ABCD data release version (e.g., "6.0")
  • format - File format, typically "parquet" for efficiency
  • categ_to_factor - Automatically converts categorical variables to factors
  • value_to_na - Converts ABCD missing value codes to R's NA
  • add_labels - Adds descriptive labels to variables and values

Data Preparation

NBDCtools Setup and Data Loading
### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(tidyverse)    # Collection of R packages for data science
library(arrow)        # For reading Parquet files
library(gtsummary)    # Creating publication-quality tables
library(lavaan)       # Structural Equation Modeling in R
library(broom)        # For tidying model outputs
library(gt)           # For creating formatted tables

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "mh_y_erq__suppr_mean"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Data Transformation
### Create a long-form dataset with relevant columns
df_long <- abcd_data %>%
  select(participant_id, session_id, ab_g_dyn__design_site, ab_g_stc__design_id__fam, mh_y_erq__suppr_mean) %>%
  # Filter to Years 3-6 annual assessments using NBDCtools
  filter_events_abcd(conditions = c("annual", ">=3", "<=6")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    session_id = factor(session_id,
                        levels = c("ses-03A", "ses-04A", "ses-05A", "ses-06A"),
                        labels = c("Year_3", "Year_4", "Year_5", "Year_6"))  # Relabel sessions for clarity
  ) %>%
  rename(  # Rename for simplicity
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam,
    suppression = mh_y_erq__suppr_mean
  ) %>%
  droplevels() %>%                                     # Drop unused factor levels
  drop_na(suppression)                                 # Remove rows with missing outcome data

### Reshape data from long to wide format
df_wide <- df_long %>%
  pivot_wider(
    names_from = session_id,
    values_from = suppression,
    names_prefix = "Suppression_"
  ) %>%
  drop_na(starts_with("Suppression_"))  # Require complete data across all time points
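One detail of the reshape step is worth checking on your own data. By default, pivot_wider() treats every column not named in names_from or values_from as an identifier, so a participant whose site (or any other retained column) changes between waves is split across several rows and then removed by the complete-case filter. The sketch below illustrates this on a small made-up data frame; the toy values and the choice to pass id_cols = participant_id are illustrative assumptions, and the source does not say whether site actually varies within participants in the ABCD data.

```r
library(tidyr)

# Made-up long-format data: participant B changes site between waves
toy_long <- data.frame(
  participant_id = c("A", "A", "B", "B"),
  session_id     = c("Year_3", "Year_4", "Year_3", "Year_4"),
  site           = c("site01", "site01", "site02", "site03"),
  suppression    = c(3.00, 3.50, 2.50, 2.75)
)

# Default behaviour: participant_id AND site jointly identify a row,
# so participant B ends up on two half-empty rows
wide_default <- pivot_wider(
  toy_long,
  names_from   = session_id,
  values_from  = suppression,
  names_prefix = "Suppression_"
)

# Explicit identifier: one row per participant (site is dropped)
wide_by_id <- pivot_wider(
  toy_long,
  id_cols      = participant_id,
  names_from   = session_id,
  values_from  = suppression,
  names_prefix = "Suppression_"
)

nrow(wide_default)  # one row for A, two rows for B
nrow(wide_by_id)    # one row per participant
```

Comparing the number of rows in the wide data with the number of distinct participant IDs is a quick way to see whether this applies to a given dataset.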
Descriptive Statistics
### Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, suppression) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      suppression ~ "Suppression"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "{level}<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "Assessment Wave") %>%
  bold_labels() %>%
  italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table
                 Assessment Wave
Characteristic   Year_3         Year_4         Year_5         Year_6
                 N = 10318      N = 9586       N = 8784       N = 5001
Suppression      3.10 (0.86)    3.35 (0.86)    3.35 (0.87)    3.42 (0.88)
Values are Mean (SD).

Statistical Analysis

Define and Fit Basic LGCM
# Define model specification
model <- "
  i =~ 1*Suppression_Year_3 + 1*Suppression_Year_4 + 1*Suppression_Year_5 + 1*Suppression_Year_6
  s =~ 0*Suppression_Year_3 + 1*Suppression_Year_4 + 2*Suppression_Year_5 + 3*Suppression_Year_6

  # Intercept and slope variances
  # (growth() also estimates the i ~~ s covariance by default)
  i ~~ i
  s ~~ s

  # Residual variances for each observed variable
  # (labels var_baseline ... var_year3 correspond to Year_3 ... Year_6)
  Suppression_Year_3 ~~ var_baseline*Suppression_Year_3
  Suppression_Year_4 ~~ var_year1*Suppression_Year_4
  Suppression_Year_5 ~~ var_year2*Suppression_Year_5
  Suppression_Year_6 ~~ var_year3*Suppression_Year_6
"

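# Fit the growth model
# Note: missing = "ml" requests full-information maximum likelihood, but
# df_wide was already restricted to complete cases above, so it has no
# effect on this particular fit
fit <- growth(model, data = df_wide, missing = "ml")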

# Display model summary
summary(fit)
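The tutorial fits the model to complete cases only, even though the call passes missing = "ml". As a rough sketch of what full-information maximum likelihood would change, the example below simulates four waves from a linear growth process, deletes some final-wave scores completely at random, and compares how many cases each approach uses. Everything here is hypothetical: the variable names y0 to y3, the population values, and the amount of missingness are assumptions chosen for illustration, not ABCD quantities.

```r
library(lavaan)

set.seed(42)
n <- 800
eta_i <- rnorm(n, mean = 3.0, sd = 0.6)   # person-specific intercepts
eta_s <- rnorm(n, mean = 0.1, sd = 0.3)   # person-specific slopes

# Four waves generated from a linear trajectory plus occasion-specific noise
toy <- data.frame(
  y0 = eta_i + 0 * eta_s + rnorm(n, sd = 0.6),
  y1 = eta_i + 1 * eta_s + rnorm(n, sd = 0.6),
  y2 = eta_i + 2 * eta_s + rnorm(n, sd = 0.6),
  y3 = eta_i + 3 * eta_s + rnorm(n, sd = 0.6)
)

# Remove 300 final-wave scores completely at random
toy$y3[sample(n, 300)] <- NA

toy_model <- "
  i =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  s =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
"

# FIML keeps every case with at least one observed score
fit_fiml <- growth(toy_model, data = toy, missing = "ml")

# Complete-case analysis, analogous to the drop_na() step in the tutorial
fit_complete <- growth(toy_model, data = toy[complete.cases(toy), ])

c(fiml     = lavInspect(fit_fiml, "nobs"),
  complete = lavInspect(fit_complete, "nobs"))
```

Whether FIML is appropriate for the ABCD data depends on the missingness mechanism, which the tutorial does not discuss; the sketch only shows the mechanics.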
Format Model Summary Table
# Extract model summary
model_summary <- summary(fit)

model_summary

# Convert lavaan output to a tidy dataframe and then to gt table
model_summary_table <- broom::tidy(fit) %>%
  gt() %>%
  tab_header(title = "Latent Growth Curve Model Results") %>%
  fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3)

# Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)
Format Model Fit Indices Table
# Extract and save model fit indices
fit_indices <- fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr", "aic", "bic"))

fit_indices_table <- data.frame(
  Metric = names(fit_indices),
  Value = as.numeric(fit_indices)
) %>%
  gt() %>%
  tab_header(title = "Model Fit Indices") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

# Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)
Latent Growth Curve Model Results
term                                      op  label         estimate  std.error  statistic  p.value  std.lv      std.all
i =~ Suppression_Year_3                   =~                1.000     0.000      NA         NA       0.5674924   0.6503570
i =~ Suppression_Year_4                   =~                1.000     0.000      NA         NA       0.5674924   0.6715766
i =~ Suppression_Year_5                   =~                1.000     0.000      NA         NA       0.5674924   0.6663175
i =~ Suppression_Year_6                   =~                1.000     0.000      NA         NA       0.5674924   0.6371410
s =~ Suppression_Year_3                   =~                0.000     0.000      NA         NA       0.0000000   0.0000000
s =~ Suppression_Year_4                   =~                1.000     0.000      NA         NA       0.2140114   0.2532634
s =~ Suppression_Year_5                   =~                2.000     0.000      NA         NA       0.4280228   0.5025601
s =~ Suppression_Year_6                   =~                3.000     0.000      NA         NA       0.6420341   0.7208313
i ~~ i                                    ~~                0.322     0.015      21.066     0.000    1.0000000   1.0000000
s ~~ s                                    ~~                0.046     0.003      13.426     0.000    1.0000000   1.0000000
Suppression_Year_3 ~~ Suppression_Year_3  ~~  var_baseline  0.439     0.016      26.880     0.000    0.4393588   0.5770358
Suppression_Year_4 ~~ Suppression_Year_4  ~~  var_year1     0.424     0.012      36.743     0.000    0.4239019   0.5936580
Suppression_Year_5 ~~ Suppression_Year_5  ~~  var_year2     0.376     0.011      35.180     0.000    0.3755151   0.5176899
Suppression_Year_6 ~~ Suppression_Year_6  ~~  var_year3     0.292     0.015      20.147     0.000    0.2921647   0.3682805
i ~~ s                                    ~~                -0.039    0.006      -6.492     0.000    -0.3198840  -0.3198840
Suppression_Year_3 ~1                     ~1                0.000     0.000      NA         NA       0.0000000   0.0000000
Suppression_Year_4 ~1                     ~1                0.000     0.000      NA         NA       0.0000000   0.0000000
Suppression_Year_5 ~1                     ~1                0.000     0.000      NA         NA       0.0000000   0.0000000
Suppression_Year_6 ~1                     ~1                0.000     0.000      NA         NA       0.0000000   0.0000000
i ~1                                      ~1                3.109     0.012      253.293    0.000    5.4787306   5.4787306
s ~1                                      ~1                0.110     0.005      20.502     0.000    0.5130239   0.5130239
Model Fit Indices
Fit Measure Value
chisq 180.816
df 5.000
pvalue 0.000
cfi 0.949
tli 0.938
rmsea 0.092
srmr 0.045
aic 39,498.889
bic 39,555.966
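One way to read the fit table is to line each index up against a rule of thumb. The sketch below does this for the values reported above. The thresholds (CFI and TLI of at least 0.95, RMSEA of at most 0.06, SRMR of at most 0.08) are commonly cited conventions assumed here for illustration; they are not part of the tutorial, and different sources recommend different cutoffs.

```r
# Fit indices as reported in the "Model Fit Indices" table
fit_values <- c(cfi = 0.949, tli = 0.938, rmsea = 0.092, srmr = 0.045)

# Conventional rules of thumb (assumed; thresholds vary across sources)
cutoffs <- data.frame(
  index     = c("cfi", "tli", "rmsea", "srmr"),
  threshold = c(0.95, 0.95, 0.06, 0.08),
  direction = c("min", "min", "max", "max")  # "min": higher is better
)

# Attach the reported value and flag whether it satisfies the rule of thumb
cutoffs$value <- unname(fit_values[cutoffs$index])
cutoffs$meets_cutoff <- ifelse(
  cutoffs$direction == "min",
  cutoffs$value >= cutoffs$threshold,
  cutoffs$value <= cutoffs$threshold
)

print(cutoffs)
```

Under these particular thresholds only SRMR clears its cutoff, with CFI missing by a narrow margin. Cutoffs of this kind are guides rather than tests, so the flags are best treated as prompts for closer inspection.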
Interpretation

Model fit was mixed. SRMR (0.045) was good and CFI (0.949) sat at the edge of the common 0.95 benchmark, but TLI (0.938) fell below it, RMSEA (0.092) exceeded the conventional 0.08 ceiling, and the chi-square test was significant (χ² = 180.816, df = 5, p < .001), so a strictly linear trajectory leaves some misfit. Average suppression at Year 3 was 3.109 (SE = 0.012, p < .001) and rose by 0.110 points per year (SE = 0.005, p < .001), indicating a small but reliable increase. Intercept and slope variances (0.322 and 0.046, both p < .001) indicate that adolescents differed in both starting levels and rates of change. The negative intercept–slope covariance (−0.039, p < .001; standardized −0.32) implies that youth who began with high suppression tended to increase more slowly, whereas those starting lower tended to increase faster. Residual variances declined from 0.439 at Year 3 to 0.292 by Year 6, suggesting less occasion-specific variability around the fitted trajectory at later waves. Overall, the model depicts a cohort-wide rise in suppression layered on top of substantial between-person heterogeneity.
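The reported estimates can be turned into model-implied means and variances with a few lines of arithmetic, which is a useful check on how the parameters fit together. The sketch below uses the rounded estimates from the results table; the object names (alpha_0, psi_01, and so on) are labels introduced here, with 0 denoting the intercept factor i and 1 the slope factor s. Because the inputs are rounded to three decimals, the results will differ slightly from lavaan's own standardized output.

```r
# Estimates copied (rounded) from the results table
alpha_0 <- 3.109    # mean intercept (i ~1)
alpha_1 <- 0.110    # mean slope (s ~1)
psi_00  <- 0.322    # intercept variance (i ~~ i)
psi_11  <- 0.046    # slope variance (s ~~ s)
psi_01  <- -0.039   # intercept-slope covariance (i ~~ s)
theta   <- c(Year_3 = 0.439, Year_4 = 0.424, Year_5 = 0.376, Year_6 = 0.292)
time    <- c(0, 1, 2, 3)   # slope loadings from the model specification

# Model-implied mean at each wave: alpha_0 + alpha_1 * time
implied_mean <- alpha_0 + alpha_1 * time

# Model-implied variance at each wave:
# psi_00 + 2 * time * psi_01 + time^2 * psi_11 + theta
implied_var <- psi_00 + 2 * time * psi_01 + time^2 * psi_11 + theta

# Share of each wave's implied variance that is residual
resid_share <- theta / implied_var

# Intercept-slope correlation implied by the covariance
r_01 <- psi_01 / sqrt(psi_00 * psi_11)

round(data.frame(time, implied_mean, implied_var, resid_share), 3)
round(r_01, 3)
```

The implied means climb from 3.109 at Year 3 to 3.439 at Year 6, and the residual share and the correlation land close to the std.all column of the results table, as expected given the rounding.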

Visualization
### Select a subset of participants
set.seed(123)  # Arbitrary seed so the random subset is reproducible
n_sample <- min(150, length(unique(df_long$participant_id)))
selected_ids <- sample(unique(df_long$participant_id), n_sample)
df_long_selected <- df_long %>% filter(participant_id %in% selected_ids)

### Plot Suppression Growth
visualization <- ggplot(df_long_selected, aes(x = session_id, y = suppression, group = participant_id)) +
    geom_line(alpha = 0.3, color = "gray") +
    geom_point(size = 1.5, color = "blue") +
    geom_smooth(aes(group = 1), method = "lm", color = "red", linewidth = 1.2, se = TRUE, fill = "lightpink") +
    labs(
        title = "Emotional Suppression Trajectories Over Time",
        x = "Assessment Wave",
        y = "Suppression Score"
    ) +
    theme_minimal()

### Save the plot
ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)
Emotional Suppression Trajectory Plot
Interpretation
Visualization Notes

Each gray line shows one participant's suppression trajectory across the four assessments, blue points mark the observed scores, and the red line is an ordinary least-squares trend fitted to the plotted subsample. The upward tilt of the red line signals a cohort-level increase in suppression, yet the fan of gray lines makes it clear that individuals follow very different paths: some rise steeply, others flatten or dip. Note that the gray lines simply connect each participant's observed scores, so they show raw data rather than model-based trajectories, and the red line comes from geom_smooth(method = "lm"), not from the LGCM estimates. In short, the figure communicates both the population trend and the heterogeneity that motivates a latent growth curve approach.

Discussion

The analysis reveals heterogeneous suppression trajectories: the average trend is upward, but individual trajectories vary substantially, with some participants changing faster or slower than the mean. The significant slope variance (0.046, p < .001) supports modeling person-specific slopes rather than a single common rate of change. This tutorial did not fit a comparison model without slope variance, so these results do not by themselves establish how much fit improves when slopes are allowed to vary.

The inclusion of both random intercepts and slopes provided a more flexible framework for understanding variability in initial suppression levels and growth rates across participants. The latent growth curve model (LGCM) enables a more detailed examination of longitudinal trends by modeling both baseline differences (intercepts) and individual variability in rates of change (slopes), offering deeper insights into developmental patterns over time.
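To test whether individual differences in slope are needed, one option is to compare the growth model against a nested version with the slope variance (and the intercept-slope covariance) fixed to zero. The sketch below shows the mechanics on simulated data. The variable names y0 to y3, the population values, and the sample size are all hypothetical choices made for illustration, and nothing here reproduces the ABCD analysis.

```r
library(lavaan)

set.seed(2025)
n <- 1000
eta_i <- rnorm(n, mean = 3.0, sd = 0.6)   # person-specific intercepts
eta_s <- rnorm(n, mean = 0.1, sd = 0.3)   # person-specific slopes

# Four waves generated from a linear trajectory plus occasion-specific noise
sim <- data.frame(
  y0 = eta_i + 0 * eta_s + rnorm(n, sd = 0.6),
  y1 = eta_i + 1 * eta_s + rnorm(n, sd = 0.6),
  y2 = eta_i + 2 * eta_s + rnorm(n, sd = 0.6),
  y3 = eta_i + 3 * eta_s + rnorm(n, sd = 0.6)
)

# Full model: intercept and slope variances and their covariance are free
model_random_slope <- "
  i =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  s =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
"

# Restricted model: everyone shares the same slope (no slope variance)
model_fixed_slope <- "
  i =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  s =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
  s ~~ 0*s
  i ~~ 0*s
"

fit_random_slope <- growth(model_random_slope, data = sim)
fit_fixed_slope  <- growth(model_fixed_slope,  data = sim)

# Likelihood-ratio (chi-square difference) test of the two nested models
lavTestLRT(fit_fixed_slope, fit_random_slope)
```

Because a variance of zero lies on the boundary of the parameter space, the p-value from this test tends to be conservative, so it is best read alongside information criteria and the size of the estimated slope variance.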

Additional Resources

lavaan Growth Curve Tutorial (Docs)

Official lavaan documentation for latent growth curve modeling basics, covering model specification, parameter estimation, and result interpretation for unconditional growth models.

Structural Equation Modeling in lavaan (Vignette)

Comprehensive vignette covering growth models within the structural equation modeling framework, including detailed examples of latent growth curve specifications and output interpretation.

Applied Longitudinal Data Analysis by Singer & Willett (Book)

Foundational textbook on growth curve modeling. Chapters 3-4 provide thorough coverage of unconditional growth models, including interpretation of intercepts, slopes, and random effects for longitudinal data. Note: access may require institutional or paid subscription.

semPlot Package for Model Visualization (Tool)

R package for creating publication-quality path diagrams of structural equation models and latent growth curve models, with extensive customization options for visualization.