Latent Growth Curve Models (LGCM)

Multivariate (LGCM)

Fit multivariate latent growth curve models to estimate parallel developmental processes and relate their intercepts and slopes using lavaan.

LGCM › Continuous
lavaanUpdated 2025-11-04abcd-studytrajectoryparallel-process
Work in ProgressExamples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified. If you spot gaps, errors, or have suggestions, we'd love your feedback—use the "Suggest changes" button to help us improve!

Overview

Multivariate Latent Growth Curve Modeling (MLGCM) simultaneously models trajectories of multiple outcomes, revealing how different developmental processes unfold together over time. By estimating intercept and slope factors for each construct in a unified framework, MLGCM captures individual growth patterns while quantifying dynamic interrelationships between developmental trajectories. This tutorial examines co-development of externalizing and internalizing symptoms in ABCD youth across four annual assessments, estimating parallel growth parameters and cross-domain correlations to reveal comorbidity patterns and shared developmental processes.

When to Use:
Applicable when you want to model parallel growth processes (e.g., anxiety and suppression) simultaneously to see how trajectories are linked.
Key Advantage:
Captures covariance between multiple latent intercepts/slopes, revealing whether growth in one domain correlates with growth in another.
What You'll Learn:
How to specify multivariate LGCMs, interpret covariance among latent factors, and visualize coupled trajectories.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

  • Automatic data joining - Merges variables from multiple tables automatically
  • Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
  • Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Key Parameters
  • vars - Vector of variable names to load
  • release - ABCD data release version (e.g., "6.0")
  • format - File format, typically "parquet" for efficiency
  • categ_to_factor - Automatically converts categorical variables to factors
  • value_to_na - Converts ABCD missing value codes to R's NA
  • add_labels - Adds descriptive labels to variables and values
Additional Resources

For more details on using NBDCtools:

Data Preparation

NBDCtools Setup and Data Loading
R29 lines
### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(tidyverse)    # Collection of R packages for data science
library(gtsummary)    # Creating publication-quality tables
library(lavaan)       # Structural Equation Modeling in R
library(broom)        # For tidying model outputs
library(gt)           # For creating formatted tables
library(patchwork)    # For combining ggplots

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
  "ab_g_dyn__design_site",
  "ab_g_stc__design_id__fam",
  "mh_p_cbcl__synd__ext_tscore",
  "mh_p_cbcl__synd__int_tscore"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Data Transformation
R31 lines
### Clean and transform variables for analysis
df_long <- abcd_data %>%
  # Filter to baseline through Year 3
  # Note: Both "ses-00" and "ses-00A" are included as baseline to capture all available baseline data
  filter(session_id %in% c("ses-00", "ses-00A", "ses-01A", "ses-02A", "ses-03A")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    session_id = factor(
      session_id,
      levels = c("ses-00", "ses-00A", "ses-01A", "ses-02A", "ses-03A"),
      labels = c("Baseline", "Baseline", "Year_1", "Year_2", "Year_3")
    )
  ) %>%
  rename(
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam,
    externalizing = mh_p_cbcl__synd__ext_tscore,
    internalizing = mh_p_cbcl__synd__int_tscore
  ) %>%
  select(participant_id, session_id, site, family_id, externalizing, internalizing) %>%
  droplevels() %>%
  drop_na(externalizing, internalizing)                 # Remove rows with missing outcomes

# Reshape data from long to wide format
df_wide <- df_long %>%
  pivot_wider(
    names_from = session_id,
    values_from = c(externalizing, internalizing),
    names_sep = "_"
  ) %>%
  drop_na(starts_with("externalizing_"), starts_with("internalizing_"))  # Require complete data
Descriptive Statistics
R28 lines
### Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, externalizing, internalizing) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      externalizing ~ "Externalizing",
      internalizing ~ "Internalizing"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "{level}<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "Assessment Wave") %>%
  bold_labels() %>%
  italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
  gt::gtsave(descriptives_table,
             filename = "descriptives_table.html")

### Print the table
descriptives_table
Characteristic
Assessment Wave
Baseline
N = 11861
1
Year_1
N = 11201
1
Year_2
N = 10897
1
Year_3
N = 10196
1
Externalizing 46 (10) 45 (10) 44 (10) 44 (10)
Internalizing 48 (11) 49 (11) 48 (11) 48 (11)
1 Mean (SD)

Statistical Analysis

Define and Fit Multivariate LGCM
R34 lines
# Define multivariate growth model
model <- "
  # Growth model for externalizing
  i_var1 =~ 1*externalizing_Baseline + 1*externalizing_Year_1 + 1*externalizing_Year_2 + 1*externalizing_Year_3
  s_var1 =~ 0*externalizing_Baseline + 1*externalizing_Year_1 + 2*externalizing_Year_2 + 3*externalizing_Year_3

  # Growth model for internalizing
  i_var2 =~ 1*internalizing_Baseline + 1*internalizing_Year_1 + 1*internalizing_Year_2 + 1*internalizing_Year_3
  s_var2 =~ 0*internalizing_Baseline + 1*internalizing_Year_1 + 2*internalizing_Year_2 + 3*internalizing_Year_3

  # Estimate latent means
  i_var1 ~ 1
  s_var1 ~ 1
  i_var2 ~ 1
  s_var2 ~ 1

  # Fix observed variable intercepts to 0 for identification
  # This ensures the latent growth factors carry all the mean structure
  externalizing_Baseline ~ 0*1
  externalizing_Year_1   ~ 0*1
  externalizing_Year_2   ~ 0*1
  externalizing_Year_3   ~ 0*1

  internalizing_Baseline ~ 0*1
  internalizing_Year_1   ~ 0*1
  internalizing_Year_2   ~ 0*1
  internalizing_Year_3   ~ 0*1
"

# Fit the model using ML for handling missing data
fit <- sem(model, data = df_wide, missing = "ml")

# Display summary with fit measures
summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
Format Model Summary Table
R17 lines
# Extract model summary
model_summary <- summary(fit)

model_summary

# Convert output to a tidy dataframe and then to gt table
model_summary_table <- broom::tidy(fit) %>%
  gt() %>%
  tab_header(title = "Multivariate LGCM Results") %>%
  fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3)

# Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)
Format Model Fit Indices Table
R21 lines
# Extract and save model fit indices
fit_indices <- fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr", "aic", "bic"))

fit_indices_table <- data.frame(
  Metric = names(fit_indices),
  Value = as.numeric(fit_indices)
) %>%
  gt() %>%
  tab_header(title = "Model Fit Indices") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

# Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)
Multivariate LGCM Results
term op estimate std.error statistic p.value std.lv std.all
i_var1 =~ externalizing_Baseline =~ 1.000 0.000 NA NA 8.9650241 0.8794292
i_var1 =~ externalizing_Year_1 =~ 1.000 0.000 NA NA 8.9650241 0.9054640
i_var1 =~ externalizing_Year_2 =~ 1.000 0.000 NA NA 8.9650241 0.9276253
i_var1 =~ externalizing_Year_3 =~ 1.000 0.000 NA NA 8.9650241 0.9362185
s_var1 =~ externalizing_Baseline =~ 0.000 0.000 NA NA 0.0000000 0.0000000
s_var1 =~ externalizing_Year_1 =~ 1.000 0.000 NA NA 1.7633886 0.1781016
s_var1 =~ externalizing_Year_2 =~ 2.000 0.000 NA NA 3.5267771 0.3649212
s_var1 =~ externalizing_Year_3 =~ 3.000 0.000 NA NA 5.2901657 0.5524526
i_var2 =~ internalizing_Baseline =~ 1.000 0.000 NA NA 8.9399688 0.8485679
i_var2 =~ internalizing_Year_1 =~ 1.000 0.000 NA NA 8.9399688 0.8546363
i_var2 =~ internalizing_Year_2 =~ 1.000 0.000 NA NA 8.9399688 0.8623244
i_var2 =~ internalizing_Year_3 =~ 1.000 0.000 NA NA 8.9399688 0.8430902
s_var2 =~ internalizing_Baseline =~ 0.000 0.000 NA NA 0.0000000 0.0000000
s_var2 =~ internalizing_Year_1 =~ 1.000 0.000 NA NA 1.9703869 0.1883635
s_var2 =~ internalizing_Year_2 =~ 2.000 0.000 NA NA 3.9407738 0.3801160
s_var2 =~ internalizing_Year_3 =~ 3.000 0.000 NA NA 5.9111608 0.5574563
i_var1 ~1 ~1 45.375 0.100 452.269 0.000 5.0613584 5.0613584
s_var1 ~1 ~1 −0.416 0.028 −14.768 0.000 -0.2361454 -0.2361454
i_var2 ~1 ~1 48.432 0.103 470.161 0.000 5.4174154 5.4174154
s_var2 ~1 ~1 −0.274 0.032 −8.452 0.000 -0.1392032 -0.1392032
externalizing_Baseline ~1 ~1 0.000 0.000 NA NA 0.0000000 0.0000000
externalizing_Year_1 ~1 ~1 0.000 0.000 NA NA 0.0000000 0.0000000
externalizing_Year_2 ~1 ~1 0.000 0.000 NA NA 0.0000000 0.0000000
externalizing_Year_3 ~1 ~1 0.000 0.000 NA NA 0.0000000 0.0000000
internalizing_Baseline ~1 ~1 0.000 0.000 NA NA 0.0000000 0.0000000
internalizing_Year_1 ~1 ~1 0.000 0.000 NA NA 0.0000000 0.0000000
internalizing_Year_2 ~1 ~1 0.000 0.000 NA NA 0.0000000 0.0000000
internalizing_Year_3 ~1 ~1 0.000 0.000 NA NA 0.0000000 0.0000000
externalizing_Baseline ~~ externalizing_Baseline ~~ 23.549 0.672 35.034 0.000 23.5488239 0.2266043
externalizing_Year_1 ~~ externalizing_Year_1 ~~ 27.236 0.505 53.924 0.000 27.2363190 0.2778356
externalizing_Year_2 ~~ externalizing_Year_2 ~~ 25.967 0.481 54.020 0.000 25.9668704 0.2780109
externalizing_Year_3 ~~ externalizing_Year_3 ~~ 21.400 0.619 34.549 0.000 21.3996121 0.2333766
internalizing_Baseline ~~ internalizing_Baseline ~~ 31.071 0.860 36.123 0.000 31.0707682 0.2799324
internalizing_Year_1 ~~ internalizing_Year_1 ~~ 36.746 0.668 54.977 0.000 36.7456494 0.3358123
internalizing_Year_2 ~~ internalizing_Year_2 ~~ 34.284 0.637 53.853 0.000 34.2838518 0.3189767
internalizing_Year_3 ~~ internalizing_Year_3 ~~ 30.960 0.849 36.451 0.000 30.9596933 0.2753422
i_var1 ~~ i_var1 ~~ 80.372 1.436 55.963 0.000 1.0000000 1.0000000
s_var1 ~~ s_var1 ~~ 3.110 0.146 21.267 0.000 1.0000000 1.0000000
i_var2 ~~ i_var2 ~~ 79.923 1.537 51.992 0.000 1.0000000 1.0000000
s_var2 ~~ s_var2 ~~ 3.882 0.193 20.114 0.000 1.0000000 1.0000000
i_var1 ~~ s_var1 ~~ −6.344 0.346 −18.317 0.000 -0.4012688 -0.4012688
i_var1 ~~ i_var2 ~~ 58.643 1.178 49.789 0.000 0.7316977 0.7316977
i_var1 ~~ s_var2 ~~ −6.484 0.327 −19.854 0.000 -0.3670378 -0.3670378
s_var1 ~~ i_var2 ~~ −7.318 0.295 −24.829 0.000 -0.4642100 -0.4642100
s_var1 ~~ s_var2 ~~ 4.321 0.100 43.222 0.000 1.2436066 1.2436066
i_var2 ~~ s_var2 ~~ −5.564 0.417 −13.344 0.000 -0.3158614 -0.3158614
Model Fit Indices
Fit Measure Value
chisq 4,421.337
df 22.000
pvalue 0.000
cfi 0.921
tli 0.899
rmsea 0.144
srmr 0.030
aic 527,274.153
bic 527,432.047
Interpretation
Interpretation

Externalizing symptoms averaged 45.4 at Baseline and declined by 0.416 points per year (both p < .001); internalizing symptoms started at 48.4 and fell by 0.274 points annually. Thus, both domains improved over the study window. Intercept variances of roughly 80 and slope variances of 3–4 confirmed large between-person differences, so the population mean hides substantial heterogeneity.Model fit was mixed—SRMR = 0.030 was excellent, whereas CFI/TLI (0.921/0.899) and RMSEA (0.144) suggested that additional cross-domain links might further reduce misfit. Even so, the estimated covariances captured the core comorbidity pattern: youth who started high on externalizing also tended to start high on internalizing (covariance = 58.6, p < .001), and declines in one domain generally coincided with declines in the other (slope covariance = 4.32, p < .001). Negative intercept–slope covariances within each domain imply diminishing returns—youth with the highest baseline symptoms improved, but at a slower rate than peers with milder presentations.

Visualization
R38 lines
### Calculate means for each time point for externalizing and internalizing
mean_externalizing <- df_wide %>%
    summarise(across(starts_with("externalizing"), mean, na.rm = TRUE))

mean_internalizing <- df_wide %>%
    summarise(across(starts_with("internalizing"), mean, na.rm = TRUE))

### Reshape the data for plotting
mean_externalizing_long <- pivot_longer(mean_externalizing, cols = everything(), names_to = "Time", values_to = "Mean_externalizing")
mean_internalizing_long <- pivot_longer(mean_internalizing, cols = everything(), names_to = "Time", values_to = "Mean_internalizing")

### Plot the mean trajectories for externalizing
externalizing_plot <- ggplot(mean_externalizing_long, aes(x = Time, y = Mean_externalizing, group = 1)) +
    geom_line(color = "blue", linewidth = 1.2) +
    geom_point(color = "blue") +
    labs(title = "Mean Growth Trajectory for externalizing", y = "Mean externalizing", x = "Time Point") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

### Plot the mean trajectories for internalizing
internalizing_plot <- ggplot(mean_internalizing_long, aes(x = Time, y = Mean_internalizing, group = 1)) +
    geom_line(color = "red", linewidth = 1.2) +
    geom_point(color = "red") +
    labs(title = "Mean Growth Trajectory for internalizing", y = "Mean internalizing", x = "Time Point") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

### Combine the plots side by side using patchwork
combined_plot <- externalizing_plot + internalizing_plot
print(combined_plot)

# Save as a high-resolution PNG file
ggsave(filename = "visualization.png",
       plot = combined_plot,
       width = 10,  # Specify width in inches (or units)
       height = 5, # Specify height in inches (or units)
       units = "in", # Specify units (e.g., "in", "cm", "mm")
       dpi = 300) # Specify resolution (e.g., 300 for good quality)
Visualization
Interpretation
Visualization Notes

This visualization displays the mean trajectories for externalizing (blue) and internalizing (red) behaviors across four annual assessments. Both plots show the average change patterns at the group level, highlighting the overall developmental trends.The externalizing trajectory (left panel) shows a modest decline from baseline through Year 3, with symptoms decreasing approximately 0.4 points per year. The internalizing trajectory (right panel) exhibits a similar but slightly smaller downward trend, declining approximately 0.3 points annually. Both patterns demonstrate symptom improvement over time at the population level.The parallel visualization format facilitates comparison of trajectory shapes and magnitude differences between the two behavioral domains. While both show declining trends, externalizing symptoms begin at lower baseline levels (around 45) compared to internalizing symptoms (around 48), and both converge toward similar levels by Year 3. This coordinated decline across domains aligns with the positive slope covariance identified in the model, suggesting shared developmental processes or common influences driving improvement in both behavioral domains.

Discussion

Key Findings

The multivariate growth model captured parallel declines in both externalizing (slope = −0.416, SE = 0.028, p < .001) and internalizing behaviors (slope = −0.274, SE = 0.032, p < .001). Intercept and slope variances were significant for each domain, confirming that youth entered adolescence with very different symptom levels and moved along heterogeneous pathways even when the overall trend pointed downward.

Implications

Model diagnostics were mixed. The SRMR of 0.030 suggested strong absolute fit, whereas the CFI (0.921) and TLI (0.899) fell shy of the 0.95 benchmark and the RMSEA (0.144) exceeded common cutoffs, implying that additional cross-loadings or time-specific covariances might still be needed to fully capture the observed structure.

Cross-domain associations were pronounced. Positive intercept (58.643, p < .001) and slope (4.321, p < .001) covariances indicated that youth with elevated baseline externalizing symptoms also tended to start high on internalizing problems and that improvements occurred in tandem. Negative intercept–slope covariances within each domain suggested diminishing returns: adolescents starting with the greatest difficulties exhibited slower recovery. Together, these findings highlight meaningful comorbidity and underscore the value of parallel-process models for charting cascading behavioral change.

Additional Resources

4

lavaan Multivariate Growth Models

DOCS

Official lavaan documentation on parallel process models with multiple outcomes, demonstrating how to specify correlated intercepts and slopes across different constructs.

Visit Resource

Parallel Process LGCM Tutorial

VIGNETTE

Step-by-step quantitative methods tutorial on modeling correlated growth trajectories between two or more outcomes, including interpretation of cross-domain associations.

Visit Resource

Multivariate Growth Models in SEM

PAPER

Foundational methodology paper on coupled change processes in multivariate longitudinal data, covering theoretical frameworks and practical applications (Bollen & Curran, 2004). Note: access may require institutional or paid subscription.

Visit Resource

lavaan Model Syntax Examples

TOOL

Collection of lavaan syntax examples for complex structural equation models including multivariate growth curves, with code snippets and interpretation guidance.

Visit Resource