Overview

Multivariate Latent Growth Curve Modeling (MLGCM) simultaneously models trajectories of multiple outcomes, revealing how different developmental processes unfold together over time. By estimating intercept and slope factors for each construct in a unified framework, MLGCM captures individual growth patterns while quantifying dynamic interrelationships between developmental trajectories. This tutorial examines co-development of externalizing and internalizing symptoms in ABCD youth across four annual assessments, estimating parallel growth parameters and cross-domain correlations to reveal comorbidity patterns and shared developmental processes.

When to Use:

Applicable when you want to model parallel growth processes (e.g., anxiety and suppression) simultaneously to see how trajectories are linked.

Key Advantage:

Captures covariance between multiple latent intercepts/slopes, revealing whether growth in one domain correlates with growth in another.

What You'll Learn:

How to specify multivariate LGCMs, interpret covariance among latent factors, and visualize coupled trajectories.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

Automatic data joining - Merges variables from multiple tables automatically
Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)

Key Parameters

vars - Vector of variable names to load
release - ABCD data release version (e.g., "6.0")
format - File format, typically "parquet" for efficiency
categ_to_factor - Automatically converts categorical variables to factors
value_to_na - Converts ABCD missing value codes to R's NA
add_labels - Adds descriptive labels to variables and values

Additional Resources

For more details on using NBDCtools:

NBDCtools Getting Started Guide - Complete package overview
Joining Data - Advanced data merging strategies
Filtering Events - Selecting specific assessment waves
Data Transformations - Preprocessing and cleaning

Data Preparation

NBDCtools Setup and Data Loading

29 lines

### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(tidyverse)    # Collection of R packages for data science
library(gtsummary)    # Creating publication-quality tables
library(lavaan)       # Structural Equation Modeling in R
library(broom)        # For tidying model outputs
library(gt)           # For creating formatted tables
library(patchwork)    # For combining ggplots

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
  "ab_g_dyn__design_site",
  "ab_g_stc__design_id__fam",
  "mh_p_cbcl__synd__ext_tscore",
  "mh_p_cbcl__synd__int_tscore"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)

Show all 29 linesShow less

Data Transformation

31 lines

### Clean and transform variables for analysis
df_long <- abcd_data %>%
  # Filter to baseline through Year 3
  # Note: Both "ses-00" and "ses-00A" are included as baseline to capture all available baseline data
  filter(session_id %in% c("ses-00", "ses-00A", "ses-01A", "ses-02A", "ses-03A")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    session_id = factor(
      session_id,
      levels = c("ses-00", "ses-00A", "ses-01A", "ses-02A", "ses-03A"),
      labels = c("Baseline", "Baseline", "Year_1", "Year_2", "Year_3")
    )
  ) %>%
  rename(
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam,
    externalizing = mh_p_cbcl__synd__ext_tscore,
    internalizing = mh_p_cbcl__synd__int_tscore
  ) %>%
  select(participant_id, session_id, site, family_id, externalizing, internalizing) %>%
  droplevels() %>%
  drop_na(externalizing, internalizing)                 # Remove rows with missing outcomes

# Reshape data from long to wide format
df_wide <- df_long %>%
  pivot_wider(
    names_from = session_id,
    values_from = c(externalizing, internalizing),
    names_sep = "_"
  ) %>%
  drop_na(starts_with("externalizing_"), starts_with("internalizing_"))  # Require complete data

Show all 31 linesShow less

Descriptive Statistics

28 lines

### Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, externalizing, internalizing) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      externalizing ~ "Externalizing",
      internalizing ~ "Internalizing"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "{level}<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "Assessment Wave") %>%
  bold_labels() %>%
  italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
  gt::gtsave(descriptives_table,
             filename = "descriptives_table.html")

### Print the table
descriptives_table

Show all 28 linesShow less

Characteristic	Assessment Wave
Characteristic	Baseline N = 11861¹	Year_1 N = 11201¹	Year_2 N = 10897¹	Year_3 N = 10196¹
Externalizing	46 (10)	45 (10)	44 (10)	44 (10)
Internalizing	48 (11)	49 (11)	48 (11)	48 (11)
¹ Mean (SD)

Statistical Analysis

Define and Fit Multivariate LGCM

34 lines

# Define multivariate growth model
model <- "
  # Growth model for externalizing
  i_var1 =~ 1*externalizing_Baseline + 1*externalizing_Year_1 + 1*externalizing_Year_2 + 1*externalizing_Year_3
  s_var1 =~ 0*externalizing_Baseline + 1*externalizing_Year_1 + 2*externalizing_Year_2 + 3*externalizing_Year_3

  # Growth model for internalizing
  i_var2 =~ 1*internalizing_Baseline + 1*internalizing_Year_1 + 1*internalizing_Year_2 + 1*internalizing_Year_3
  s_var2 =~ 0*internalizing_Baseline + 1*internalizing_Year_1 + 2*internalizing_Year_2 + 3*internalizing_Year_3

  # Estimate latent means
  i_var1 ~ 1
  s_var1 ~ 1
  i_var2 ~ 1
  s_var2 ~ 1

  # Fix observed variable intercepts to 0 for identification
  # This ensures the latent growth factors carry all the mean structure
  externalizing_Baseline ~ 0*1
  externalizing_Year_1   ~ 0*1
  externalizing_Year_2   ~ 0*1
  externalizing_Year_3   ~ 0*1

  internalizing_Baseline ~ 0*1
  internalizing_Year_1   ~ 0*1
  internalizing_Year_2   ~ 0*1
  internalizing_Year_3   ~ 0*1
"

# Fit the model using ML for handling missing data
fit <- sem(model, data = df_wide, missing = "ml")

# Display summary with fit measures
summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)

Show all 34 linesShow less

Format Model Summary Table

17 lines

# Extract model summary
model_summary <- summary(fit)

model_summary

# Convert output to a tidy dataframe and then to gt table
model_summary_table <- broom::tidy(fit) %>%
  gt() %>%
  tab_header(title = "Multivariate LGCM Results") %>%
  fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3)

# Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)

Show all 17 linesShow less

Format Model Fit Indices Table

21 lines

# Extract and save model fit indices
fit_indices <- fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr", "aic", "bic"))

fit_indices_table <- data.frame(
  Metric = names(fit_indices),
  Value = as.numeric(fit_indices)
) %>%
  gt() %>%
  tab_header(title = "Model Fit Indices") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

# Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)

Show all 21 linesShow less

term	op	estimate	std.error	statistic	p.value	std.lv	std.all
Multivariate LGCM Results
i_var1 =~ externalizing_Baseline	=~	1.000	0.000	NA	NA	8.9650241	0.8794292
i_var1 =~ externalizing_Year_1	=~	1.000	0.000	NA	NA	8.9650241	0.9054640
i_var1 =~ externalizing_Year_2	=~	1.000	0.000	NA	NA	8.9650241	0.9276253
i_var1 =~ externalizing_Year_3	=~	1.000	0.000	NA	NA	8.9650241	0.9362185
s_var1 =~ externalizing_Baseline	=~	0.000	0.000	NA	NA	0.0000000	0.0000000
s_var1 =~ externalizing_Year_1	=~	1.000	0.000	NA	NA	1.7633886	0.1781016
s_var1 =~ externalizing_Year_2	=~	2.000	0.000	NA	NA	3.5267771	0.3649212
s_var1 =~ externalizing_Year_3	=~	3.000	0.000	NA	NA	5.2901657	0.5524526
i_var2 =~ internalizing_Baseline	=~	1.000	0.000	NA	NA	8.9399688	0.8485679
i_var2 =~ internalizing_Year_1	=~	1.000	0.000	NA	NA	8.9399688	0.8546363
i_var2 =~ internalizing_Year_2	=~	1.000	0.000	NA	NA	8.9399688	0.8623244
i_var2 =~ internalizing_Year_3	=~	1.000	0.000	NA	NA	8.9399688	0.8430902
s_var2 =~ internalizing_Baseline	=~	0.000	0.000	NA	NA	0.0000000	0.0000000
s_var2 =~ internalizing_Year_1	=~	1.000	0.000	NA	NA	1.9703869	0.1883635
s_var2 =~ internalizing_Year_2	=~	2.000	0.000	NA	NA	3.9407738	0.3801160
s_var2 =~ internalizing_Year_3	=~	3.000	0.000	NA	NA	5.9111608	0.5574563
i_var1 ~1	~1	45.375	0.100	452.269	0.000	5.0613584	5.0613584
s_var1 ~1	~1	−0.416	0.028	−14.768	0.000	-0.2361454	-0.2361454
i_var2 ~1	~1	48.432	0.103	470.161	0.000	5.4174154	5.4174154
s_var2 ~1	~1	−0.274	0.032	−8.452	0.000	-0.1392032	-0.1392032
externalizing_Baseline ~1	~1	0.000	0.000	NA	NA	0.0000000	0.0000000
externalizing_Year_1 ~1	~1	0.000	0.000	NA	NA	0.0000000	0.0000000
externalizing_Year_2 ~1	~1	0.000	0.000	NA	NA	0.0000000	0.0000000
externalizing_Year_3 ~1	~1	0.000	0.000	NA	NA	0.0000000	0.0000000
internalizing_Baseline ~1	~1	0.000	0.000	NA	NA	0.0000000	0.0000000
internalizing_Year_1 ~1	~1	0.000	0.000	NA	NA	0.0000000	0.0000000
internalizing_Year_2 ~1	~1	0.000	0.000	NA	NA	0.0000000	0.0000000
internalizing_Year_3 ~1	~1	0.000	0.000	NA	NA	0.0000000	0.0000000
externalizing_Baseline ~~ externalizing_Baseline	~~	23.549	0.672	35.034	0.000	23.5488239	0.2266043
externalizing_Year_1 ~~ externalizing_Year_1	~~	27.236	0.505	53.924	0.000	27.2363190	0.2778356
externalizing_Year_2 ~~ externalizing_Year_2	~~	25.967	0.481	54.020	0.000	25.9668704	0.2780109
externalizing_Year_3 ~~ externalizing_Year_3	~~	21.400	0.619	34.549	0.000	21.3996121	0.2333766
internalizing_Baseline ~~ internalizing_Baseline	~~	31.071	0.860	36.123	0.000	31.0707682	0.2799324
internalizing_Year_1 ~~ internalizing_Year_1	~~	36.746	0.668	54.977	0.000	36.7456494	0.3358123
internalizing_Year_2 ~~ internalizing_Year_2	~~	34.284	0.637	53.853	0.000	34.2838518	0.3189767
internalizing_Year_3 ~~ internalizing_Year_3	~~	30.960	0.849	36.451	0.000	30.9596933	0.2753422
i_var1 ~~ i_var1	~~	80.372	1.436	55.963	0.000	1.0000000	1.0000000
s_var1 ~~ s_var1	~~	3.110	0.146	21.267	0.000	1.0000000	1.0000000
i_var2 ~~ i_var2	~~	79.923	1.537	51.992	0.000	1.0000000	1.0000000
s_var2 ~~ s_var2	~~	3.882	0.193	20.114	0.000	1.0000000	1.0000000
i_var1 ~~ s_var1	~~	−6.344	0.346	−18.317	0.000	-0.4012688	-0.4012688
i_var1 ~~ i_var2	~~	58.643	1.178	49.789	0.000	0.7316977	0.7316977
i_var1 ~~ s_var2	~~	−6.484	0.327	−19.854	0.000	-0.3670378	-0.3670378
s_var1 ~~ i_var2	~~	−7.318	0.295	−24.829	0.000	-0.4642100	-0.4642100
s_var1 ~~ s_var2	~~	4.321	0.100	43.222	0.000	1.2436066	1.2436066
i_var2 ~~ s_var2	~~	−5.564	0.417	−13.344	0.000	-0.3158614	-0.3158614

Fit Measure	Value
Model Fit Indices
chisq	4,421.337
df	22.000
pvalue	0.000
cfi	0.921
tli	0.899
rmsea	0.144
srmr	0.030
aic	527,274.153
bic	527,432.047

Interpretation

Externalizing symptoms averaged 45.4 at Baseline and declined by 0.416 points per year (both p < .001); internalizing symptoms started at 48.4 and fell by 0.274 points annually. Thus, both domains improved over the study window. Intercept variances of roughly 80 and slope variances of 3–4 confirmed large between-person differences, so the population mean hides substantial heterogeneity.Model fit was mixed—SRMR = 0.030 was excellent, whereas CFI/TLI (0.921/0.899) and RMSEA (0.144) suggested that additional cross-domain links might further reduce misfit. Even so, the estimated covariances captured the core comorbidity pattern: youth who started high on externalizing also tended to start high on internalizing (covariance = 58.6, p < .001), and declines in one domain generally coincided with declines in the other (slope covariance = 4.32, p < .001). Negative intercept–slope covariances within each domain imply diminishing returns—youth with the highest baseline symptoms improved, but at a slower rate than peers with milder presentations.

Visualization

38 lines

### Calculate means for each time point for externalizing and internalizing
mean_externalizing <- df_wide %>%
    summarise(across(starts_with("externalizing"), mean, na.rm = TRUE))

mean_internalizing <- df_wide %>%
    summarise(across(starts_with("internalizing"), mean, na.rm = TRUE))

### Reshape the data for plotting
mean_externalizing_long <- pivot_longer(mean_externalizing, cols = everything(), names_to = "Time", values_to = "Mean_externalizing")
mean_internalizing_long <- pivot_longer(mean_internalizing, cols = everything(), names_to = "Time", values_to = "Mean_internalizing")

### Plot the mean trajectories for externalizing
externalizing_plot <- ggplot(mean_externalizing_long, aes(x = Time, y = Mean_externalizing, group = 1)) +
    geom_line(color = "blue", linewidth = 1.2) +
    geom_point(color = "blue") +
    labs(title = "Mean Growth Trajectory for externalizing", y = "Mean externalizing", x = "Time Point") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

### Plot the mean trajectories for internalizing
internalizing_plot <- ggplot(mean_internalizing_long, aes(x = Time, y = Mean_internalizing, group = 1)) +
    geom_line(color = "red", linewidth = 1.2) +
    geom_point(color = "red") +
    labs(title = "Mean Growth Trajectory for internalizing", y = "Mean internalizing", x = "Time Point") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

### Combine the plots side by side using patchwork
combined_plot <- externalizing_plot + internalizing_plot
print(combined_plot)

# Save as a high-resolution PNG file
ggsave(filename = "visualization.png",
       plot = combined_plot,
       width = 10,  # Specify width in inches (or units)
       height = 5, # Specify height in inches (or units)
       units = "in", # Specify units (e.g., "in", "cm", "mm")
       dpi = 300) # Specify resolution (e.g., 300 for good quality)

Show all 38 linesShow less

Visualization Notes

This visualization displays the mean trajectories for externalizing (blue) and internalizing (red) behaviors across four annual assessments. Both plots show the average change patterns at the group level, highlighting the overall developmental trends.The externalizing trajectory (left panel) shows a modest decline from baseline through Year 3, with symptoms decreasing approximately 0.4 points per year. The internalizing trajectory (right panel) exhibits a similar but slightly smaller downward trend, declining approximately 0.3 points annually. Both patterns demonstrate symptom improvement over time at the population level.The parallel visualization format facilitates comparison of trajectory shapes and magnitude differences between the two behavioral domains. While both show declining trends, externalizing symptoms begin at lower baseline levels (around 45) compared to internalizing symptoms (around 48), and both converge toward similar levels by Year 3. This coordinated decline across domains aligns with the positive slope covariance identified in the model, suggesting shared developmental processes or common influences driving improvement in both behavioral domains.

Discussion

Key Findings

The multivariate growth model captured parallel declines in both externalizing (slope = −0.416, SE = 0.028, p < .001) and internalizing behaviors (slope = −0.274, SE = 0.032, p < .001). Intercept and slope variances were significant for each domain, confirming that youth entered adolescence with very different symptom levels and moved along heterogeneous pathways even when the overall trend pointed downward.

Implications

Model diagnostics were mixed. The SRMR of 0.030 suggested strong absolute fit, whereas the CFI (0.921) and TLI (0.899) fell shy of the 0.95 benchmark and the RMSEA (0.144) exceeded common cutoffs, implying that additional cross-loadings or time-specific covariances might still be needed to fully capture the observed structure.

Cross-domain associations were pronounced. Positive intercept (58.643, p < .001) and slope (4.321, p < .001) covariances indicated that youth with elevated baseline externalizing symptoms also tended to start high on internalizing problems and that improvements occurred in tandem. Negative intercept–slope covariances within each domain suggested diminishing returns: adolescents starting with the greatest difficulties exhibited slower recovery. Together, these findings highlight meaningful comorbidity and underscore the value of parallel-process models for charting cascading behavioral change.

Additional Resources

lavaan Multivariate Growth Models

DOCS

Official lavaan documentation on parallel process models with multiple outcomes, demonstrating how to specify correlated intercepts and slopes across different constructs.

Visit Resource

Parallel Process LGCM Tutorial

VIGNETTE

Step-by-step quantitative methods tutorial on modeling correlated growth trajectories between two or more outcomes, including interpretation of cross-domain associations.

Visit Resource

Multivariate Growth Models in SEM

PAPER

Foundational methodology paper on coupled change processes in multivariate longitudinal data, covering theoretical frameworks and practical applications (Bollen & Curran, 2004). Note: access may require institutional or paid subscription.

Visit Resource

lavaan Model Syntax Examples

TOOL

Collection of lavaan syntax examples for complex structural equation models including multivariate growth curves, with code snippets and interpretation guidance.

Visit Resource