Overview

Latent Growth Curve Modeling (LGCM) analyzes longitudinal change by representing each person's trajectory with latent growth factors, separating systematic change from time-specific residual variation (including measurement error). The intercept and slope factors capture both the population-average trajectory and individual differences around it. A traditional repeated-measures ANOVA compares wave means. LGCM instead estimates how much people vary in their starting levels and rates of change, and how those two quantities covary. This tutorial applies LGCM to emotional suppression in ABCD youth across four annual assessments (Years 3 to 6), estimating the average trajectory and individual variation in initial levels and rates of change.

When to Use:

Ideal when you have three or more waves of a repeated ABCD measure and want to model the average growth trajectory plus individual deviations from it.

Key Advantage:

LGCM provides latent intercept and slope factors, so you can quantify both initial status and change over time with measurement error accounted for.
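For reference, the linear model fitted later in this tutorial can be written as shown below. This is the standard unconditional linear LGCM, and the symbols are introduced here only for illustration. They map onto the lavaan output as follows: α₀ and α₁ are `i ~1` and `s ~1`; ψ₀₀, ψ₁₁ and ψ₀₁ are `i ~~ i`, `s ~~ s` and `i ~~ s`; θₜ are the residual variances. For person i at wave t:

```latex
\begin{aligned}
y_{ti} &= \eta_{0i} + \lambda_t\,\eta_{1i} + \varepsilon_{ti},
  \qquad \lambda_t \in \{0, 1, 2, 3\} \text{ for Years 3--6},\\
\eta_{0i} &= \alpha_0 + \zeta_{0i}, \qquad \eta_{1i} = \alpha_1 + \zeta_{1i},\\
\operatorname{E}(y_t) &= \alpha_0 + \lambda_t\,\alpha_1,\\
\operatorname{Var}(y_t) &= \psi_{00} + 2\lambda_t\,\psi_{01} + \lambda_t^{2}\,\psi_{11} + \theta_t .
\end{aligned}
```

The variance expression assumes the residuals are uncorrelated with the growth factors and with each other across waves. Under these assumptions, the fixed loadings λₜ alone determine how the latent factors combine into each observed wave.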

What You'll Learn:

How to specify a basic LGCM in lavaan, interpret intercept/slope estimates, and assess overall model fit and residual structure.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

Automatic data joining - Merges variables from multiple tables automatically
Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)

Key Parameters

vars - Vector of variable names to load
release - ABCD data release version (e.g., "6.0")
format - File format, typically "parquet" for efficiency
categ_to_factor - Automatically converts categorical variables to factors
value_to_na - Converts ABCD missing value codes to R's NA
add_labels - Adds descriptive labels to variables and values

Additional Resources

For more details on using NBDCtools:

NBDCtools Getting Started Guide - Complete package overview
Joining Data - Advanced data merging strategies
Filtering Events - Selecting specific assessment waves
Data Transformations - Preprocessing and cleaning

Data Preparation

NBDCtools Setup and Data Loading


### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(tidyverse)    # Collection of R packages for data science
library(arrow)        # For reading Parquet files
library(gtsummary)    # Creating publication-quality tables
library(lavaan)       # Structural Equation Modeling in R
library(broom)        # For tidying model outputs
library(gt)           # For creating formatted tables

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "mh_y_erq__suppr_mean"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)


Data Transformation


### Create a long-form dataset with relevant columns
df_long <- abcd_data %>%
  select(participant_id, session_id, ab_g_dyn__design_site, ab_g_stc__design_id__fam, mh_y_erq__suppr_mean) %>%
  # Filter to Years 3-6 annual assessments using NBDCtools
  filter_events_abcd(conditions = c("annual", ">=3", "<=6")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    session_id = factor(session_id,
                        levels = c("ses-03A", "ses-04A", "ses-05A", "ses-06A"),
                        labels = c("Year_3", "Year_4", "Year_5", "Year_6"))  # Relabel sessions for clarity
  ) %>%
  rename(  # Rename for simplicity
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam,
    suppression = mh_y_erq__suppr_mean
  ) %>%
  droplevels() %>%                                     # Drop unused factor levels
  drop_na(suppression)                                 # Remove rows with missing outcome data

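### Reshape data from long to wide format (one row per participant)
df_wide <- df_long %>%
  pivot_wider(
    id_cols = c(participant_id, family_id),  # Site may change between waves, so it is not part of the row key
    names_from = session_id,
    values_from = suppression,
    names_prefix = "Suppression_"
  ) %>%
  # Complete-case analysis: keep only participants observed at all four waves.
  # (growth(..., missing = "ml") could use the incomplete cases via FIML instead.)
  drop_na(starts_with("Suppression_"))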
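The reshaping step is easiest to check on a tiny example. The sketch below uses base R's `reshape()` in place of `pivot_wider()`, so it runs without extra packages. The participants and scores are made up for illustration. It shows that a participant who is missing a wave gets an `NA` in that wave's column, and that a complete-case filter then removes the participant entirely.

```r
# Hypothetical long-format data: participant p2 has no Year_5 assessment
toy_long <- data.frame(
  participant_id = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
  session_id     = c("Year_3", "Year_4", "Year_5", "Year_6",
                     "Year_3", "Year_4", "Year_6"),
  suppression    = c(3.00, 3.25, 3.50, 3.50, 2.75, 3.00, 3.25)
)

# One row per participant, one column per wave (suppression_Year_3, ...)
toy_wide <- reshape(
  toy_long,
  idvar     = "participant_id",
  timevar   = "session_id",
  v.names   = "suppression",
  direction = "wide",
  sep       = "_"
)

# Complete-case filter, analogous to drop_na(starts_with("Suppression_"))
toy_complete <- toy_wide[complete.cases(toy_wide), ]

print(toy_wide)
print(toy_complete$participant_id)  # only p1 has all four waves
```

You can run the same one-row-per-participant check on the real data, for example `anyDuplicated(df_wide$participant_id) == 0`.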


Descriptive Statistics


### Apply compact styling (gtsummary themes must be set before the table is built)
theme_gtsummary_compact()

### Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, suppression) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      suppression ~ "Suppression"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "{level}<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "Assessment Wave") %>%
  bold_labels() %>%
  italicize_levels()

### Convert to a gt table for saving
descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table


Suppression by assessment wave
Characteristic	Year_3 (N = 10318)	Year_4 (N = 9586)	Year_5 (N = 8784)	Year_6 (N = 5001)
Suppression	3.10 (0.86)	3.35 (0.86)	3.35 (0.87)	3.42 (0.88)
Values are Mean (SD).
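A quick arithmetic check on the wave means in this table shows how closely a straight line can describe them. The sketch below simply takes differences between the reported means. These means come from all available observations at each wave, not only from the complete cases used in the model, so the comparison is approximate.

```r
# Wave means copied from the descriptive table (Years 3-6)
wave_means <- c(Year_3 = 3.10, Year_4 = 3.35, Year_5 = 3.35, Year_6 = 3.42)

# Year-to-year change in the mean
step_change <- diff(wave_means)
print(round(step_change, 2))

# Average change per year if the total rise were spread evenly over 3 intervals
avg_change <- (wave_means[["Year_6"]] - wave_means[["Year_3"]]) / 3
print(round(avg_change, 3))
```

Most of the rise falls between Years 3 and 4, with little change afterwards. A pattern like this is one plausible reason why a strictly linear trajectory may not fit perfectly.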

Statistical Analysis

Define and Fit Basic LGCM


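# Define model specification: intercept (i) and linear slope (s) factors
# Slope loadings 0, 1, 2, 3 place the intercept at Year 3 and scale the slope as change per year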
model <- "
  i =~ 1*Suppression_Year_3 + 1*Suppression_Year_4 + 1*Suppression_Year_5 + 1*Suppression_Year_6
  s =~ 0*Suppression_Year_3 + 1*Suppression_Year_4 + 2*Suppression_Year_5 + 3*Suppression_Year_6

  # Intercept and slope variances (growth() also frees their covariance, i ~~ s, by default)
  i ~~ i
  s ~~ s

  # Residual variances for each observed variable
  Suppression_Year_3 ~~ var_baseline*Suppression_Year_3
  Suppression_Year_4 ~~ var_year1*Suppression_Year_4
  Suppression_Year_5 ~~ var_year2*Suppression_Year_5
  Suppression_Year_6 ~~ var_year3*Suppression_Year_6
"

# Fit the growth model
fit <- growth(model, data = df_wide, missing = "ml")

# Display model summary
summary(fit)
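Because the slope loadings are fixed at 0, 1, 2, 3, the intercept factor describes Year 3. The choice of time zero is a modeling decision. Recoding time as -3, -2, -1, 0 would centre the intercept at Year 6 instead. This is an equivalent reparameterization, so overall fit would not change. The new factor means and covariances are a linear transformation of the old ones. The minimal sketch below computes that transformation in base R, using the rounded estimates from the results table later in this tutorial; refitting with the recoded loadings should reproduce these values up to rounding.

```r
# Rounded estimates from the fitted model (Year 3 = time 0)
alpha <- c(3.109, 0.110)                     # factor means: intercept, slope
psi   <- matrix(c(0.322, -0.039,
                  -0.039, 0.046), nrow = 2)  # factor variances and covariance

# With time recoded as lambda - 3: new intercept = i + 3 * s, new slope = s
T_mat <- matrix(c(1, 3,
                  0, 1), nrow = 2, byrow = TRUE)

alpha_y6 <- drop(T_mat %*% alpha)        # factor means with Year 6 as time zero
psi_y6   <- T_mat %*% psi %*% t(T_mat)   # factor covariance matrix with Year 6 as time zero

print(round(alpha_y6, 3))
print(round(psi_y6, 3))
```

The intercept mean becomes the expected Year 6 level (3.439). The intercept-slope covariance changes sign, from -0.039 to about +0.099. This is why the negative covariance in this model should be read specifically as a statement about Year 3 status.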


Format Model Summary Table


# Extract model summary
model_summary <- summary(fit)

model_summary

# Convert lavaan output to a tidy dataframe and then to gt table
model_summary_table <- broom::tidy(fit) %>%
  gt() %>%
  tab_header(title = "Latent Growth Curve Model Results") %>%
  fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3)

# Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)


Format Model Fit Indices Table


# Extract and save model fit indices
fit_indices <- fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr", "aic", "bic"))

fit_indices_table <- data.frame(
  Metric = names(fit_indices),
  Value = as.numeric(fit_indices)
) %>%
  gt() %>%
  tab_header(title = "Model Fit Indices") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

# Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)


Latent Growth Curve Model Results
term	op	label	estimate	std.error	statistic	p.value	std.lv	std.all
i =~ Suppression_Year_3	=~		1.000	0.000	NA	NA	0.5674924	0.6503570
i =~ Suppression_Year_4	=~		1.000	0.000	NA	NA	0.5674924	0.6715766
i =~ Suppression_Year_5	=~		1.000	0.000	NA	NA	0.5674924	0.6663175
i =~ Suppression_Year_6	=~		1.000	0.000	NA	NA	0.5674924	0.6371410
s =~ Suppression_Year_3	=~		0.000	0.000	NA	NA	0.0000000	0.0000000
s =~ Suppression_Year_4	=~		1.000	0.000	NA	NA	0.2140114	0.2532634
s =~ Suppression_Year_5	=~		2.000	0.000	NA	NA	0.4280228	0.5025601
s =~ Suppression_Year_6	=~		3.000	0.000	NA	NA	0.6420341	0.7208313
i ~~ i	~~		0.322	0.015	21.066	0.000	1.0000000	1.0000000
s ~~ s	~~		0.046	0.003	13.426	0.000	1.0000000	1.0000000
Suppression_Year_3 ~~ Suppression_Year_3	~~	var_baseline	0.439	0.016	26.880	0.000	0.4393588	0.5770358
Suppression_Year_4 ~~ Suppression_Year_4	~~	var_year1	0.424	0.012	36.743	0.000	0.4239019	0.5936580
Suppression_Year_5 ~~ Suppression_Year_5	~~	var_year2	0.376	0.011	35.180	0.000	0.3755151	0.5176899
Suppression_Year_6 ~~ Suppression_Year_6	~~	var_year3	0.292	0.015	20.147	0.000	0.2921647	0.3682805
i ~~ s	~~		−0.039	0.006	−6.492	0.000	-0.3198840	-0.3198840
Suppression_Year_3 ~1	~1		0.000	0.000	NA	NA	0.0000000	0.0000000
Suppression_Year_4 ~1	~1		0.000	0.000	NA	NA	0.0000000	0.0000000
Suppression_Year_5 ~1	~1		0.000	0.000	NA	NA	0.0000000	0.0000000
Suppression_Year_6 ~1	~1		0.000	0.000	NA	NA	0.0000000	0.0000000
i ~1	~1		3.109	0.012	253.293	0.000	5.4787306	5.4787306
s ~1	~1		0.110	0.005	20.502	0.000	0.5130239	0.5130239
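The parameter estimates translate directly into the means and covariances the model expects at each wave. The minimal base-R sketch below plugs in the rounded estimates from the table above, so the results match lavaan's own model-implied moments only up to rounding. With the fitted object, `lavInspect(fit, "implied")` returns the exact values.

```r
# Rounded estimates from the results table
alpha  <- c(3.109, 0.110)                          # mean intercept, mean slope
psi    <- matrix(c(0.322, -0.039,
                   -0.039, 0.046), nrow = 2)       # factor variances and covariance
theta  <- diag(c(0.439, 0.424, 0.376, 0.292))      # residual variances, Years 3-6
Lambda <- cbind(intercept = 1, slope = 0:3)        # fixed factor loadings

waves <- paste0("Year_", 3:6)

# Model-implied means and covariance matrix of the four observed scores
implied_mean <- setNames(drop(Lambda %*% alpha), waves)
implied_cov  <- Lambda %*% psi %*% t(Lambda) + theta
dimnames(implied_cov) <- list(waves, waves)

print(round(implied_mean, 3))
print(round(implied_cov, 3))

# Share of each wave's variance left to the residual (compare the std.all column)
print(round(diag(theta) / diag(implied_cov), 3))
```

The residual shares (about 0.58, 0.59, 0.52 and 0.37) reproduce the standardized residual variances in the table. The implied means rise in equal steps by construction. Comparing them with the observed wave means is a simple way to see where a linear trajectory fits less well.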

Model Fit Indices
Fit Measure	Value
chisq	180.816
df	5.000
pvalue	0.000
cfi	0.949
tli	0.938
rmsea	0.092
srmr	0.045
aic	39,498.889
bic	39,555.966
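The 5 degrees of freedom reported above come from comparing the number of moments the data supply with the number of parameters the model estimates. The short sketch below does that bookkeeping for this model: four observed waves, with the mean structure included as in `growth()`, and the observed-variable intercepts fixed at 0 (as the table shows).

```r
p <- 4                                    # observed variables (Years 3-6)
n_moments <- p + p * (p + 1) / 2          # 4 means + 10 variances/covariances

n_params <- 2 +   # factor means (i ~1, s ~1)
            3 +   # factor variances and covariance (i ~~ i, s ~~ s, i ~~ s)
            4     # residual variances, one per wave

model_df <- n_moments - n_params
print(model_df)   # 14 - 9 = 5
```

With so few degrees of freedom, a chi-square of 180.8 reflects systematic discrepancies between the linear model and the data, not just chance variation.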

Interpretation

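Model fit was mixed. The SRMR (0.045) was good. However, the CFI (0.949) and TLI (0.938) fell just below the conventional 0.95 benchmark, the RMSEA (0.092) exceeded the commonly used 0.08 threshold, and the chi-square test was significant (χ²(5) = 180.8, p < .001). Together, these indices suggest that a strictly linear trajectory leaves some systematic misfit. The descriptive wave means, computed from all available observations at each wave, rise mostly between Years 3 and 4 (3.10 to 3.35) and only slightly afterwards, which points to a possibly nonlinear pattern. Average suppression at Year 3 was 3.109 (SE = 0.012, p < .001) and increased by 0.110 points per year (SE = 0.005, p < .001), a small but reliable rise. The intercept and slope variances (0.322 and 0.046, both p < .001) indicate substantial individual differences in both starting levels and rates of change. The intercept–slope covariance was negative (−0.039, p < .001; standardized r = −0.32). This implies that youth with higher suppression at Year 3 tended to increase more slowly, while those starting lower tended to increase faster. Residual variances declined from 0.439 at Year 3 to 0.292 at Year 6. In standardized terms, the share of each wave's variance not explained by the growth factors fell from about 58% to 37%. Overall, the model depicts a cohort-wide rise in suppression layered on top of substantial between-person heterogeneity.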
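The variance estimates are easier to interpret when they are converted to standard deviations and to the range that would cover roughly 95% of individuals. The sketch below does this with the rounded estimates from the results table. It assumes normally distributed growth factors, which is an assumption of this illustration and not something the model output tests.

```r
mean_i <- 3.109; var_i <- 0.322   # intercept (Year 3 level)
mean_s <- 0.110; var_s <- 0.046   # slope (change per year)
cov_is <- -0.039                  # intercept-slope covariance

# Central 95% range of person-specific values, assuming normality
range_i <- mean_i + c(-1, 1) * qnorm(0.975) * sqrt(var_i)
range_s <- mean_s + c(-1, 1) * qnorm(0.975) * sqrt(var_s)

# Intercept-slope correlation implied by the covariance and variances
r_is <- cov_is / sqrt(var_i * var_s)

print(round(range_i, 2))
print(round(range_s, 2))
print(round(r_is, 2))
```

Under this approximation, the slope range runs from about -0.31 to 0.53 points per year. A minority of youth would therefore be expected to decline even though the average trajectory rises.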

Visualization


### Select a random subset of participants (seed set for reproducibility)
set.seed(123)
n_sample <- min(150, length(unique(df_long$participant_id)))
selected_ids <- sample(unique(df_long$participant_id), n_sample)
df_long_selected <- df_long %>% filter(participant_id %in% selected_ids)

### Plot suppression trajectories
visualization <- ggplot(df_long_selected, aes(x = session_id, y = suppression, group = participant_id)) +
    geom_line(alpha = 0.3, color = "gray") +
    geom_point(size = 1.5, color = "blue") +
    # Linear (OLS) fit across all plotted observations, with its 95% confidence band
    geom_smooth(aes(group = 1), method = "lm", color = "red", linewidth = 1.2, se = TRUE, fill = "lightpink") +
    labs(
        title = "Emotional Suppression Trajectories Over Time",
        x = "Assessment Wave",
        y = "Suppression Score"
    ) +
    theme_minimal()

ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)


Visualization Notes

Each gray line shows one participant's observed suppression scores across the assessments they completed, for a random subsample of up to 150 participants. Blue points mark the observed scores. The red line is a linear regression fit to all plotted observations, shown with a shaded 95% confidence band. The upward tilt of the red line signals a cohort-level increase in suppression. The spread of the gray lines shows that individuals follow very different paths: some rise steeply, while others stay flat or decline. The subsample is drawn from all available data, so some lines span fewer than four waves, whereas the growth model itself was fitted only to participants with complete data. In short, the figure communicates both the population trend and the heterogeneity that motivates a latent growth curve approach.
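A complementary view is to simulate trajectories from the fitted parameters instead of plotting raw data. The sketch below draws person-specific intercepts and slopes from a bivariate normal distribution, using the rounded estimates from the results table; normality is an assumption of this illustration. The lines are latent (error-free) trajectories, so they omit the wave-specific residuals. The sketch uses base R graphics in place of ggplot2 so that it has no dependencies. The seed and sample size are arbitrary choices.

```r
set.seed(2024)                                # arbitrary seed for reproducibility
n <- 150                                      # number of simulated participants

alpha <- c(3.109, 0.110)                      # mean intercept and slope
psi   <- matrix(c(0.322, -0.039,
                  -0.039, 0.046), nrow = 2)   # factor variances and covariance

# Bivariate normal draws via the Cholesky factor: z %*% chol(psi) has covariance psi
z       <- matrix(rnorm(n * 2), ncol = 2)
factors <- sweep(z %*% chol(psi), 2, alpha, "+")
colnames(factors) <- c("intercept", "slope")

# Latent trajectories at time codes 0-3 (Years 3-6), one row per simulated person
time_code <- 0:3
traj <- factors[, "intercept"] + outer(factors[, "slope"], time_code)

# Gray lines: simulated individuals; red line: model-implied mean trajectory
matplot(3:6, t(traj), type = "l", lty = 1, col = "gray",
        xlab = "Assessment year", ylab = "Model-implied suppression")
lines(3:6, alpha[1] + alpha[2] * time_code, col = "red", lwd = 2)
```

If the simulated fan of lines looks broadly similar to the observed one, the estimated variances are plausibly capturing the heterogeneity seen in the raw data.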

Discussion

The analysis reveals heterogeneous suppression trajectories. On average, suppression increased over time, but individual trajectories varied substantially, with some participants growing faster and others more slowly. The significant slope variance shows the value of modeling individual differences in change rather than only the average trend. Quantifying how much this improves fit over a model that fixes the slope variance to zero would require fitting that model and comparing the two, for example with a likelihood-ratio test.

Estimating both intercept and slope variance provides a flexible framework for describing differences in initial suppression levels and in growth rates across participants. By separating baseline differences (intercepts) from individual variability in rates of change (slopes), the latent growth curve model (LGCM) supports a more detailed examination of developmental patterns than an analysis of wave means alone.

Additional Resources

lavaan Growth Curve Tutorial

DOCS

Official lavaan documentation for latent growth curve modeling basics, covering model specification, parameter estimation, and result interpretation for unconditional growth models.


Structural Equation Modeling in lavaan

VIGNETTE

Comprehensive vignette covering growth models within the structural equation modeling framework, including detailed examples of latent growth curve specifications and output interpretation.


Applied Longitudinal Data Analysis by Singer & Willett

BOOK

Foundational textbook on growth curve modeling. Chapters 3-4 provide thorough coverage of unconditional growth models, including interpretation of intercepts, slopes, and random effects for longitudinal data. Note: access may require institutional or paid subscription.


semPlot Package for Model Visualization

TOOL

R package for creating publication-quality path diagrams of structural equation models and latent growth curve models, with extensive customization options for visualization.
