Longitudinal.dev

Latent Growth Curve Models (LGCM)

Multiple Groups (LGCM)

Compare latent growth trajectories between groups using multigroup LGCM, testing memory development differences while accounting for related ABCD participants.

lavaan | Updated 2025-11-05 | abcd-study | latent-growth-curve-model | group-comparison
Work in Progress: Examples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified. If you spot gaps, errors, or have suggestions, we'd love your feedback.

Overview

Multigroup Latent Growth Curve Modeling (MG-LGCM) tests whether growth patterns differ systematically across groups by estimating separate intercept and slope parameters for each group while allowing measurement and structural equivalence testing. This approach addresses key questions about developmental heterogeneity: Do groups show equivalent developmental patterns? Are baseline differences maintained over time? Do groups change at different rates? This tutorial examines sex differences in memory trajectories across four assessments in ABCD youth, testing increasingly constrained models to determine whether boys and girls exhibit equivalent or distinct developmental patterns.

When to Use:
When you need to compare growth trajectories between groups (e.g., sex, site) across ABCD waves.
Key Advantage:
Multiple-group LGCM estimates intercepts/slopes per group and tests equality constraints to detect group differences.
What You'll Learn:
How to specify multiple-group LGCMs in lavaan, compare nested models, and interpret group-specific growth factors.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

  • Automatic data joining - Merges variables from multiple tables automatically
  • Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
  • Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Key Parameters
  • vars - Vector of variable names to load
  • release - ABCD data release version (e.g., "6.0")
  • format - File format, typically "parquet" for efficiency
  • categ_to_factor - Automatically converts categorical variables to factors
  • value_to_na - Converts ABCD missing value codes to R's NA
  • add_labels - Adds descriptive labels to variables and values

Data Preparation

NBDCtools Setup and Data Loading
### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(tidyverse)    # Data manipulation and visualization
library(arrow)        # For reading Parquet files
library(gtsummary)    # Publication-ready descriptive tables
library(gt)           # Table formatting
library(lavaan)       # Structural equation modeling
library(semPlot)      # Path diagram visualization

# Set random seed for reproducibility
set.seed(123)

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "ab_g_stc__cohort_sex",
    "ab_g_dyn__visit_age",
    "nc_y_nihtb__picsq__agecor_score"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Data Transformation
# Create longitudinal dataset with cleaned variables
df_long <- abcd_data %>%
  # Select memory assessment waves
  filter(session_id %in% c("ses-00A", "ses-02A", "ses-04A", "ses-06A")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    # Relabel session IDs for clarity
    session_id = factor(session_id,
                        levels = c("ses-00A", "ses-02A", "ses-04A", "ses-06A"),
                        labels = c("Year_0", "Year_2", "Year_4", "Year_6")),
    # Create sex grouping variable with meaningful labels
    sex = factor(case_when(
      ab_g_stc__cohort_sex == 1 ~ "Male",
      ab_g_stc__cohort_sex == 2 ~ "Female",
      TRUE ~ NA_character_
    ))
  ) %>%
  # Rename for clarity
  rename(
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam,
    age = ab_g_dyn__visit_age,
    memory = nc_y_nihtb__picsq__agecor_score
  ) %>%
  # Remove invalid memory scores and missing sex data
  filter(memory > 1, !is.na(sex)) %>%
  select(participant_id, session_id, site, family_id, age, sex, memory) %>%
  droplevels()
Reshape to Wide Format
# Reshape data from long to wide format
df_wide <- df_long %>%
  pivot_wider(
    id_cols = participant_id,  # One row per participant, even if site changes across waves
    names_from = session_id,
    values_from = c(memory, age, sex, family_id),
    names_sep = "_"
  ) %>%
  # Clean column names by removing the "Year_" prefix (e.g., memory_Year_0 -> memory_0)
  rename_with(~ str_replace_all(., "Year_", ""), everything()) %>%
  # Use sex from baseline for grouping
  mutate(sex_group = sex_0) %>%
  # Keep only participants with a valid memory score at all four waves
  drop_na(starts_with("memory_"))
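One detail of the reshape step is worth illustrating: pivot_wider() treats every column not named in names_from or values_from as an identifier, so a column that changes across waves (site is flagged as dynamic in its variable name) can split one participant across several rows. The following is a minimal sketch on made-up data; the toy values and the object names toy_long, wide_default, and wide_by_id are hypothetical and not part of the tutorial.

```r
library(tidyr)

# Toy long-format data: participant p1 changes site between waves (hypothetical values)
toy_long <- data.frame(
  participant_id = c("p1", "p1", "p2", "p2"),
  session_id     = c("0", "2", "0", "2"),
  site           = c("site01", "site02", "site03", "site03"),
  memory         = c(98, 104, 110, 107)
)

# Default behaviour: participant_id AND site both act as row identifiers
wide_default <- pivot_wider(
  toy_long,
  names_from = session_id,
  values_from = memory,
  names_prefix = "memory_"
)

# Explicit id_cols: exactly one row per participant (site is dropped)
wide_by_id <- pivot_wider(
  toy_long,
  id_cols = participant_id,
  names_from = session_id,
  values_from = memory,
  names_prefix = "memory_"
)

nrow(wide_default)  # 3 rows: p1 appears once per site
nrow(wide_by_id)    # 2 rows: one per participant
```

In the split version each of p1's rows has a missing memory score, so a later drop_na() on the memory columns would silently remove that participant.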
Descriptive Statistics by Assessment Wave
descriptives_table <- df_long %>%
  select(session_id, memory) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      memory ~ "Memory"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "**{level}**<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "Assessment Wave") %>%
  bold_labels() %>%
  italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table
Assessment Wave
Characteristic   Year_0 (N = 11709)   Year_2 (N = 10396)   Year_4 (N = 9258)   Year_6 (N = 4876)
Memory           101 (16)             105 (17)             109 (20)            107 (20)
Values are Mean (SD).

Statistical Analysis

Measurement Invariance Testing

This tutorial employs a multigroup latent growth curve modeling approach to test measurement invariance across sex groups. Measurement invariance testing evaluates whether the same construct is being measured equivalently across groups - a prerequisite for valid group comparisons. We fit a series of increasingly constrained models (M1-M4) that test different levels of equivalence: M1 tests full equality across groups, M2 relaxes latent mean constraints, M3 further relaxes variance/covariance constraints, and M4 (the least constrained) only constrains factor loadings to be equal across groups. By comparing model fit across these nested models, we can determine the appropriate level of invariance and whether observed group differences in trajectories reflect genuine developmental differences rather than measurement artifacts.
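For readers who prefer notation, the linear growth model fitted in each group can be written as below. This is a sketch of the standard LGCM form; the symbols (eta, lambda, Psi, theta) are my own labels rather than notation used by the tutorial, and the time scores 0, 1, 2, 3 are taken from the model syntax that follows.

```latex
\begin{aligned}
y_{ti}^{(g)} &= \eta_{0i}^{(g)} + \lambda_t\,\eta_{1i}^{(g)} + \varepsilon_{ti}^{(g)},
  \qquad \lambda_t \in \{0, 1, 2, 3\},\\
\operatorname{Cov}\!\begin{pmatrix}\eta_{0i}^{(g)}\\[2pt] \eta_{1i}^{(g)}\end{pmatrix} &= \Psi^{(g)},
  \qquad \operatorname{Var}\!\big(\varepsilon_{ti}^{(g)}\big) = \theta_t^{(g)}.
\end{aligned}
```

Here g indexes sex group, i indexes participants, and t indexes the four waves. Because the waves are two years apart, one unit of the slope factor corresponds to one assessment interval rather than one calendar year. In these terms, M3 holds the residual variances \theta_t equal across groups and M2 additionally holds \Psi equal.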

Define and Fit Multigroup Models
# Define Latent Growth Model (LGCM)
lgcm_model <- "
  # Latent Growth Model
  i =~ 1*memory_0 + 1*memory_2 + 1*memory_4 + 1*memory_6
  s =~ 0*memory_0 + 1*memory_2 + 2*memory_4 + 3*memory_6
"

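# Define function to fit the multigroup model under a given set of equality constraints.
# Note: sem() leaves the observed intercepts free and fixes the latent means at 0,
# so the average trajectory is carried by the wave-specific intercepts;
# lavaan's growth() instead fixes the intercepts at 0 and estimates latent means.
fit_lgcm_model <- function(model, group_constraints = NULL) {
  sem(
    model,
    data = df_wide,
    estimator = "ML",
    cluster = "family_id_0",  # Account for siblings nested within families
    se = "robust",
    missing = "fiml",
    group = "sex_0",  # Fit the model separately in each sex group
    group.equal = group_constraints  # Parameter sets held equal across groups
  )
}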

# Fit models with increasing constraints
model_constraints <- list(
  M1 = c("loadings", "means", "lv.variances", "lv.covariances", "residuals"),
  M2 = c("loadings", "lv.variances", "lv.covariances", "residuals"),
  M3 = c("loadings", "residuals"),
  M4 = c("loadings")  # Least constrained model
)

# Fit models and store results in a list
fit <- lapply(model_constraints, function(constraints) fit_lgcm_model(lgcm_model, constraints))

# Assign meaningful names to models
names(fit) <- names(model_constraints)

# Compare nested models with chi-square difference (likelihood ratio) tests
anova_results <- anova(fit$M2, fit$M3, fit$M4)
print(anova_results)

# Print summary of the least constrained model (M4)
summary(fit$M4, fit.measures = TRUE)
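To show how a test of group differences in average trajectories could look, here is a minimal sketch on simulated data. It swaps in lavaan's growth() wrapper, which estimates latent means, in place of the sem() call used in this tutorial. The data, sample size, variable names (y0 to y3, grp), and the helper simulate_group() are all hypothetical.

```r
library(lavaan)

set.seed(42)
n_per_group <- 500  # hypothetical sample size per group

# Simulate four waves from a linear growth process (all values hypothetical)
simulate_group <- function(n, mean_i, mean_s, label) {
  i <- rnorm(n, mean = mean_i, sd = 10)  # person-specific intercepts
  s <- rnorm(n, mean = mean_s, sd = 3)   # person-specific slopes
  waves <- sapply(0:3, function(t) i + t * s + rnorm(n, sd = 8))
  colnames(waves) <- paste0("y", 0:3)
  data.frame(waves, grp = label)
}

sim <- rbind(
  simulate_group(n_per_group, mean_i = 100, mean_s = 2, label = "A"),
  simulate_group(n_per_group, mean_i = 103, mean_s = 2, label = "B")
)

model <- "
  i =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  s =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
"

# Latent means (average intercept and slope) free in each group
fit_free <- growth(model, data = sim, group = "grp")

# Latent means constrained equal across groups
fit_equal <- growth(model, data = sim, group = "grp", group.equal = "means")

# Chi-square difference test of the two equality constraints
lrt <- lavTestLRT(fit_equal, fit_free)
print(lrt)
```

A significant difference test here would indicate that the groups differ in average intercept, average slope, or both. The same pattern extends to the variance and residual constraints used in M2 to M4.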
Format Model Summary Table
# Extract model summary for M4
model_summary <- summary(fit$M4, fit.measures = TRUE)

model_summary

# Convert lavaan output to a tidy dataframe and then to gt table
model_summary_table <- broom::tidy(fit$M4) %>%
  gt() %>%
  tab_header(title = "Latent Growth Curve Model Results") %>%
  fmt_number(
    columns = c(estimate, std.error, statistic, p.value),
    decimals = 3
  )

# Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)
Format Model Fit Indices Table
# Extract and save model fit indices for M4 (least constrained model)
fit_indices <- fitMeasures(fit$M4, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr", "aic", "bic"))

fit_indices_table <- data.frame(
  Metric = names(fit_indices),
  Value = as.numeric(fit_indices)
) %>%
  gt() %>%
  tab_header(title = "Model Fit Indices (M4)") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

# Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)
Latent Growth Curve Model Results
term op block group estimate std.error statistic p.value std.lv std.all
i =~ memory_0 =~ 1 1 1.000 0.000 NA NA 10.026763956 0.635582076
i =~ memory_2 =~ 1 1 1.000 0.000 NA NA 10.026763956 0.621989539
i =~ memory_4 =~ 1 1 1.000 0.000 NA NA 10.026763956 0.499216528
i =~ memory_6 =~ 1 1 1.000 0.000 NA NA 10.026763956 0.513752217
s =~ memory_0 =~ 1 1 0.000 0.000 NA NA 0.000000000 0.000000000
s =~ memory_2 =~ 1 1 1.000 0.000 NA NA 2.253606572 0.139797817
s =~ memory_4 =~ 1 1 2.000 0.000 NA NA 4.507213144 0.224406928
s =~ memory_6 =~ 1 1 3.000 0.000 NA NA 6.760819716 0.346411478
memory_0 ~~ memory_0 ~~ 1 1 148.337 8.484 17.485 0.000 148.337300881 0.596035425
memory_2 ~~ memory_2 ~~ 1 1 144.551 6.111 23.654 0.000 144.551393352 0.556245948
memory_4 ~~ memory_4 ~~ 1 1 263.149 9.295 28.311 0.000 263.149388374 0.652316995
memory_6 ~~ memory_6 ~~ 1 1 205.548 11.658 17.632 0.000 205.547814917 0.539633249
i ~~ i ~~ 1 1 100.536 7.558 13.301 0.000 1.000000000 1.000000000
s ~~ s ~~ 1 1 5.079 1.906 2.665 0.008 1.000000000 1.000000000
i ~~ s ~~ 1 1 4.852 2.814 1.724 0.085 0.214712148 0.214712148
memory_0 ~1 ~1 1 1 100.909 0.359 281.455 0.000 100.908964880 6.396473444
memory_2 ~1 ~1 1 1 105.775 0.352 300.488 0.000 105.774953789 6.561530220
memory_4 ~1 ~1 1 1 107.927 0.448 240.866 0.000 107.926987061 5.373511932
memory_6 ~1 ~1 1 1 105.447 0.440 239.824 0.000 105.447319778 5.402919086
i ~1 ~1 1 1 0.000 0.000 NA NA 0.000000000 0.000000000
s ~1 ~1 1 1 0.000 0.000 NA NA 0.000000000 0.000000000
i =~ memory_0 =~ 2 2 1.000 0.000 NA NA 10.817546233 0.670975854
i =~ memory_2 =~ 2 2 1.000 0.000 NA NA 10.817546233 0.658240433
i =~ memory_4 =~ 2 2 1.000 0.000 NA NA 10.817546233 0.536173315
i =~ memory_6 =~ 2 2 1.000 0.000 NA NA 10.817546233 0.567803635
s =~ memory_0 =~ 2 2 0.000 0.000 NA NA 0.000000000 0.000000000
s =~ memory_2 =~ 2 2 1.000 0.000 NA NA 2.641275466 0.160719841
s =~ memory_4 =~ 2 2 2.000 0.000 NA NA 5.282550933 0.261830436
s =~ memory_6 =~ 2 2 3.000 0.000 NA NA 7.923826399 0.415914786
memory_0 ~~ memory_0 ~~ 2 2 142.903 8.786 16.265 0.000 142.903110326 0.549791403
memory_2 ~~ memory_2 ~~ 2 2 146.358 6.446 22.706 0.000 146.358495016 0.541912946
memory_4 ~~ memory_4 ~~ 2 2 262.678 9.538 27.541 0.000 262.678028541 0.645322220
memory_6 ~~ memory_6 ~~ 2 2 183.986 11.100 16.575 0.000 183.985594015 0.506900402
i ~~ i ~~ 2 2 117.019 8.051 14.534 0.000 1.000000000 1.000000000
s ~~ s ~~ 2 2 6.976 1.855 3.761 0.000 1.000000000 1.000000000
i ~~ s ~~ 2 2 -0.138 2.807 -0.049 0.961 -0.004840997 -0.004840997
memory_0 ~1 ~1 2 2 103.660 0.379 273.726 0.000 103.660211800 6.429692801
memory_2 ~1 ~1 2 2 108.128 0.374 289.187 0.000 108.128088754 6.579521678
memory_4 ~1 ~1 2 2 110.266 0.472 233.432 0.000 110.265758951 5.465339018
memory_6 ~1 ~1 2 2 109.694 0.450 243.760 0.000 109.693898134 5.757737729
i ~1 ~1 2 2 0.000 0.000 NA NA 0.000000000 0.000000000
s ~1 ~1 2 2 0.000 0.000 NA NA 0.000000000 0.000000000
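The standardized residual variances in the std.all column can be reproduced by hand from the unstandardized estimates, which is a useful check on how the growth factors translate into variance at each wave. The sketch below uses the group 1 estimates from the table above; the object names are my own.

```r
# Group 1 estimates copied from the results table above
var_i  <- 100.536                                  # intercept variance (i ~~ i)
var_s  <- 5.079                                    # slope variance (s ~~ s)
cov_is <- 4.852                                    # intercept-slope covariance (i ~~ s)
resid  <- c(148.337, 144.551, 263.149, 205.548)    # residual variances, waves 0 to 3
time   <- 0:3                                      # slope loadings

# Model-implied variance at each wave:
# Var(y_t) = var_i + t^2 * var_s + 2 * t * cov_is + residual variance
implied_var <- var_i + time^2 * var_s + 2 * time * cov_is + resid

# Share of each wave's variance left unexplained by the growth factors
resid_share <- resid / implied_var
round(resid_share, 3)
```

These shares line up with the std.all values reported for the residual variances, so roughly 54% to 65% of the variance at each wave is not accounted for by the intercept and slope factors.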
Model Fit Indices (M4)
Fit Measure Value
chisq 30.407
df 6.000
pvalue 0.000
cfi 0.992
tli 0.984
rmsea 0.044
srmr 0.021
aic 139,578.247
bic 139,717.510
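The 6 degrees of freedom reported for M4 can be verified by counting observed moments and free parameters. This is a small worked check; the parameter counts are read off the results table above, and the object names are my own.

```r
n_groups <- 2
n_waves  <- 4

# Observed moments per group: 4 means + 4 * (4 + 1) / 2 variances and covariances
moments_per_group <- n_waves + n_waves * (n_waves + 1) / 2

# Free parameters per group in M4, as listed in the results table:
# 4 residual variances, 2 latent variances, 1 latent covariance, 4 observed intercepts
params_per_group <- 4 + 2 + 1 + 4

df_m4 <- n_groups * (moments_per_group - params_per_group)
df_m4
```

Note that every loading in this model is a fixed constant (1 for the intercept, 0 to 3 for the slope), so holding loadings equal across groups removes no free parameters.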
Model Comparison Strategy

The analysis compares four nested models with progressively relaxed constraints to test measurement invariance:

  • M1 (Full Equality) - All parameters constrained equal across sex groups; tests whether males and females show identical growth patterns in all respects
  • M2 (Equal Variances/Covariances) - Relaxes latent mean constraints, allowing groups to differ in average intercepts and slopes while maintaining equal variances and covariances
  • M3 (Equal Loadings/Residuals) - Further relaxes variance/covariance constraints, permitting group differences in individual variability around mean trajectories
  • M4 (Metric Invariance) - The least constrained model, requiring only equal factor loadings across groups; establishes that time points are measured on the same scale for both groups

Model comparison uses chi-square difference tests to evaluate whether relaxing constraints significantly improves fit. Non-significant differences suggest the more constrained model is adequate, supporting equivalent measurement and structural parameters across groups.

Plot Path Diagram
# Visualizing the latent growth curve model
# Open the PNG device before plotting so the diagram is written to file
png(
  filename = "sem_diagram.png",
  width = 1200,
  height = 900,
  res = 150
)

sem_diagram <- semPaths(
  fit$M1,                 # Your fitted LGCM model
  what = "path",          # Display path diagram
  whatLabels = "par",     # Show parameter estimates
  style = "lisrel",       # A clean layout for SEM models
  nCharNodes = 0,         # Avoid truncating variable names
  layout = "tree",        # Hierarchical layout (better for growth models)
  residuals = FALSE,      # Hide residuals for clarity
  curvePivot = TRUE,      # Adjust curvature of paths
  intercepts = FALSE,     # Hide intercepts for clarity
  edge.label.cex = 0.8,   # Adjust label size
  sizeMan = 7,            # Increase node size
  sizeLat = 10,           # Increase latent variable size
  group.label = TRUE
)

# Close the device to finish writing the file
dev.off()

Interpretation

The multigroup LGCM found meaningful between-person variability in both baseline memory and change. Intercept variances were 117.0 for females and 100.5 for males, while slope variances were 7.0 and 5.1 respectively, indicating that individuals within each group start from different levels and change at different rates. Covariance patterns hinted at subtle sex differences: females showed virtually no association between initial status and change (-0.14, p = .96), whereas males displayed a positive but non-significant covariance (4.85, p = .085), so any tendency for higher-baseline males to improve slightly faster is tentative. Because this specification fixes the latent means at zero, average levels are carried by the wave-specific intercepts, which rose from 100.9 to 107.9 and then dipped to 105.4 for males, and rose from 103.7 to 110.3 before settling at 109.7 for females. Despite these nuances, nested model comparisons (M2→M3→M4) yielded Δχ² = 2.69, p = .443, so loosening equality constraints did not materially improve fit. The take-away is that memory growth curves are largely comparable across sex, with variance components doing most of the work to capture heterogeneity rather than group-specific fixed effects.
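The p-value attached to a chi-square difference follows directly from the chi-square distribution. The text reports Δχ² = 2.69 with p = .443 but not the degrees of freedom; the sketch below assumes 3 df, which is what freeing two latent variances and one covariance would contribute, and that assumption is mine rather than something stated in the output.

```r
delta_chisq <- 2.69  # reported chi-square difference
delta_df    <- 3     # ASSUMED degrees of freedom (not reported in the text)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)
round(p_value, 3)
```

Under this assumption the result agrees with the reported p-value to within rounding of the test statistic. The full anova() table from the model comparison step reports the exact degrees of freedom for each pairwise test.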

Visualization
# Select a subset of participants
set.seed(123)  # For reproducibility
selected_ids <- sample(unique(df_long$participant_id), 250)
df_long_selected <- df_long %>% filter(participant_id %in% selected_ids)

# Plot memory growth by sex group
visualization <- ggplot(df_long_selected, aes(x = session_id, y = memory, group = participant_id, color = sex)) +
    geom_line(alpha = 0.3, aes(color = sex)) +  # Individual trajectories
    geom_point(size = 1.5) +  # Observed data points
    geom_smooth(aes(group = sex, fill = sex), method = "lm", linewidth = 1.2, se = TRUE, alpha = 0.3) +
    facet_wrap(~sex) +  # Separate panels for each group
    labs(
        title = "Memory Growth Trajectories by Sex",
        x = "Time (Years from Baseline)",
        y = "Memory Score",
        color = "Sex Group",
        fill = "Sex Group"
    ) +
    theme_minimal() +
    theme(legend.position = "bottom")

ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)
[Figure: Memory Growth Trajectories by Sex (visualization.png)]
Visualization Notes

The faceted plot shows individual memory trajectories (faint lines) and group-level trends (bold smoothers) for males and females side by side. Both panels slope upward, matching the model's conclusion that memory improves over time, yet the spreads differ slightly: female trajectories fan out more, echoing the larger intercept and slope variances estimated for that group. Points at each assessment anchor the smooth lines to the observed data, so readers can judge how well the fitted curves represent reality. Because the means rise at similar rates in both panels, the figure visually supports the invariance tests showing no compelling sex differences in the fixed effects, while still highlighting the heterogeneity that motivates a multigroup LGCM.

Discussion

This analysis characterized substantial between-person variability in the estimated growth trajectory of memory. The observed trajectories ranged from rapid improvement to stability or decline, which demonstrated that a simple average or fixed-effects approach would likely be insufficient. This heterogeneity highlighted the need to capture individual deviations from the group trend for accurate inference.

Modeling a latent slope with its own variance allowed for the estimation of individual-specific rates of change over time, rather than constraining all participants to a common growth rate. By modeling both a latent intercept (to capture baseline heterogeneity) and a latent slope, the model accounted for individual differences in both starting memory level and subsequent growth. The significant intercept and slope variances in both groups support the need for this additional structure.

Additional Resources

lavaan Multi-Group Analysis

DOCS

Official lavaan documentation on multi-group growth curve modeling, including how to specify group constraints and test measurement invariance across groups.

Measurement Invariance in Growth Models

PAPER

Comprehensive methodology paper on testing invariance across groups in longitudinal structural equation models, covering configural, metric, and scalar invariance (Widaman et al., 2010).

Growth Curve Modeling by Preacher et al.

BOOK

Textbook chapter on multi-group growth models with practical examples, demonstrating how to compare trajectories across demographic groups and test equality constraints.

semTools for Invariance Testing

TOOL

R package that automates measurement invariance tests in lavaan, providing convenient functions for sequential constraint testing and model comparison.
