LGCM Tutorial
This tutorial walks through a complete LGCM analysis. By the end, you will have:
- Simulated longitudinal data with known parameters
- Visualized individual trajectories
- Fit and compared growth models
- Interpreted the results
Workflow Overview
βββββββββββββββββββββββββββββββββββ
β 1. Setup β
β Load packages, set seed β
ββββββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββ
β 2. Simulate/Load Data β
β Wide format, check structure β
ββββββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββ
β 3. Visualize β
β Spaghetti plot, look first β
ββββββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββ
β 4. Fit Models β
β Baseline + growth models β
ββββββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββ
β 5. Compare Models β
β LRT, AIC/BIC β
ββββββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββ
β 6. Check Diagnostics β
β Fit indices, residuals β
ββββββββββββββββββ¬βββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββ
β 7. Interpret Results β
β Parameters, effect sizes β
βββββββββββββββββββββββββββββββββββ
Step 1: Setup
# Load packages
library(tidyverse) # Data manipulation and plotting
library(lavaan) # SEM and growth models
library(MASS) # For mvrnorm (simulation)
# Set seed for reproducibility
set.seed(2024)
Step 2: Simulate Data
We'll create data with known population parameters so we can verify our model recovers them:
| Parameter | True Value |
|---|---|
| Intercept mean | 50 |
| Slope mean | 2 |
| Intercept variance | 100 (SD = 10) |
| Slope variance | 1 (SD = 1) |
| I-S covariance | -2 (r β -0.20) |
| Residual variance | 25 (SD = 5) |
# Sample size
n <- 400
# Latent factor covariance matrix
psi <- matrix(c(100, -2,
-2, 1), nrow = 2)
# Generate individual intercepts and slopes
factors <- mvrnorm(n = n,
mu = c(50, 2),
Sigma = psi)
# Generate observed data
data_wide <- tibble(id = 1:n) %>%
mutate(
int_i = factors[, 1],
slp_i = factors[, 2],
y1 = int_i + slp_i * 0 + rnorm(n, 0, 5),
y2 = int_i + slp_i * 1 + rnorm(n, 0, 5),
y3 = int_i + slp_i * 2 + rnorm(n, 0, 5),
y4 = int_i + slp_i * 3 + rnorm(n, 0, 5),
y5 = int_i + slp_i * 4 + rnorm(n, 0, 5)
) %>%
select(id, y1:y5)
# Check structure
head(data_wide)
β
Checkpoint: You should see a tibble with 6 rows and columns id, y1βy5. Values will differ due to random generation, but should be in the 30β70 range.
Step 3: Visualize
Always plot before modeling.
# Reshape for plotting
data_long <- data_wide %>%
pivot_longer(y1:y5, names_to = "wave", values_to = "y") %>%
mutate(time = as.numeric(gsub("y", "", wave)) - 1)
# Spaghetti plot
ggplot(data_long, aes(x = time, y = y, group = id)) +
geom_line(alpha = 0.15, color = "gray40") +
scale_x_continuous(breaks = 0:4, labels = paste("Wave", 1:5)) +
labs(x = "Time", y = "Score",
title = "Individual Growth Trajectories",
subtitle = "N = 400 participants, 5 waves") +
theme_minimal()
What to look for:
- General trend (up, down, flat?)
- Spread at baseline (intercept variance)
- Fan pattern (slope variance)
- Nonlinearity (curves vs. straight lines)
# Add mean trajectory
mean_traj <- data_long %>%
group_by(time) %>%
summarise(mean_y = mean(y))
ggplot(data_long, aes(x = time, y = y)) +
geom_line(aes(group = id), alpha = 0.1, color = "gray40") +
geom_line(data = mean_traj, aes(y = mean_y),
color = "steelblue", linewidth = 1.5) +
# Note: ggplot2 < 3.4 used 'size' instead of 'linewidth'
geom_point(data = mean_traj, aes(y = mean_y),
color = "steelblue", size = 3) +
scale_x_continuous(breaks = 0:4, labels = paste("Wave", 1:5)) +
labs(x = "Time", y = "Score",
title = "Individual Trajectories with Mean Overlay") +
theme_minimal()

Figure: Individual trajectories (gray) with mean trajectory overlay (blue). The upward trend shows positive average growth; the spread shows individual differences.
β Checkpoint: Your plot should show upward-trending lines with visible spread at baseline and a "fan" pattern over time.
Step 4: Fit Models
No-Growth Model (Baseline)
Fit a constrained linear model with no growthβslope mean and variance fixed to zero:
model_nogrowth <- '
intercept =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4 + 4*y5
# Constrain slope to "no growth"
slope ~ 0*1 # mean(slope) = 0
slope ~~ 0*slope # var(slope) = 0
intercept ~~ 0*slope # cov(i, s) = 0
'
fit_nogrowth <- growth(model_nogrowth, data = data_wide,
missing = "fiml")
Note: This model is nested within the linear growth model, allowing a valid likelihood ratio test.
Linear Growth Model
Add the slope factor:
model_linear <- '
intercept =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4 + 4*y5
'
fit_linear <- growth(model_linear, data = data_wide,
missing = "fiml")
Note: We specify missing = "fiml" explicitlyβgood practice even when data are complete. For real data with potential non-normality, add estimator = "MLR" for robust standard errors.
Linear Growth with Equal Residuals
Test whether residual variances can be constrained equal:
model_linear_eq <- '
intercept =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4 + 4*y5
# Constrain residual variances to equality
y1 ~~ rv*y1
y2 ~~ rv*y2
y3 ~~ rv*y3
y4 ~~ rv*y4
y5 ~~ rv*y5
'
fit_linear_eq <- growth(model_linear_eq, data = data_wide,
missing = "fiml")
Step 5: Compare Models
Is there growth?
anova(fit_nogrowth, fit_linear)
β Confirm: ΞΟΒ² should be significant (p < .001). This means linear growth improves fit over a flat trajectory (no-growth model).
Are equal residual variances justified?
anova(fit_linear_eq, fit_linear)
Use anova(constrained, unconstrained) so the ΞΟΒ² tests the relaxation of constraints.
β Confirm: If p > .05, equality is not rejected; prefer the simpler constrained model.
Information criteria
data.frame(
Model = c("No-growth", "Linear", "Linear (equal resid)"),
AIC = c(AIC(fit_nogrowth), AIC(fit_linear), AIC(fit_linear_eq)),
BIC = c(BIC(fit_nogrowth), BIC(fit_linear), BIC(fit_linear_eq))
) %>%
mutate(across(where(is.numeric), \(x) round(x, 1)))
# Note: R < 4.1 use function(x) instead of \(x)
β Confirm: Lower AIC/BIC = better fit. Linear models should beat no-growth.
Step 6: Check Diagnostics
Before interpreting results, verify model fit and check for problems.
Fit Indices
fitmeasures(fit_linear, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))
β Check fit indices:
- CFI > .95 β
- RMSEA < .08 β
- SRMR < .08 β
With small df, RMSEA can be unstable; interpret alongside CFI and SRMR.
See Reference for threshold details.
Modification Indices
modindices(fit_linear, sort = TRUE, minimum.value = 10)
What to look for:
- Residual covariances (e.g.,
y2 ~~ y3): May indicate autocorrelation - Large values (>10) suggest localized misfit
- Cross-loadings usually shouldn't be freed in LGCM
Consider multiplicity and substantive plausibility; avoid chasing small local improvements.
β Checkpoint: Few or no modification indices >10 indicates good specification.
Residual Correlations
resid(fit_linear, type = "cor")$cor %>% round(3)
Values should be near zero (|r| < 0.10). Large residual correlations mean the model doesn't fully capture those relationships. Thresholds like |r| < .10 are sample-size dependent; in large N, also inspect SEs/tests of residuals.
Check for Problematic Estimates
parameterEstimates(fit_linear) %>%
filter(op == "~~") %>%
select(lhs, rhs, est, se) %>%
mutate(flag = ifelse(est < 0, "NEGATIVE", ""))
Red flags:
- Negative variances (Heywood cases): Model misspecified or sample too small
- Very large SEs (SE > estimate): Parameter poorly identified
- Variances at exactly zero: May need constraints
Heywood case remedies:
- Try
estimator = "MLR"and/or rescale time to reduce collinearity - Temporarily constrain problematic residual variances to a small positive lower bound while diagnosing
β Confirm: All variance estimates should be positive. If any are negative, see Troubleshooting.
Step 7: Interpret Results
summary(fit_linear, fit.measures = TRUE, standardized = TRUE)
Compare Estimates to True Values
| Parameter | Estimate | SE | True Value |
|---|---|---|---|
| Intercept mean | 49.82 | 0.51 | 50 |
| Slope mean | 2.04 | 0.06 | 2 |
| Intercept variance | 98.34 | 8.12 | 100 |
| Slope variance | 0.97 | 0.12 | 1 |
| I-S covariance | -1.89 | 0.62 | -2 |
| Residual variance | 24.56 | 1.23 | 25 |
Values shown are illustrative from one simulated run; your numbers will differ with seed, RNG, and package versions (even with the same seed).
β Success: Estimates closely recover the true population parameters. This confirms the model is working correctly.
Interpret Each Parameter
- Intercept mean β 50: Average score at Wave 1
- Slope mean β 2: Average increase per wave
- Intercept variance β 100 (SD β 10): People differed in starting levels
- Slope variance β 1 (SD β 1): People differed in growth rates
- I-S covariance β -2: Higher starters grew slightly slower (see correlation below)
Intercept-Slope Correlation
Convert the covariance to a correlation for interpretability:
cov_is <- -1.89
var_i <- 98.34
var_s <- 0.97
r_is <- cov_is / sqrt(var_i * var_s)
r_is # β -0.19
The correlation of -0.19 is smallβhigher starters grew slower, but most still improved.
Proportion with Positive Slopes
A positive slope mean doesn't mean everyone improved. Check:
slope_mean <- 2.04
slope_sd <- sqrt(0.97) # β 0.98
# P(slope > 0) β theoretical
pnorm(0, mean = slope_mean, sd = slope_sd, lower.tail = FALSE)
# β 0.98
# Empirical check using factor scores (posterior means)
fs <- lavPredict(fit_linear, type = "lv")
mean(fs[, "slope"] > 0)
# Should be close to the theoretical value
β
Checkpoint: ~98% of participants had positive slopes. With slope mean = 2 and SD β 1, almost everyone improved. The empirical proportion from factor scores should match the theoretical pnorm() result.
Variance Explained
inspect(fit_linear, "r2")
Expected RΒ² values: 0.80β0.89 across waves.
Interpretation: The trajectory (intercept + slope) explains 80β89% of variance at each wave. The remaining 10β20% is residual variance (measurement error, occasion-specific factors).
β Check: RΒ² values of 0.70β0.90 are typical for well-fitting LGCMs.
Written Summary
We estimated a linear latent growth model for 400 participants across 5 waves. The model fit well (ΟΒ²(10) = 12.35, p = .26; CFI = 0.998; RMSEA = 0.024; SRMR = 0.025).
On average, participants started at 49.82 (SE = 0.51) and increased by 2.04 units per wave (SE = 0.06), both p < .001. Significant individual differences emerged in starting levels (variance = 98.34, SD β 10) and growth rates (variance = 0.97, SD β 1). The negative intercept-slope covariance (-1.89, p = .002, r β -0.19) indicates that participants who started higher grew slightly slower.
Complete Script
# ============================================
# LGCM Complete Worked Example
# ============================================
library(tidyverse)
library(lavaan)
library(MASS)
set.seed(2024)
# Simulate data
n <- 400
psi <- matrix(c(100, -2, -2, 1), nrow = 2)
factors <- mvrnorm(n, mu = c(50, 2), Sigma = psi)
data_wide <- tibble(id = 1:n) %>%
mutate(
int = factors[,1], slp = factors[,2],
y1 = int + slp*0 + rnorm(n, 0, 5),
y2 = int + slp*1 + rnorm(n, 0, 5),
y3 = int + slp*2 + rnorm(n, 0, 5),
y4 = int + slp*3 + rnorm(n, 0, 5),
y5 = int + slp*4 + rnorm(n, 0, 5)
) %>% select(id, y1:y5)
# Visualize
data_long <- data_wide %>%
pivot_longer(y1:y5, names_to = "wave", values_to = "y") %>%
mutate(time = as.numeric(gsub("y", "", wave)) - 1)
ggplot(data_long, aes(x = time, y = y, group = id)) +
geom_line(alpha = 0.15) +
theme_minimal() +
labs(title = "Individual Trajectories")
# Fit models
model_nogrowth <- '
intercept =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4 + 4*y5
slope ~ 0*1; slope ~~ 0*slope; intercept ~~ 0*slope
'
model_lin <- '
intercept =~ 1*y1 + 1*y2 + 1*y3 + 1*y4 + 1*y5
slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4 + 4*y5
'
fit_nogrowth <- growth(model_nogrowth, data = data_wide, missing = "fiml")
fit_lin <- growth(model_lin, data = data_wide, missing = "fiml")
# Compare and summarize
anova(fit_nogrowth, fit_lin)
summary(fit_lin, fit.measures = TRUE, standardized = TRUE)
fitmeasures(fit_lin, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))
inspect(fit_lin, "r2")
Adapt to Your Data
To use this workflow with your own dataset:
1. Read and Prepare Data
library(readr)
library(dplyr)
# Read your data (wide format: one row per person)
data_wide <- read_csv("my_longitudinal_data.csv") %>%
select(id, outcome_t1, outcome_t2, outcome_t3, outcome_t4, outcome_t5)
2. Update Variable Names
Replace y1βy5 with your variable names:
model <- '
intercept =~ 1*outcome_t1 + 1*outcome_t2 + 1*outcome_t3 + 1*outcome_t4 + 1*outcome_t5
slope =~ 0*outcome_t1 + 1*outcome_t2 + 2*outcome_t3 + 3*outcome_t4 + 4*outcome_t5
'
fit <- growth(model, data = data_wide, missing = "fiml")
summary(fit, fit.measures = TRUE, standardized = TRUE)
3. Adjust Time Coding (if needed)
If waves are unequally spaced, adjust slope loadings to reflect actual time. For example, measurements at baseline, 1, 3, 6, and 12 months:
model <- '
intercept =~ 1*y_0mo + 1*y_1mo + 1*y_3mo + 1*y_6mo + 1*y_12mo
slope =~ 0*y_0mo + 1*y_1mo + 3*y_3mo + 6*y_6mo + 12*y_12mo
'
# Now slope = change per month
Rescaling time (e.g., months β years) rescales the slope mean and variance (per-unit); choose units that are meaningful for interpretation.
β Checkpoint: Once your basic model runs, layer on diagnostics, model comparisons, and predictors exactly as shown above.
Next Steps
- Need syntax or thresholds? β Reference
- Hit an error? β Reference: Troubleshooting
- Back to overview β LGCM Guide