Why Study Growth?
Longitudinal data capture something powerful: change. When you measure the same people over time, you can ask questions that matter:
- How do symptoms evolve after treatment begins?
- Do children's reading skills develop at the same rate?
- Does cognitive decline accelerate with age?
These are questions about trajectoriesānot just whether groups differ at a single moment, but how individuals change over time.
Traditional approaches like repeated measures ANOVA focus on group means: "Did the average score increase?" But that question often misses what we actually care about.
Consider a therapy study where the average patient improves by 8 points over 6 months. Success? Look at the individual trajectories:
The average tells you "the group improved"ābut the real questions are:
- How much do people differ in their starting points?
- How much do they differ in their rates of change?
- Are these two things related? (Do high starters change faster or slower?)
Latent Growth Curve Models (LGCM) answer exactly these questions. Every person gets their own trajectoryāa starting point (intercept) and rate of change (slope)āand the model quantifies how much people vary in these trajectories.
Look at the spaghetti plot above. Before reading further, consider:
- Intercept variance: How spread out are the starting points at Wave 1?
- Slope variance: Do the lines fan out over time, or stay roughly parallel?
- Intercept-slope relationship: Do high starters tend to change faster, slower, or neither?
Hold these observations in mindāwe'll return to them when discussing the model parameters.
What LGCM Provides
LGCM offers four capabilities that simpler methods lack:
Individual Trajectories
Each person has their own growth curve. The model estimates not just the average trajectory, but the variance around itāhow much people differ in starting points and rates of change.
Flexible Time Structures
Waves don't need to be equally spaced. Measure at baseline, 3 months, 6 months, and 2 years? No problemājust code time accordingly. The slope becomes "change per unit time" in whatever metric you choose.
Missing Data Handling
Modern estimation (Full Information Maximum Likelihood) uses all available data. Participants who miss a wave still contribute informationāno listwise deletion required.
Testable Model Fit
Unlike approaches that simply estimate parameters, LGCM produces fit indices. You can ask: Does a linear trajectory actually describe these data? Is a more complex model (quadratic, piecewise) warranted?
When LGCM is Appropriate
LGCM works well when you have:
| Requirement | Guideline |
|---|---|
| Repeated measures | 3+ waves (4+ preferred for testable fit) |
| Outcome type | Continuous (or ordinal; prefer categorical estimators such as WLSMV; 5+ categories often treated as approximately continuousāverify robustness) |
| Interest in individual differences | Not just "did the mean change?" |
| Adequate sample | 100+ for simple models; more for complex |
Key Components of Linear LGCM
Regardless of your statistical background, these are the building blocks of every LGCM.
The Two Latent Factors
Every person in your study has two numbers you can't directly observe:
| Factor | What It Captures | Defined By |
|---|---|---|
| Intercept (Ī·ā) | Each person's level at Time = 0 | Where you place "0" in time coding |
| Slope (Ī·ā) | Each person's rate of change per time unit | Constant in linear model |
These aren't measured directlyāLGCM infers them from the pattern of observed scores across time.
Extensions: Add time-varying covariates (TVCs) to predict the outcome at each occasion, and time-invariant covariates (TICs) to predict the latent intercept and slope. TVCs capture occasion-specific influences; TICs capture stable between-person differences.
A useful analogy: In traditional factor analysis, multiple items measure one latent construct. In LGCM, multiple time points measure two latent constructsāwhere you start and how you change. The key difference is that LGCM's factor loadings aren't estimated from data; they're fixed to encode your time structure.
Factor Loadings: Encoding Time
In the linear LGCM presented here, slope loadings are fixed to encode time (0, 1, 2, ā¦). Variants like latent-basis or piecewise growth free or re-fix some loadings to capture nonlinearity.
| Factor | Loadings | Why |
|---|---|---|
| Intercept | 1, 1, 1, 1, 1 | Contributes equally at every time point |
| Slope | 0, 1, 2, 3, 4 | Encodes time; contribution grows linearly |
The loadings 0, 1, 2, 3, 4 assume equally spaced waves with Time 0 at Wave 1. See Time Coding for centering options and unequal spacing.
Factor Means: The Average Trajectory
| Parameter | Symbol | Interpretation |
|---|---|---|
| Intercept mean | μā | Average starting level across people |
| Slope mean | μā | Average rate of change across people |
Together they define the average trajectory:
Average score at time t = μā + μā Ć t
Example: If μā = 50 and μā = 2, the average person starts at 50 and increases by 2 per wave.
Factor Variances: Individual Differences
This is where LGCM shinesāquantifying how much people differ.
| Parameter | Symbol | Interpretation |
|---|---|---|
| Intercept variance | Ļāā | How much people differ in starting levels |
| Slope variance | Ļāā | How much people differ in change rates |
Left: Low slope varianceāeveryone changes at similar rates, so lines stay roughly parallel. Right: High slope varianceāsome people improve rapidly, others stay flat, some decline. The dark line represents the mean trajectory in both cases.
Interpreting scale: Variances are in squared units. Take the square root for the standard deviation:
- If Ļāā = 100 ā SD = 10 ā ~68% of people have intercepts within ±10 of the mean
- If Ļāā = 1 ā SD = 1 ā ~68% of people have slopes within ±1 of the mean
Factor Covariance: The Intercept-Slope Relationship
One of LGCM's most informative parameters is the relationship between where people start and how fast they change:
| Sign | Interpretation |
|---|---|
| Positive | High starters grow faster (or decline slower). "Rich get richer." |
| Negative | High starters grow slower (or decline faster). Catching up or ceiling effects. |
| Zero | Starting level doesn't predict change rate. |
These patterns aren't just statistical curiositiesāthey have substantive meaning:
- A negative intercept-slope correlation in a therapy study might mean patients with severe symptoms improve more
- A positive correlation in educational research might indicate a "Matthew effect" where initial advantages compound over time
Convert covariance to correlation for easier interpretation:
r = Ļāā / ā(Ļāā Ć Ļāā)
A correlation of -0.20 means a modest negative relationship: those who start higher grow slightly slower.
If the slope variance (Ļāā) is very small or near a boundary, the interceptāslope covariance/correlation can be unstable; inspect confidence intervals and consider rescaling time or centering to improve estimation.
Residuals: What the Trajectory Doesn't Explain
Observed = Predicted + Residual
yįµ¢ā = (Ī·āįµ¢ + Ī·āįµ¢ Ć t) + εᵢā
Residual variance (Īø) captures the noise around individual trajectoriesāmeasurement error, occasion-specific factors, or model misspecification.
Key assumptions:
- Residuals uncorrelated across time (can be relaxed in advanced models)
- Can be equal or freely estimated across waves
Mathematical Foundations (optional formal notation)
The Individual Trajectory Equation
Each person's score at time t follows:
yįµ¢ā = Ī·āįµ¢ + Ī·āįµ¢ Ć Ī»ā + εᵢā
Where:
- yįµ¢ā = observed score for person i at time t
- Ī·āįµ¢ = person i's intercept (latent)
- Ī·āįµ¢ = person i's slope (latent)
- Ī»ā = time coding for occasion t (fixed: 0, 1, 2, 3, 4)
- εᵢā = residual (measurement error + occasion-specific variation)
Distribution of Growth Factors
The intercept and slope follow a bivariate normal distribution:
[Ī·ā] [μā] [Ļāā Ļāā]
[Ī·ā] ~ N([μā], [Ļāā Ļāā])
This means:
- μā, μā = population means (the "average" trajectory)
- Ļāā, Ļāā = variances (individual differences in intercept and slope)
- Ļāā = covariance (relationship between starting level and change)
For identification, observed item/indicator intercepts are fixed to 0, and we estimate the latent factor means; the latent intercept mean is thus the expected score at time = 0 (given the chosen time coding).
Model-Implied Covariance Structure
LGCM is a structural equation model. It implies a specific covariance matrix:
Ī£ = ĪĪØĪ' + Ī
Where:
- Ī = factor loading matrix (fixed time codes)
- ĪØ = factor covariance matrix (estimated)
- Ī = residual covariance matrix (typically diagonal)
Model fit tests whether your observed covariance matrix matches this implied structure.
Degrees of Freedom
| Components | Count |
|---|---|
| Observed statistics | T(T+1)/2 + T means |
| Estimated parameters | 2 factor means + 3 factor (co)variances + T residual variances |
| Degrees of freedom | Observed ā Estimated |
For 5 waves with equal residual variances: df = (15 + 5) ā (2 + 3 + 1) = 14
Assumptions for this df: (a) observed means are included among the sample statistics but intercepts are fixed to 0 with two latent factor means freed (intercept and slope), and (b) residual variances are constrained equal across time (one Īø parameter). If Īø is freed per time point, df decreases accordingly.
The Path Diagram
The path diagram provides a visual grammar for LGCM:
The intercept factor has all loadings fixed to 1. The slope factor has loadings encoding time (0, 1, 2, 3, 4). The curved arrow represents the covariance between factors. Residuals (ε) capture occasion-specific variation.
Text description (for accessibility)
āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā
ā Intercept ā ā Slope ā
ā (Ī·ā) āāāāāāāāāāāŗā (Ī·ā) ā
āāāāāāāā¬āāāāāāā āāāāāāāā¬āāāāāāā
ā ā
āāāāāāāāā¬āāāāāāāā¼āāāāāāāā¬āāāāāāāā ā
ā1 ā1 ā1 ā1 ā1 ā
ā¼ ā¼ ā¼ ā¼ ā¼ ā
āāāāā āāāāā āāāāā āāāāā āāāāā ā
āy1 ā āy2 ā āy3 ā āy4 ā āy5 ā ā
āāā¬āā āāā¬āā āāā¬āā āāā¬āā āāā¬āā ā
ā² ā² ā² ā² ā² ā
ā0 ā1 ā2 ā3 ā4 ā
āāāāāāāāā“āāāāāāāā“āāāāāāāā“āāāāāāāā“āāāāāāāā
Intercept loadings (top): all 1s
Slope loadings (bottom): 0, 1, 2, 3, 4
Double-headed arrow: factor covariance
Residual variances omitted for clarity
Key insight: Unlike exploratory factor analysis, LGCM fixes the factor loadings. You don't estimate themāyou set them based on your time coding. This constraint is what makes the intercept and slope interpretable as starting level and rate of change.
Visualizing Individual Trajectories
The path diagram shows the model structure; spaghetti plots show what it describes:
This figure illustrates three highlighted trajectories from different regions of the intercept-slope distribution:
| Trajectory | Intercept | Slope | Pattern |
|---|---|---|---|
| Blue | High | High | Started high, grew fast |
| Purple | High | Low | Started high, grew slowly |
| Orange | Low | High | Started low, grew fast |
The intercept-slope covariance determines how concentrated trajectories are in certain regions. A strong negative covariance would mean most people fall in the purple (high-low) and orange (low-high) patternsāhigh starters tend to grow slower, while low starters catch up.
Interactive Exploration
To build deeper intuition for how LGCM parameters affect trajectories, use the interactive explorer below. Adjust the sliders to see how changing the intercept mean, slope variance, and other parameters affects the spaghetti plot in real-time.
Practical Considerations
How Many Time Points?
LGCM requires at least 3 time points for linear growth, with 4+ preferred. With only two waves, simpler approaches (change scores, ANCOVA) work just as well. More waves provide better power to detect individual differences in growth and to test whether linear trajectories actually fit.
With 3 waves, linear growth is identified but overall model fit is weakly testable. With 4 waves, you can begin to test linear vs. alternative forms (note: a quadratic model can be just-identified under common constraints).
Missing Data
Real longitudinal data almost always have missing observations. LGCM handles this gracefully through Full Information Maximum Likelihood (FIML), which uses all available data without listwise deletion.
The key assumption is Missing At Random (MAR): missingness can depend on observed variables, but not on the missing values themselves. If participants drop out because their unobserved scores are extreme (Missing Not At Random), estimates may be biased. Understanding this distinction helps you evaluate whether your missing data pattern is problematic.
FIML assumes MAR and correct model specification. Include auxiliary variables related to missingness (and the outcome) to bolster MAR plausibility and reduce bias.
Time Coding
Slope factor loadings define how time enters the model. This determines:
- What the intercept represents
- How to interpret the slope
- Whether unequal spacing is handled correctly
Standard Coding (0, 1, 2, 3, 4)
| Wave | Loading | Meaning |
|---|---|---|
| 1 | 0 | Intercept = expected score here |
| 2 | 1 | 1 unit of time elapsed |
| 3 | 2 | 2 units elapsed |
| 4 | 3 | 3 units elapsed |
| 5 | 4 | 4 units elapsed |
With this coding:
- Intercept = expected score at Time 0 (Wave 1)
- Slope = expected change per 1-unit increase in time
Centering at Different Time Points
Shift the zero point to change what the intercept represents:
| Centering | Loadings | Intercept Meaning |
|---|---|---|
| Wave 1 (standard) | 0, 1, 2, 3, 4 | Expected score at Wave 1 |
| Midpoint (Wave 3) | -2, -1, 0, 1, 2 | Expected score at Wave 3 |
| Final (Wave 5) | -4, -3, -2, -1, 0 | Expected score at Wave 5 |
How time coding determines the intercept's location. All three panels show the same trajectory, but the intercept (marked "I") refers to different time points depending on where you place the zero in your slope loadings. The slope (rate of change) remains identical across all three.
Text description (if image doesn't render)
Three panels showing the same linear trajectory with different intercept locations:
- Left (Wave 1 centering): Loadings 0,1,2,3,4 ā intercept "I" marked at Wave 1
- Center (Midpoint centering): Loadings -2,-1,0,1,2 ā intercept "I" marked at Wave 3
- Right (Final wave centering): Loadings -4,-3,-2,-1,0 ā intercept "I" marked at Wave 5
The slope (line angle) is identical in all three; only the intercept reference point changes.
Across different centerings, the meaning of the slope does not change; only the intercept's meaning shifts to the chosen time origin.
When does centering matter? When you add predictors. If you ask "Does baseline depression predict growth?", the intercept-predictor relationship depends on where you defined the intercept.
Non-Equidistant Time Points
If waves aren't equally spaced, use actual time values:
| Assessment | Time | Loading |
|---|---|---|
| Baseline | Month 0 | 0 |
| Follow-up 1 | Month 3 | 3 |
| Follow-up 2 | Month 6 | 6 |
| Follow-up 3 | Month 12 | 12 |
| Follow-up 4 | Month 24 | 24 |
Now slope = change per month. Rescale for interpretability if needed (e.g., divide by 12 for slope = change per year).
Rescaling the time metric also rescales the slope mean and variance (e.g., per-month ā per-year), which can improve interpretability and numerical stability.
Common Pitfalls
Before applying LGCM, be aware of these frequent misinterpretations.
Ignoring slope variance ā Concluding "everyone improved" because mean slope = 2. Reality: If slope SD = 2, roughly 16% have slopes ⤠0. Always report variance alongside the mean.
Misreading intercept-slope correlation ā Assuming negative correlation means high starters declined. Reality: Negative correlation means high starters grew slower, not that they declined. Combine with slope mean to interpret direction.
Confusing means with individuals ā Stating "the model shows steady improvement" as if everyone follows this pattern. Reality: The mean trajectory summarizes the group; individual trajectories can look wildly different. Plot spaghetti plots alongside the mean.
Equating good fit with truth ā Assuming RMSEA < .06 proves the model is correct. Reality: Good fit means plausible, not true. Multiple models can fit the same data. Always compare alternatives (quadratic, piecewise).
Misinterpreting recentered intercepts ā Recentering time at midpoint but still interpreting the intercept as "baseline." Reality: The intercept refers to wherever time = 0 in your coding. Match interpretation to your time coding.
Ignoring residual autocorrelation ā Assuming residuals are independent across time without checking. Reality: Large modification indices or poor fit may indicate carryover effects. Check modification indices; consider autoregressive structures if needed.
Overfitting with too few time points ā Fitting a quadratic model with 4 waves (0 df) and concluding "quadratic growth." Reality: Just-identified models fit perfectly by constructionāthere's no test of fit. Ensure positive degrees of freedom.
Ignoring the missing data mechanism ā Using FIML and assuming everything is fine. Reality: FIML assumes MAR. If dropout depends on unobserved values (MNAR), estimates may be biased. Examine who drops out; include auxiliary variables; conduct sensitivity analyses.
Over-interpreting non-significant variance ā Slope variance p = .08, so "everyone changes at the same rate." Reality: You may be underpowered; p = .08 isn't strong evidence of zero. Report the estimate and confidence interval, not just the p-value.
Forgetting to look at the data ā Jumping straight to modeling without visualization. Reality: You might miss outliers, nonlinearity, subgroups, or data errors. Always plot spaghetti plots first.
Summary
You now have the conceptual foundation for understanding LGCM:
- Two latent factorsāintercept and slopeācapture each person's starting level and rate of change
- Factor variances quantify individual differences; the covariance reveals whether high starters grow faster or slower
- Time coding determines what the intercept means and must match your study design
- LGCM produces testable fit, allowing you to evaluate whether linear growth actually describes your data
This is enough to understand what an LGCM does and why. To actually fit one, continue to the worked example.
This hub focuses on linear LGCM for continuous outcomes. Not covered (yet): generalized growth for counts/binary, mixture/latent class growth, piecewise/latent-basis models, and measurement invariance testing when outcomes are latent factors measured by multiple items. See the tutorial links for code and estimation details.
Next Steps
Walkthrough: Worked Example ā Step-by-step R code to simulate data, fit models, and interpret results.