Use residualized change score regression to isolate within-person change while adjusting for baseline levels in ABCD longitudinal analyses.
Tags: stats::lm, abcd-study, residualized-change, regression
Work in Progress: Examples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified. If you spot gaps, errors, or have suggestions, we'd love your feedback.
Overview
Residualized change scores quantify within-subject change while controlling for baseline levels. Follow-up values are regressed on baseline values, and the residuals from that regression serve as the change scores: each one is a participant's deviation from the follow-up value expected given their baseline. Simple difference scores tend to be negatively correlated with baseline because of measurement error and regression to the mean. Residualized scores, by contrast, are uncorrelated with baseline by construction. This tutorial analyzes height measurements from ABCD youth at two annual assessments (baseline and Year 1). It generates residualized change scores that capture individual deviations from expected growth, then tests whether handedness predicts variability in height change beyond what baseline height explains.
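In symbols, the two-step procedure looks as follows. The notation is introduced here for illustration and does not come from the original tutorial. $Y_{0i}$ is baseline height and $Y_{1i}$ is Year 1 height for participant $i$. $\text{Left}_i$ indicates left-handedness, and $\text{Site}_{is}$ indicates site $s$, with site 1 as the reference.

$$
\hat{Y}_{1i} = \hat{\beta}_0 + \hat{\beta}_1 Y_{0i}, \qquad R_i = Y_{1i} - \hat{Y}_{1i}
$$

$$
R_i = \gamma_0 + \gamma_1\,\text{Left}_i + \sum_{s=2}^{22} \delta_s\,\text{Site}_{is} + u_i
$$

The simple difference score $D_i = Y_{1i} - Y_{0i}$ corresponds to fixing the baseline slope at $\beta_1 = 1$ instead of estimating it. When the estimated slope $\hat{\beta}_1$ is below 1, as it tends to be when baseline is measured with error, part of $D_i$ remains predictable from $Y_{0i}$.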
When to Use:
Choose this when you have two timepoints and need to control for baseline levels while examining associations with follow-up outcomes.
Key Advantage:
Regressing follow-up on baseline and analyzing the residuals removes the part of follow-up variance that baseline predicts. As a result, regression to the mean does not appear as spurious change tied to baseline level.
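A small simulation can make the regression-to-the-mean point concrete. The sketch below uses simulated data, not ABCD data. It assumes a stable trait measured twice with independent measurement error and no true change between timepoints. The variable names are hypothetical.

```r
# Simulated data only (not ABCD): a stable trait measured twice with error,
# with no true change between timepoints.
set.seed(2024)
n <- 2000
true_score <- rnorm(n, mean = 55, sd = 3)
time0 <- true_score + rnorm(n, sd = 1.5) # baseline measurement
time1 <- true_score + rnorm(n, sd = 1.5) # follow-up measurement

# Simple difference score: follow-up minus baseline
diff_score <- time1 - time0

# Residualized change: residuals from regressing follow-up on baseline
resid_change <- residuals(lm(time1 ~ time0))

cor(diff_score, time0)   # negative: high baseline values tend to be followed by lower ones
cor(resid_change, time0) # zero up to floating-point error, by construction
```

The difference score correlates negatively with baseline even though nothing changed. The residualized score is uncorrelated with baseline because least-squares residuals are orthogonal to the predictor. This orthogonality is a property of the method; it is not evidence that the residuals capture only true change.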
What You'll Learn:
How to fit the baseline-adjusted model, extract residualized change scores, test predictors against those residuals, and visualize distributions.
Data Access
Data Download
ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.
Loading Data with NBDCtools
Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:
Automatic data joining - Merges variables from multiple tables automatically
Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
Event filtering - Easily selects specific assessment waves
The create_dataset() function is the main tool for loading ABCD data:
library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1", # Variable 1
  "var_2", # Variable 2
  "var_3"  # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE, # Convert categorical variables to factors
  value_to_na = TRUE,     # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE       # Add variable and value labels
)
Key Parameters
vars - Vector of variable names to load
release - ABCD data release version (e.g., "6.0")
format - File format, typically "parquet" for efficiency
categ_to_factor - Automatically converts categorical variables to factors
value_to_na - Converts ABCD missing value codes to R's NA
add_labels - Adds descriptive labels to variables and values
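The idea behind `value_to_na = TRUE` can be illustrated in base R. The sketch below uses a hypothetical helper, `recode_missing_codes()`, which is not part of NBDCtools. It assumes only the codes 222 and 333 mentioned above; the full set of codes NBDCtools handles may differ.

```r
# Hypothetical helper (not part of NBDCtools) illustrating the idea behind
# value_to_na = TRUE: replace sentinel missing-value codes with NA.
# Only the codes 222 and 333 mentioned above are assumed here.
recode_missing_codes <- function(x, codes = c(222, 333)) {
  x[x %in% codes] <- NA
  x
}

raw_scores <- c(1, 2, 222, 3, 333)
recode_missing_codes(raw_scores) # returns c(1, 2, NA, 3, NA)
```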
### Load necessary libraries
library(NBDCtools) # ABCD data access helper
library(tidyverse) # Collection of R packages for data science
library(gt)        # Presentation-ready display tables
library(gtsummary) # Creating publication-quality tables
library(broom)     # Organizing model outputs

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
  "ab_g_dyn__design_site",
  "ab_g_stc__design_id__fam",
  "nc_y_ehis_score",
  "ph_y_anthr__height_mean"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE, # Convert categorical variables to factors
  value_to_na = TRUE,     # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE       # Add variable and value labels
)
Create Long Format Dataset
# Create long-form dataset with relevant columns
df_long <- abcd_data %>%
  select(participant_id, session_id, ab_g_dyn__design_site, ab_g_stc__design_id__fam,
         nc_y_ehis_score, ph_y_anthr__height_mean) %>%
  # Keep only baseline and year 1 sessions
  filter(session_id %in% c("ses-00A", "ses-01A")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    # Relabel sessions
    session_id = factor(session_id, levels = c("ses-00A", "ses-01A"),
                        labels = c("Baseline", "Year_1")),
    # Relabel handedness
    handedness = factor(nc_y_ehis_score, levels = c("1", "2", "3"),
                        labels = c("Right-handed", "Left-handed", "Mixed-handed")),
    # Convert height to numeric
    height = round(as.numeric(ph_y_anthr__height_mean), 2)
  ) %>%
  # Rename for clarity
  rename(site = ab_g_dyn__design_site, family_id = ab_g_stc__design_id__fam)
Reshape to Wide Format for Residualized Change Analysis
# Reshape data from long to wide format for residualized change analysis

# Step 1: Get static (participant-level) variables from baseline only
df_static <- df_long %>%
  filter(session_id == "Baseline") %>%
  select(participant_id, site, family_id, handedness) %>%
  # Remove mixed-handed participants (rows with missing handedness are also dropped)
  filter(handedness != "Mixed-handed") %>%
  droplevels() # Drop unused factor levels

# Step 2: Pivot the time-varying variable (height) to wide format
df_timevarying <- df_long %>%
  select(participant_id, session_id, height) %>%
  pivot_wider(names_from = session_id, values_from = height, names_prefix = "Height_")

# Step 3: Join static and time-varying data, keeping only complete cases
df_wide <- df_static %>%
  inner_join(df_timevarying, by = "participant_id") %>%
  drop_na(Height_Baseline, Height_Year_1)
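To see what the `pivot_wider()` and `drop_na()` steps do, here is a sketch on a hypothetical three-participant toy dataset (not ABCD data). Participant `p3` has no Year 1 row, so that participant's `Height_Year_1` becomes `NA` and the row is then removed as an incomplete case.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in long format (not ABCD data)
toy_long <- tibble(
  participant_id = c("p1", "p1", "p2", "p2", "p3"),
  session_id = c("Baseline", "Year_1", "Baseline", "Year_1", "Baseline"),
  height = c(54.0, 56.1, 55.5, 57.9, 53.2)
)

# One row per participant, one column per session
toy_wide <- toy_long %>%
  pivot_wider(names_from = session_id, values_from = height, names_prefix = "Height_")

# p3 lacks a Year_1 measurement, so it is dropped here
toy_complete <- toy_wide %>%
  drop_na(Height_Baseline, Height_Year_1)
```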
Descriptive Statistics
### Apply compact styling (gtsummary themes must be set before the table is built)
theme_gtsummary_compact()

# Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, handedness, height) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(handedness ~ "Handedness", height ~ "Height"),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "**{level}**<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "**Assessment Wave**") %>%
  bold_labels() %>%
  italicize_levels()

# Convert to a gt object
descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table
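Assessment Wave

| Characteristic | Baseline (N = 11,868)¹ | Year_1 (N = 11,219)¹ |
|---|---|---|
| Handedness | | |
| Right-handed | 9,418 (79%) | 0 (NA%) |
| Left-handed | 848 (7.2%) | 0 (NA%) |
| Mixed-handed | 1,594 (13%) | 0 (NA%) |
| Height | 55.3 (3.2) | 57.6 (3.3) |

¹ n (%); Mean (SD)

Handedness is available only at the baseline session in these data, so the Year_1 handedness counts are 0 (NA%).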
Statistical Analysis
Fit Model
### Model
# Predict follow-up (Year_1) height from baseline height
baseline_model <- lm(Height_Year_1 ~ Height_Baseline, data = df_wide)

# Add residualized change scores: the portion of Year_1 height not explained by baseline
df_wide <- df_wide %>%
  mutate(residualized_change = residuals(baseline_model))

# 1. Regress the residualized change scores on handedness, adjusting for site
model <- lm(residualized_change ~ handedness + site, data = df_wide)

# 2. Extract and tidy the model summary
tidy_model <- broom::tidy(model)

# 3. Format into a gt table
model_summary <- tidy_model %>%
  gt() %>%
  tab_header(title = "Regression Summary Table") %>%
  fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3) %>%
  cols_label(
    term = "Predictor",
    estimate = "Estimate",
    std.error = "Std. Error",
    statistic = "t-Statistic",
    p.value = "p-Value"
  )

# 4. Print the table
model_summary

# 5. Save as standalone HTML
gt::gtsave(data = model_summary, filename = "model_summary.html", inline_css = FALSE)
Regression Summary Table

| Predictor | Estimate | Std. Error | t-Statistic | p-Value |
|---|---|---|---|---|
| (Intercept) | −0.252 | 0.096 | −2.641 | 0.008 |
| handednessLeft-handed | −0.055 | 0.062 | −0.893 | 0.372 |
| site2 | 0.277 | 0.124 | 2.241 | 0.025 |
| site3 | 0.242 | 0.122 | 1.987 | 0.047 |
| site4 | 0.250 | 0.117 | 2.136 | 0.033 |
| site5 | 0.178 | 0.136 | 1.303 | 0.193 |
| site6 | 0.023 | 0.121 | 0.194 | 0.846 |
| site7 | 0.107 | 0.139 | 0.765 | 0.444 |
| site8 | 0.232 | 0.138 | 1.685 | 0.092 |
| site9 | 0.395 | 0.131 | 3.005 | 0.003 |
| site10 | −0.011 | 0.117 | −0.097 | 0.922 |
| site11 | 0.374 | 0.132 | 2.834 | 0.005 |
| site12 | 0.151 | 0.124 | 1.226 | 0.220 |
| site13 | 0.319 | 0.117 | 2.727 | 0.006 |
| site14 | 0.337 | 0.122 | 2.761 | 0.006 |
| site15 | 0.890 | 0.131 | 6.815 | 0.000 |
| site16 | 0.352 | 0.111 | 3.176 | 0.001 |
| site17 | 0.255 | 0.123 | 2.073 | 0.038 |
| site18 | 0.116 | 0.135 | 0.863 | 0.388 |
| site19 | 0.430 | 0.125 | 3.442 | 0.001 |
| site20 | 0.095 | 0.119 | 0.801 | 0.423 |
| site21 | 0.426 | 0.124 | 3.422 | 0.001 |
| site22 | 0.453 | 0.378 | 1.198 | 0.231 |
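Each site row in the table above compares one site with the reference site (the first site level). Reading 21 separate p-values does not answer whether site matters overall. A nested-model F test does. The sketch below runs that test on simulated stand-in data (not ABCD data), assuming 22 hypothetical sites with small site-level shifts.

```r
# Simulated stand-in data (not ABCD): 22 hypothetical sites with small shifts
set.seed(7)
n <- 3000
sim <- data.frame(
  site = factor(sample(paste0("site", 1:22), n, replace = TRUE)),
  handedness = factor(
    sample(c("Right-handed", "Left-handed"), n, replace = TRUE, prob = c(0.9, 0.1)),
    levels = c("Right-handed", "Left-handed")
  )
)
site_shift <- rnorm(22, sd = 0.3)
sim$residualized_change <- site_shift[as.integer(sim$site)] + rnorm(n)

full_model <- lm(residualized_change ~ handedness + site, data = sim)
reduced_model <- lm(residualized_change ~ handedness, data = sim)

# Nested-model F test: do the 21 site terms jointly improve fit?
site_test <- anova(reduced_model, full_model)
site_test
```

On the tutorial data, the analogous call would be `anova(lm(residualized_change ~ handedness, data = df_wide), model)`, assuming no rows have a missing site. `drop1(model, test = "F")` gives the same kind of test for each term.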
Interpretation
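Handedness does not significantly predict residualized height change. Compared with right-handed participants (the reference group), left-handed participants showed a small, non-significant difference in residualized height change (b = −0.055, SE = 0.062, p = 0.372). Mixed-handed participants were excluded before model fitting, so the model provides no estimate for that group. Several site terms differ from the reference site (for example, site15: b = 0.890, p < 0.001). This indicates that measured growth varies across sites, which supports adjusting for site. Overall, these results provide no evidence that handedness contributes to variability in height change over the one-year period beyond what baseline height explains.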
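The two-step approach used here (residualize, then regress the residuals on handedness) differs from a single model with baseline as a covariate (an ANCOVA-style model). With one predictor, the Frisch–Waugh–Lovell theorem gives an exact relation. The two-step coefficient equals the covariate-adjusted coefficient multiplied by 1 − R², where R² comes from regressing the predictor on baseline. The two-step estimate is therefore attenuated whenever the predictor is correlated with baseline. The sketch below checks this on simulated data (not ABCD data). It assumes a hypothetical binary predictor `x` that shifts both baseline and change.

```r
# Simulated data only (not ABCD). The predictor x is correlated with baseline.
set.seed(42)
n <- 5000
x <- rbinom(n, size = 1, prob = 0.5)          # hypothetical binary predictor
y0 <- 50 + 3 * x + rnorm(n, sd = 3)           # baseline, shifted by x
y1 <- 2 + y0 + 0.5 * x + rnorm(n, sd = 1)     # follow-up; effect of x on change = 0.5
sim <- data.frame(x, y0, y1)

# Two-step residualized change, as in this tutorial
sim$resid_change <- residuals(lm(y1 ~ y0, data = sim))
two_step <- coef(lm(resid_change ~ x, data = sim))[["x"]]

# One-step model with baseline as a covariate
ancova <- coef(lm(y1 ~ y0 + x, data = sim))[["x"]]

# Share of variance in x explained by baseline
r2_x <- summary(lm(x ~ y0, data = sim))$r.squared

c(two_step = two_step, ancova = ancova, ancova_times_1_minus_r2 = ancova * (1 - r2_x))
```

If handedness is nearly uncorrelated with baseline height, the two approaches should give similar estimates. One way to check on the tutorial data is to compare `model` with `lm(Height_Year_1 ~ Height_Baseline + handedness + site, data = df_wide)`.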
Visualization
# Select a random subset for visualization (e.g., 250 participants)
set.seed(123) # Arbitrary seed so the subset is reproducible
df_subset <- df_wide %>% slice_sample(n = 250)

# Create a violin plot of residualized change scores by handedness
violin_plot <- ggplot(df_subset, aes(x = handedness, y = residualized_change, fill = handedness)) +
  geom_violin(trim = FALSE, alpha = 0.7) + # Violin plot without trimming the tails
  geom_jitter(position = position_jitter(width = 0.2), size = 1.2, alpha = 0.5) + # Individual observations
  scale_fill_brewer(palette = "Set2") + # Color palette from RColorBrewer
  labs(title = "Residualized Change in Height by Handedness", x = "Handedness", y = "Height Residuals") +
  theme_minimal() + # Minimal theme for a clean look
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis labels for readability
    legend.position = "none" # Legend is redundant with the x-axis
  )

print(violin_plot)

# Save as a high-resolution PNG file (10 x 5 inches at 300 dpi)
ggsave(filename = "visualization.png", plot = violin_plot,
       width = 10, height = 5, units = "in", dpi = 300)
Interpretation
The violin plot displays the distribution of residualized height-change scores, after baseline adjustment, for a random subset of 250 participants. Both groups center near zero and show comparable spread, consistent with the absence of a handedness effect in the regression. Overlaid jittered points make individual outliers easy to spot. None deviate markedly from the main mass, so the null finding does not appear to be driven by a handful of unusual observations. Left-handed participants make up under 10% of the analytic sample, so the subset contains relatively few of them; the figure is a visual check rather than a formal test. Taken together, the figure and model output provide no evidence that growth over the year, once baseline stature is controlled, differs by handedness.
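The plot shows only a subset of participants. Group-level numeric summaries computed on all participants complement it. The sketch below uses simulated stand-in data (not ABCD data) with the same column names as `df_wide`. On the tutorial data, the same pipeline would start from `df_wide` instead of `toy`.

```r
library(dplyr)

# Simulated stand-in for df_wide (not ABCD data)
set.seed(11)
toy <- tibble(
  handedness = factor(
    sample(c("Right-handed", "Left-handed"), 1000, replace = TRUE, prob = c(0.9, 0.1)),
    levels = c("Right-handed", "Left-handed")
  ),
  residualized_change = rnorm(1000)
)

# Count, mean, and SD of residualized change per handedness group
group_summary <- toy %>%
  group_by(handedness) %>%
  summarise(
    n = n(),
    mean = mean(residualized_change),
    sd = sd(residualized_change),
    .groups = "drop"
  )
group_summary
```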
Additional Resources
R Documentation: lm and residuals
Official R documentation for the lm() function and residuals() method, essential for computing residualized change scores that control for baseline values.
Classic methodology paper discussing problems with change score analysis and advantages of residualized change scores for controlling baseline differences (Lord, 1967). Note: access may require institutional or paid subscription.