Article: Silver, I. A., & Kelsay, J. D. (2021). The moderating effects of population characteristics: A potential biasing factor when employing non-random samples to conduct experimental research. Journal of Experimental Criminology. https://doi.org/10.1007/s11292-021-09478-7
1. Background
Randomized controlled trials (RCTs) allow researchers to estimate causal relationships because, when replicated numerous times, randomization creates balance on pre-existing characteristics between the treatment group and the control group. Although ethical concerns limit the use of RCTs in criminology and the social sciences, the design is largely considered the gold standard for determining whether a treatment causes an outcome. However, due to the complexity of conducting an RCT – e.g., maintaining randomization, avoiding treatment contamination – social scientists often rely on non-random samples to conduct experimental research. A non-random sample is a subgroup of individuals pulled from a population in a systematic manner (e.g., volunteers, convenience, self-selection), while a random sample is a subgroup of individuals pulled from a population using random, probability-based selection.
Although convenient and cheaper, using a non-random sample to conduct experimental research could produce results – estimates, research implications, and policy implications – that are unique to the non-random sample. For example, if one conducted an RCT of cognitive behavioral therapy (CBT) in a low-risk residential facility (i.e., a non-random sample of low-risk participants), the findings might suggest that CBT is ineffective or even increases the rate of recidivism. This finding, and the subsequent policy implications, would exist only because of the interaction between the treatment (in this example, CBT) and the unique characteristics of the non-random sample (e.g., low risk of recidivating).
Interactions between a treatment and the characteristics of a non-random sample are not unique to correctional research; they can arise across the social sciences. Treatments intended to increase the effectiveness of policing (e.g., problem-oriented policing [POP]) could be subject to this interaction as well. For example, using a non-random sample of low-crime neighborhoods (selected out of convenience or focus) could suggest that POP is ineffective or increases crime, when in reality POP has been shown to reduce crime in moderate- and high-crime neighborhoods.
Although scholars have long recognized this problem with experimental research, few studies have illustrated how findings from an RCT could become biased due to the characteristics of a non-random sample. The study addressed this gap in the literature by using a simulation analysis to assess whether the interaction between the characteristics of a non-random sample and a treatment could substantively bias the findings drawn from an RCT.
2. Summary of Findings
The simulation analyses suggested that the estimated effects drawn from an RCT could become biased when a treatment interacts with characteristics of a non-random sample. Across the 1,000 randomized populations, the observed effects from an RCT employing a random sample were only about ±0.20 units away from the true effects. However, the observed effects from an RCT conducted with a non-random sample (with unique scores on a characteristic that interacts with the treatment; e.g., risk of recidivism and CBT) could be ±8 to 30 units away from the true effects.
To further illustrate these findings, consider an RCT testing whether an employment-training program increases the number of days employed during parole. If this RCT were conducted with a random sample, the estimated effects might vary by less than one day from the true effects of the employment-training program. If the RCT were conducted with a non-random sample, the estimated effects could differ by plus or minus 8 to 30 days and suggest that the employment-training program is ineffective or extremely effective. Overall, these findings suggest that RCTs could become biased when employing a non-random sample and, more importantly, that the magnitude of the bias could be substantive and practically important.
3. Implications
The study underscores that random assignment alone does not ensure unbiased inference. When unique sample characteristics interact with a treatment, both single-study estimates and the collective results of replication studies could produce findings that are statistically and substantively different from the true effects of the treatment. Additionally, repeatedly using a similar non-random sample during replication research could empirically support the biased findings rather than illuminating the true effects of the treatment. Four recommendations to reduce the likelihood of an RCT with a non-random sample producing biased estimates are provided below:
- Define the population precisely. Clearly articulate the subset to which findings apply. If participants come from only the highest-risk quartile, that boundary must be explicit.
- Assess potential interactions early. During design and analysis, evaluate whether sample characteristics could modify treatment effects.
- Test moderation empirically. Include interaction terms between key covariates and the treatment to identify heterogeneity (see the sketch after this list).
- Report full sample characteristics. Sharing descriptive data allows future studies to draw from different subpopulations and collectively approximate the population effect.
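To illustrate the third recommendation, the following R sketch shows how a treatment-by-moderator interaction can be estimated with ordinary linear regression. It is not drawn from the article; the variable names (days_employed, treatment, risk_score) and effect sizes are assumptions for demonstration only.

```r
# Illustrative moderation test (a sketch, not the authors' code; variable
# names and effect sizes are assumptions for demonstration only).
set.seed(1)
n <- 500
risk_score    <- rnorm(n)                       # candidate moderator
treatment     <- rbinom(n, 1, 0.5)              # randomized treatment indicator
days_employed <- 10 + 2 * treatment + 1 * risk_score +
                 3 * treatment * risk_score + rnorm(n, sd = 5)
trial <- data.frame(days_employed, treatment, risk_score)

# Y ~ Tr * M expands to Tr + M + Tr:M; the Tr:M coefficient estimates how the
# treatment effect changes with the moderator (i.e., effect heterogeneity).
summary(lm(days_employed ~ treatment * risk_score, data = trial))
```

A substantively meaningful interaction term signals that estimates from a sample concentrated at one end of the moderator may not generalize to the full population.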
Beyond criminology, these results extend to public health, psychology, and education, where non-random recruitment is common. The simulations show that hidden moderator effects can yield substantial over- or underestimation of intervention effects even when RCT procedures are meticulously followed. Improving sampling transparency and incorporating moderator testing are thus essential for cumulative, generalizable science.
4. Data and Methods
Two simulation studies evaluated how different sampling strategies influence bias in treatment-effect estimates. Each simulated population contained 500,000 observations in which a dependent variable (Y) was affected by a treatment (Tr) and its interaction with a moderating variable (M). Samples of 500 cases (250 treatment, 250 control) were repeatedly drawn either (a) randomly from the full population or (b) non-randomly from quartiles of the moderator. Linear regression estimated the treatment effect in each sample, and this process was replicated 10,000 times per sampling condition.
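To make the design concrete, the R sketch below mirrors the logic of the first simulation under assumed parameters. It is not the supplementary code released with the article; the effect sizes, distributions, and number of replications are placeholders chosen for brevity.

```r
# Sketch of the first simulation's logic (assumed parameters, not the authors'
# supplementary R code). A population is generated in which Y depends on the
# treatment (Tr), a moderator (M), and their interaction; treatment effects are
# then estimated from random samples and from non-random samples restricted to
# the lowest quartile of the moderator.
set.seed(2021)
N  <- 500000
M  <- rnorm(N)                                    # moderating characteristic
Tr <- rbinom(N, 1, 0.5)                           # treatment indicator
Y  <- 0.5 * Tr + 0.3 * M + 0.4 * Tr * M + rnorm(N)
pop <- data.frame(Y, Tr, M)

estimate_effect <- function(d) {
  # draw 250 treated and 250 control cases, then regress Y on Tr
  samp <- rbind(d[d$Tr == 1, ][sample(sum(d$Tr == 1), 250), ],
                d[d$Tr == 0, ][sample(sum(d$Tr == 0), 250), ])
  unname(coef(lm(Y ~ Tr, data = samp))["Tr"])
}

random_est    <- replicate(1000, estimate_effect(pop))
low_quartile  <- pop[pop$M <= quantile(pop$M, 0.25), ]
nonrandom_est <- replicate(1000, estimate_effect(low_quartile))

mean(random_est)      # close to the population-average effect (0.5 here)
mean(nonrandom_est)   # drifts away because the Tr x M interaction is ignored
```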
A second simulation extended the analysis to 1,000 randomly specified populations, varying population distributions and true effect sizes to test generalizability. In each population, 100 samples were drawn for every sampling condition, producing 5,000 sampling distributions in total.
Both simulations followed the ADEMP framework for simulation studies. Parameters—including population means, variances, and interaction strengths—were drawn from realistic ranges to emulate applied research conditions. The authors provided complete R code as supplementary material to promote transparency and replication. Performance measures focused on mean bias and the spread of deviations between true and estimated treatment effects. Collectively, more than 5,000 sampling distributions were analyzed to quantify how unknown moderators amplify sampling error.
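Continuing the sketch above (and still under its assumed parameters), performance measures in the spirit of the ADEMP framework can be summarized by comparing each condition's estimates against the population-average effect:

```r
# Simple performance summaries for the sketch above; 0.5 is the population-
# average treatment effect implied by the assumed data-generating model.
true_effect <- 0.5
perf <- function(est) c(mean_bias    = mean(est - true_effect),
                        empirical_se = sd(est),
                        max_abs_dev  = max(abs(est - true_effect)))
rbind(random = perf(random_est), nonrandom = perf(nonrandom_est))
```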
5. Conclusion
The research illustrated how non-random sampling can bias the effects estimated from an RCT. The analyses suggest that when population characteristics interact with a treatment, results from non-random samples can deviate up to forty-fold from the true population effect, and repeated studies using similar samples may reinforce rather than correct that bias.
The findings emphasize that genuine replication requires diversity in sampling procedures, transparent reporting, and explicit tests for moderation. By coupling rigorous internal validity with attention to external validity, researchers can produce more accurate, policy-relevant experimental evidence. Randomization remains essential—but without representativeness, even the strongest experimental designs risk producing misleading findings.
Disclosure: This research brief was prepared by ChatGPT and reviewed/edited by Ian A. Silver.