In a randomized control trial, the precision of an average treatment effect estimator and the power of the corresponding t-test can be improved either by collecting data on additional individuals, or by collecting additional covariates that predict the outcome variable. We propose the use of pre-experimental data such as other similar studies, a census, or a household survey, to inform the choice of both the sample size and the covariates to be collected. Our procedure seeks to minimize the resulting average treatment effect estimator’s mean squared error, or to maximize the corresponding t-test’s power, subject to the researcher’s budget constraint. We rely on a modification of an orthogonal greedy algorithm that is conceptually simple and easy to implement in the presence of a large number of potential covariates, and does not require any tuning parameters. In two empirical applications, we show that our procedure can lead to reductions of up to 58% in the costs of data collection, or improvements of the same magnitude in the precision of the treatment effect estimator.
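The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the function name `greedy_covariate_choice`, the cost model (a fixed cost per individual plus a cost per collected covariate), and the criterion (residual variance divided by the affordable sample size, used as a rough proxy for the estimator's mean squared error) are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def greedy_covariate_choice(X, y, cost_per_covariate, base_cost, budget):
    """Illustrative orthogonal greedy selection under a budget (hypothetical sketch)."""
    costs = np.asarray(cost_per_covariate, dtype=float)
    Xc = X - X.mean(axis=0)          # center covariates (assumes no constant columns)
    yc = y - y.mean()                # center the outcome
    n_afford = int(budget // base_cost)
    if n_afford < 2:
        raise ValueError("budget too small to sample two individuals")
    residual = yc.copy()
    selected = []
    # Baseline: collect no covariates and spend the whole budget on sample size.
    best = ([], n_afford, residual.var() / n_afford)
    for _ in range(X.shape[1]):
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        # Greedy step: pick the covariate most correlated with the current residual.
        scores = np.abs(Xc[:, remaining].T @ residual) / np.linalg.norm(Xc[:, remaining], axis=0)
        selected.append(remaining[int(np.argmax(scores))])
        # Collecting more covariates raises the cost per individual.
        n_afford = int(budget // (base_cost + costs[selected].sum()))
        if n_afford < 2:
            break
        # Orthogonal step: refit least squares on all selected covariates.
        beta, *_ = np.linalg.lstsq(Xc[:, selected], yc, rcond=None)
        residual = yc - Xc[:, selected] @ beta
        mse_proxy = residual.var() / n_afford
        if mse_proxy < best[2]:
            best = (list(selected), n_afford, mse_proxy)
    return best

# Synthetic "pre-experimental" data: only the first covariate predicts the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + 0.5 * rng.normal(size=500)
chosen, sample_size, proxy = greedy_covariate_choice(
    X, y, cost_per_covariate=np.ones(5), base_cost=10.0, budget=10_000
)
print(chosen, sample_size, proxy)
```

In this sketch, each extra covariate lowers the residual variance but also lowers the number of individuals the budget can cover, so the loop keeps the point on the greedy path where the proxy is smallest.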
Authors

Sokbae Lee, Research Fellow, Columbia University
Sokbae is an IFS Research Fellow and a Professor at Columbia University, with an interest in Econometrics, Applied Microeconomics and Statistics.

Pedro Carneiro, Research Fellow, University College London
Pedro is a Professor of Economics at University College London and an economist in the IFS' Centre for Microdata Methods and Practice (cemmap).

Daniel Wilhelm, Research Associate, LMU Munich
Daniel is a Research Associate of the IFS at cemmap and Professor of Statistics and Econometrics at LMU Munich.
Working Paper details
- DOI: 10.1920/wp.cem.2017.1517
- Publisher: The IFS
Suggested citation
Carneiro, P., Lee, S. and Wilhelm, D. (2017). Optimal data collection for randomized control trials. London: The IFS. Available at: https://ifs.org.uk/publications/optimal-data-collection-randomized-control-trials-0 (accessed: 26 March 2025).