The results reported in the paper can be reproduced with the simulated datasets and do-files on this webpage. The notes below describe these resources and how to use them. Please address any queries to monica_d at ifs.org.uk.
Simulations are based on the individual life-cycle model of education investment and earnings described in the paper and are made available under alternative assumptions:
- Policy environment: (i) whether or not a subsidy to advanced education is available and (ii) whether or not the agent is informed about the existence and rules of the subsidy;
- Selection mechanism: whether or not unobservables in the selection process and outcomes equation are related - that is, whether there exists selection on unobservable factors other than ability.
To cover all alternatives and support all estimation procedures, we provide four Stata datasets. Each contains 200 Monte-Carlo replications of samples of 2,000 observations, for a total of 400,000 simulated individuals (one observation per individual). There are two main datasets, MCdta-corr.dta and MCdta-nocorr.dta, and two auxiliary datasets, MCdta-corr-noS.dta and MCdta-nocorr-noS.dta. In the list below, the first file in each entry is in Stata 10 format; the second (suffix -v9) is for Stata 8 or 9.
- MCdta-corr.dta [Zip file 10 MB]; MCdta-corr-v9.dta [Zip file 10 MB]
- MCdta-nocorr.dta [Zip file 10.5 MB]; MCdta-nocorr-v9.dta [Zip file 10.5 MB]
- MCdta-corr-noS.dta [Zip file 8.7 MB]; MCdta-corr-noS-v9.dta [Zip file 8.7 MB]
- MCdta-nocorr-noS.dta [Zip file 8.7 MB]; MCdta-nocorr-noS-v9.dta [Zip file 8.7 MB]
MCdta-corr.dta and MCdta-nocorr.dta contain data for all three policy scenarios: no subsidy to advanced education (suffix noS), an expected subsidy (suffix eS) and an unexpected subsidy (suffix uS). The former represents the case of selection on unobservables other than ability; the latter represents the case of selection on observables and ability only. These two datasets are the basis for all estimation procedures. The variables in each dataset are:
Variable | Description
MCrep | Monte-Carlo replication index
i | Individual id in Monte-Carlo sample (1 to 2000)
theta | Individual ability (ranging between 0 and 1)
z | Observable in selection rule: family background (ranging between -2 and 2)
x | Observable in earnings equation: region (dummy)
y0 | Potential earnings if leaving education before the advanced level
y1 | Potential earnings if investing in advanced education
e_noS | Effort in preparation for test in the absence of subsidy
e_eS | Effort in preparation for test in the presence of expected subsidy
e_uS | Effort in preparation for test in the presence of unexpected subsidy
s_noS | Test score in the absence of subsidy
s_eS | Test score in the presence of expected subsidy
s_uS | Test score in the presence of unexpected subsidy
d_noS | Education attainment in the absence of subsidy (dummy)
d_eS | Education attainment in the presence of expected subsidy (dummy)
d_uS | Education attainment in the presence of unexpected subsidy (dummy)
The two potential earnings, y0 and y1, are included in the dataset. They depend on education attainment only, not on the policy scenario, and can be used together with the education variable, d_*, to construct the observed earnings in each case.
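As an illustration of that construction, the following minimal Python sketch picks y1 when the education dummy is 1 and y0 otherwise, for each policy scenario. The toy values and the output names y_noS, y_eS and y_uS are assumptions made for this example; they are not variables in the datasets, and the do-files themselves are written in Stata.

```python
# Toy rows using the variable names from the variable list; the values are made up.
rows = [
    {"y0": 1.0, "y1": 1.5, "d_noS": 0, "d_eS": 1, "d_uS": 0},
    {"y0": 2.0, "y1": 2.4, "d_noS": 1, "d_eS": 1, "d_uS": 1},
]


def observed_earnings(row, scenario):
    """Return y1 if advanced education is attained under `scenario`, else y0."""
    d = row["d_" + scenario]
    return d * row["y1"] + (1 - d) * row["y0"]


for row in rows:
    for scenario in ("noS", "eS", "uS"):
        # y_<scenario> is a hypothetical name introduced for this sketch only.
        row["y_" + scenario] = observed_earnings(row, scenario)
```

The same switching rule applies unchanged to the auxiliary datasets, where only d_noS is available.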
MCdta-corr-noS.dta and MCdta-nocorr-noS.dta are used with the difference-in-differences (DID) estimator to explore the use of repeated cross sections in the estimation of returns to education. They represent a time period before a policy intervention that introduces a subsidy to advanced education, so only the policy scenario with no education subsidy is included in these datasets. MCdta-corr-noS.dta represents the case of selection on unobservables other than ability and MCdta-nocorr-noS.dta represents the case of selection on observables and ability only. The variables in each dataset are:
Variable | Description
MCrep | Monte-Carlo replication index
i | Individual id in Monte-Carlo sample (1 to 2000)
theta | Individual ability (ranging between 0 and 1)
z | Observable in selection rule: family background (ranging between -2 and 2)
x | Observable in earnings equation: region (dummy)
e_noS | Effort in preparation for test in the absence of subsidy
s_noS | Test score in the absence of subsidy
d_noS | Education attainment in the absence of subsidy (dummy)
y0 | Potential earnings if leaving education before the advanced level
y1 | Potential earnings if investing in advanced education
A set of Stata .do files implements the estimation procedures:
- Download the Stata .do files [Zip file 12 KB]
Two .do files were created for each method, labelled
- name-of-the-method.do and
- name-of-the-method-programs.do.
The former contains the main routine, which defines the dataset and variables to be used, calls the estimation routines and displays the results. The latter contains two main estimation routines (together with other auxiliary routines in some cases). The first routine applies the respective estimator to a given dataset for a set of variables provided by the user. The second routine repeatedly applies the estimation procedure to a series of datasets to produce the Monte-Carlo results.
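The two-routine layout can be pictured with a short Python sketch. This is not a translation of the do-files: the function names, the toy data and the placeholder estimator (a raw difference in mean earnings by education status) are all assumptions chosen to show the structure, and the sketch assumes that the "series of datasets" corresponds to the replications indexed by MCrep.

```python
from collections import defaultdict
from statistics import mean, stdev


def estimate(sample, outcome="y", treatment="d"):
    """First routine (placeholder): difference in mean outcome between d=1 and d=0."""
    treated = [r[outcome] for r in sample if r[treatment] == 1]
    untreated = [r[outcome] for r in sample if r[treatment] == 0]
    return mean(treated) - mean(untreated)


def monte_carlo(data, rep_key="MCrep", **kwargs):
    """Second routine: apply `estimate` to each replication and summarise the results."""
    by_rep = defaultdict(list)
    for r in data:
        by_rep[r[rep_key]].append(r)
    estimates = [estimate(by_rep[rep], **kwargs) for rep in sorted(by_rep)]
    return {"mean": mean(estimates), "sd": stdev(estimates), "reps": len(estimates)}


# Toy data: 2 replications of 4 observations each (values are made up).
data = [
    {"MCrep": 1, "y": 1.0, "d": 0}, {"MCrep": 1, "y": 2.0, "d": 1},
    {"MCrep": 1, "y": 1.2, "d": 0}, {"MCrep": 1, "y": 2.2, "d": 1},
    {"MCrep": 2, "y": 1.0, "d": 0}, {"MCrep": 2, "y": 1.5, "d": 1},
    {"MCrep": 2, "y": 1.0, "d": 0}, {"MCrep": 2, "y": 1.5, "d": 1},
]

summary = monte_carlo(data)
```

Keeping the estimator separate from the replication loop means that a different method could be swapped in by replacing only the first routine.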