UNIVERSITAT ROVIRA I VIRGILI
DEPARTAMENT D’ECONOMIA

WORKING PAPERS
Col·lecció “DOCUMENTS DE TREBALL DEL DEPARTAMENT D’ECONOMIA - CREIP”

Student earnings expectations: Heterogeneity or noise?

Luis Diaz Serrano
Joop Hartog
William Nilsson
Hans van Ophem
Po Yang

Working paper no. 25 - 2016

DEPARTAMENT D’ECONOMIA – CREIP
Facultat d’Economia i Empresa

Published by:

Departament d’Economia
www.fcee.urv.es/departaments/economia/public_html/index.html
Universitat Rovira i Virgili, Facultat d’Economia i Empresa
Av. de la Universitat, 1, 43204 Reus
Tel.: +34 977 759 811
Fax: +34 977 758 907
Email: sde@urv.cat

CREIP
www.urv.cat/creip
Universitat Rovira i Virgili, Departament d’Economia
Av. de la Universitat, 1, 43204 Reus
Tel.: +34 977 758 936
Email: creip@urv.cat

Address comments to the Departament d’Economia / CREIP

ISSN print edition: 1576 - 3382
ISSN electronic edition: 1988 - 0820

STUDENT EARNINGS EXPECTATIONS: HETEROGENEITY OR NOISE?

Luis Diaz-Serrano (a), CREIP - Universitat Rovira i Virgili
Joop Hartog (b), University of Amsterdam
William Nilsson (c), Universitat de les Illes Balears
Hans van Ophem (d), University of Amsterdam
Po Yang (e), Peking University

Abstract

Students’ choices in education can only be based on expected outcomes. Econometric models that infer expectations based on ex post outcomes impose a rational structure of expectations on school performance and post-graduation earnings. Direct surveys suggest much ignorance and fuzziness. We use survey data on expectations in four universities in three countries and check for relations of expected probability to graduate and of expected earnings with personal abilities and attitudes. We find that most of the difference in expectations among individuals is just noise.
Keywords: Student expectations; earnings; earnings dispersion; risk attitudes
JEL Codes: I21; I26; J24; D83

________________________________
CREIP-Universitat Rovira i Virgili, Departament d’Economia, Av. de la Universitat 1, 43204 Reus (Spain); luis.diaz@urv.cat
(b) Amsterdam School of Economics, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam NL; j.hartog@uva.nl
(c) Universitat de les Illes Balears, Cra. de Valldemossa, km 7.5, Palma (Illes Balears); william.nilsson@uib.es
(d) Amsterdam School of Economics, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam NL; j.c.m.vanophem@uva.nl
(e) PKU Graduate School of Education, Yi Heyuan Road No.5, HaiDian district, Beijing, PRC; pyang@gse.pku.edu.cn
(a) We acknowledge financial support from the Spanish Ministry of Economy and Competitiveness (grant# ECO2014-59055-R)

1. Introduction

In this paper we report on analyses of four essentially identical surveys held at four universities. We find no evidence that students have well-structured anticipations on their potential performance in extended education, and on earnings consequences. We find that the commonly observed large variation among students in expected wages after graduation is just noise, rather than a reflection of the heterogeneity that is usually hard to uncover by a researcher.

There is much evidence that students’ expectations of wages earned after some education vary widely.[1] For example, Hartog, Van Ophem and Bajdechi (2007) use available empirical information on distributions of earnings with secondary or tertiary education to simulate anticipated rates of return to advanced education. The overall mean of these returns is 17% for both women and men, with standard deviations of 19 and 18%, respectively: the plus or minus one standard deviation range is from 0 to 35%.
If such variation mostly reflects noise, it cannot have a meaningful effect on educational choices; if it reflects unobserved heterogeneity, it is potentially influential and may enhance our understanding of these choices. Typically, the effect of expected earnings on educational choices is estimated to be very modest. For example, Arcidiacono, Hotz and Kang (2012) analyse a sample of Duke students, grouping their data by 6 majors. If students perceived the same rank for their performance in each major, the predicted average absolute change in the student population share of the six majors would be 4 percentage points (each share is close to 18%, except for a low of 9% for humanities); if personally expected earnings were equal across majors, the predicted average absolute change in shares would be only 1 percentage point.[2] The dominant effect of ability over earnings is also visible from rankings of majors. Ability is measured as the student’s perceived ranking in a major. In each major, the diagonal rank is by far the highest: students clearly perceive their rank as highest in the chosen major. This does not hold for earnings.

[1] For a survey of the variation in earnings expectations, see Hartog and Diaz Serrano (2014).

Personally expected earnings tend to be anchored on perceptions of realised earnings for graduates, but these perceptions also vary widely. Wiswall and Zafar (2013) asked NYU students for their perceptions on earnings of graduates with different majors. The coefficient of variation, across respondents, in the distributions of perceived mean incomes is easily over ¾; estimation errors are large, with absolute values in 5 majors, for both genders, generally above 30%. Betts (1996) asked undergraduates at UC San Diego across all faculties for their perceptions of graduates’ earnings, by major, at the national level. Coefficients of variation across students are typically close to 0.30 and ratios of P(90)/P(10) are generally around 2.
The median absolute error, relative to true mean national earnings, is typically some 20%. The structure of actual earnings, as rankings or even relative differences by education and gender, is often reasonably accurately known, but here also, substantial errors frequently occur (Hartog and Diaz Serrano, 2014).

Nicholson and Souleles (2001) give strong evidence on anchoring of personal expectations on market perceptions, from survey data on students in a US medical school. For every 1000 dollar increase in the perceived market income of the specialty that the student plans to enter, the student’s own expectation for that specialty goes up by 590 dollars. For every 1000 dollars difference between the student’s estimate of perceived specialty income and true specialty income, the student’s expected income goes up by 840 dollars. Misperceptions in actual specialty income end up almost dollar for dollar in income expectations. Schweri, Hartog and Wolter (2011) also find that individuals’ own expectations mirror what they see in the labour market as realised outcomes. Wiswall and Zafar (2013) report that own expectations are significantly and positively related to perceived population earnings, at an elasticity of about 0.3.

[2] With equal abilities the biggest change would be a gain of 8 percentage points for Engineering; with equal expected earnings the biggest change would be a loss of 3 percentage points for Economics.

Ex post realized variation in earnings, for a given education, will be related to personal characteristics like ability, ambition, perseverance etc. If such characteristics and their effects on earnings are known to individuals, part of the ex post observed variation reflects unobserved heterogeneity. Interpreting all ex post variation as ex ante risk for (potential) students would then be erroneous.
The distinction between unobserved heterogeneity and risk is relevant because, in models of educational choice, earnings risk plays a different role than the mean of anticipated earnings. One approach to disentangle risk and heterogeneity is to measure them from ex post data by estimating an imposed econometric structure, as e.g. in Chen (2008) and Cunha, Heckman and Navarro (2005). Chen finds that most ex post variance reflects risk, while Cunha et al. claim that half of the variance reflects heterogeneity and thus is foreseeable by students. An alternative approach, pioneered by Dominitz and Manski (1996), is simply to ask students for their anticipations, and not only to elicit anticipations for mean earnings in a specified education scenario, but also for the dispersion, and in fact, seek to identify their personally held distribution of anticipated earnings. This method reveals substantial individual uncertainty about earnings consequences of education.[3] However, neither anticipated means nor anticipated dispersions have strong robust relationships with available personal characteristics (Hartog and Diaz Serrano, 2014).

[3] Dominitz and Manski (1996); Wolter and Weber (2004); Schweri, Hartog and Wolter (2011); Mazza (2012); Hartog, Ding and Liao (2012).

Clearly, there is an interesting and relevant question whether students hold well-structured and well-articulated expectations on the financial outcomes of educations. Estimations on ex post data that rely on unobserved heterogeneity hidden in the black box of correlated errors between choice equations and earnings equations (as in Chen and in Cunha et al.) do not lead to identical conclusions on the relative importance of risk and heterogeneity. Direct surveys point to substantial ex ante uncertainty but robust links with personal qualities have not been established.
The magnitude of risk and the legitimacy of equating it with ex post observed variance are relevant issues for understanding educational choice and for the question whether market wages compensate for differences in earnings risk (Hartog, 2011). In this paper, we report on a project that seeks to reveal information sets of students already engaged in university education. The goal is to uncover usually unobserved heterogeneity by just asking for it and to see if it has a plausible structure. We have asked students for their earnings expectations and combine this with questions on objective observables like school grades and test scores as measures of ability and motivation, and on variables usually hidden in unobserved heterogeneity: self-assessed abilities, risk attitude and reasons for choosing their type of education. We did this in four universities in three countries, to increase validity of results. Data collection and analysis are essentially similar, but allow for specific local conditions. 
We figured, from the literature and from our own perceptions, that if students hold a well-structured information set on the earnings consequences of their options, this should be manifest in several observations:

• Relations as commonly observed in earnings functions: negative effect on expected earnings for gender (female), positive effect for better family background, positive effect for cognitive ability, in particular mathematical ability
• Positive effects on expected earnings for self-assessed abilities: abler individuals should earn more
• Higher expected earnings for individuals that place high weight on earnings: they should deliberately seek out better earnings opportunities
• Larger earnings dispersion in the chosen education than in a rejected education for individuals with lower risk aversion (for sorting by earnings risk, see Skriabikova (2014))
• Higher earnings dispersion in the chosen alternative for better family background (higher wealth leads to lower risk aversion; the effect might be covered by measured risk aversion, however)
• Positive effect of abilities on probability to graduate.

No doubt, analysis of the structure of earnings expectations has wide relevance. Choice of education is inevitably based on future benefits, and thus our analyses are relevant for understanding the choice process. The policy relevance is also clear, as improving the quality of information on future benefits is an obvious policy target if there is indeed scope for such improvement. Moreover, the vital policy question of the sensitivity of demand for (higher) education to the riskiness of the pay-off, always a key issue in discussions on student loan and grant systems, requires first of all good knowledge of student anticipations of pay-offs.
Our work fits in with the growing attention to beliefs actually held by students, which was strongly stimulated by the work of Manski (Manski, 2004; Dominitz and Manski, 1996) and is exemplified most recently by work such as Stinebrickner and Stinebrickner (2014), Zafar (2011), Wiswall and Zafar (2013) and Arcidiacono, Hotz, Maurel and Romano (2014).

We started out from rather ambitious notions on a finely articulated structure of expectations. However, we found so little structure in relationships between expected earnings and individual characteristics that we refrained from testing the more subtle hypotheses that we formulated (or might have formulated): expected earnings are virtually unrelated to those variables that we think should make up unobserved heterogeneity. Our key conclusion is that anticipated earnings are dominated by noise. In Section 9 we give a detailed review of the literature and conclude that our results are no exception: available research points towards a rather unambiguous outcome. In our concluding remarks in Section 10 we present the implications for further work as we see them.

2. The nature of our surveys

We focus on choice in tertiary education and the relation with earnings expectations, at four locations: Amsterdam, Peking and two locations in Spain (Baleares and Catalonia). We have surveyed Bachelor students observed in a given curriculum and asked for earnings expectations: median (or mean[4]) and dispersion. To measure dispersion, we follow Dominitz and Manski: we ask for the probability to earn more than 25% above the median and for the probability to earn less than 25% below the median.
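To illustrate how a stated median and a stated tail probability can pin down a full distribution, the sketch below recovers the parameters of a lognormal distribution from the two answers. It is a minimal sketch and not the procedure used in this paper: the lognormal form is an assumption, the function names and the numerical inputs are hypothetical, and the calculation is only defined for a stated lower-tail probability strictly between 0 and 0.5.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def lognormal_from_lower_tail(median, p_low, cut=0.75):
    """Return (mu, sigma) of ln(earnings), assuming a lognormal distribution.

    median : stated median earnings M (hypothetical input)
    p_low  : stated probability of earning less than cut * M
    """
    if not 0.0 < p_low < 0.5:
        raise ValueError("p_low must lie strictly between 0 and 0.5")
    mu = log(median)  # the median of a lognormal is exp(mu)
    # P(Y < cut*M) = Phi(ln(cut) / sigma)  =>  sigma = ln(cut) / Phi^{-1}(p_low)
    sigma = log(cut) / STD_NORMAL.inv_cdf(p_low)
    return mu, sigma


def implied_upper_tail(sigma, cut=1.25):
    """Probability of earning more than cut * M implied by the fitted sigma."""
    return 1.0 - STD_NORMAL.cdf(log(cut) / sigma)


# Hypothetical respondent: median of 30000 and a 25% chance of less than 0.75M.
mu, sigma = lognormal_from_lower_tail(30000.0, 0.25)
upper = implied_upper_tail(sigma)
```

Under the lognormal assumption the implied probability of earning more than 1.25M always exceeds the stated probability of earning less than 0.75M, because ln(1.25) is smaller in absolute value than ln(0.75). Equal stated tail probabilities are therefore not what log-normality predicts.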
We ask expected earnings for working straight after high school, for working after completing the present study (Bachelor degree or a Master degree) and for the case of working after completing an alternative study, to test for systematic effects of a well-informed choice. At one site, Amsterdam, we do not ask for earnings with a Bachelor degree, as virtually every graduate continues to Master level, and no one enters the labour market with a Bachelor degree.

[4] As explained below, in a second round of data collection, we asked for mean rather than median earnings. In Amsterdam we asked for the mean already in the first round. As will become clear below, the distinction is immaterial for our core results.

To check for anchoring personal expectations on perceived market outcomes, we ask for the earnings of an average graduate of the respondent’s field of study. We opt for asking earnings of an average graduate at age 45, as it may be mid-career earnings that determine an individual’s perspectives on labour market outcomes. We have also collected data on expected probability to graduate for different scenarios.

To condition on demographic background we ask for the usual variables (such as gender and parental background). We collect information on grades in secondary school or university admission exams, to get a measure of abilities. We extend this with questions on self-assessed abilities and attitudes (risk attitude, reason to choose present study), to uncover variables that remain usually hidden in unobserved heterogeneity.

The survey, as used in Amsterdam, is attached as Appendix A. Surveys in the other participating universities are similar, with local adjustment where necessary, for example to allow for differences in school systems. The surveys have been administered in 2013-2015. Details are given in Appendix B.
In Amsterdam, surveys were administered in class, at different points in the academic year, among students in the Faculty of Economics and Business at the University of Amsterdam (UvA) who took courses in econometrics; sample size is 402, representing a response rate of 38%.

In Peking we collected data from undergraduate students with different majors at Peking University (PKU), a top university in China, with highly selective admission, based on scores in a national university admission exam. The survey was administered online, in response to electronic invitations sent to students’ email accounts. The response rate was 29%, yielding 161 first year and 72 fourth year students.

The Baleares data are collected in the Faculty of Economics and Business at the University of the Balearic Islands (UIB), in the course Analysis of Economic Data; answering the survey is mandatory for later participation in an individual assignment which yields 15% of the assessment of the course. 431 first year students participated.

In Catalonia, the survey has been carried out in the Faculty of Economics and Business at the Rovira i Virgili University (URV), during classes in the course of Statistics I, which is taught in the first year of study. The survey was carried out during the first semester, about 3-4 weeks before the end of the semester (November). Participating in the survey was not mandatory, but it was administered during the teaching time, so all students present at that moment participated. In this way we cannot ensure a participation rate of 100%, but the participation rate is still quite high. We have 445 responses, divided over four study years: 1 (179), 2 (190), 3 (47) and 4 (17) (for 12 students, study year is not known).[5][6]

In our study, we not only ask about students’ earnings expectations in the current college major they are studying, but we also ask about some counterfactuals.
That is, what their expectations would be if they had chosen a different college major than the one they are currently studying. To do so, in Amsterdam, Baleares and Catalonia,[7] an alternative study was randomly assigned, from a list we had drawn up (see the list of alternatives in Appendix B). In Peking, students were asked to choose their preferred alternative major from the list of 44 majors offered at PKU; the list was restricted to studies offered at the same university, to maintain admission to the same highly selective university.[8]

[5] Students in study years 2, 3 and 4 are students who have not been able to pass the subject of Statistics I in their first and consecutive years.

[6] In Spain, when students want to choose a study in a public university they are asked to list 8 studies ranked by preference. The grades of the admission test averaged with the overall high school grades are used by the Spanish government to rank students in order to give priority to the best students for their first choice. In many cases, students that performed poorly may end up in the study they ranked in second, third or even fourth position. At the URV, the field of study for the surveyed individuals in 2013 was the first option for 78% of the respondents, the second for 12.5%, the third for 6%, and between the fourth and eighth for the remaining 3.5%. In 2014, these figures were 74%, 13%, 5.5% and 7.5%. At the UIB the corresponding numbers were 73.8%, 14.8%, 3.7% and 7.7% (in 2013) and 71.4%, 18.4%, 5.0% and 5.5% (in 2014).

[7] In the region of Baleares there is only one public university, the UIB, which is located in the capital, Palma de Mallorca. In the region of Catalonia the picture is quite different. There are seven public universities, four in Barcelona, and one in each of the three remaining provinces. Our surveys in Catalonia were carried out at Universitat Rovira i Virgili, a public university located in the province of Tarragona, about 100 km south of Barcelona.

[8] Two students chose their present major, instead of an alternative one.

3. The Dominitz and Manski method: do students understand probability?

Dominitz and Manski (1996, DM) pioneered a method to obtain information on anticipated distributions of earnings rather than just a single point such as the mean or the median. DM ask for median earnings and for the probability to obtain earnings more than 25% below or above the median.[9] In their pioneering project, they give students extensive information on the concepts of median and probability, and direct feedback when answers violate the rules of probability (e.g. probabilities for all options adding up to more than 1). In our surveys we explained the concept of the median, but we did not enforce compliance with the rules of probability by signaling unacceptable answers and requesting a new entry. We did not use interactive computer software, essentially because we did not want to impose structure on students’ response but simply find out the nature of their information.

[9] This information allows one to calculate the parameters of a lognormal distribution of the individual’s expected wage. Kaufmann (forthcoming) uses a similar approach, see below.

We found that students are not very familiar with the concept of the median. Conversations with students made this clear but it is also evident from the table below. Students frequently state a probability of income below 75% of the median (or above 125% of the median) greater than 0.5 (we will refer to these thresholds as 0.75M and 1.25M). They also appear to be unaware of the sum restriction: the sum of the stated tail probabilities should not exceed 1. And there is a consistency problem: if the sum of stated tail probabilities exceeds 1, this would imply negative probability for outcomes in the range between 75% and 125% of the median. In this case, students are apparently unaware of the division of probability mass in three segments that should add up to 1.

As explained below, for the median and the dispersion of expected wages, we have up to 6 scenarios for education. We ask for expectations for oneself regarding the current study and for the counterfactual regarding expectations with only completed secondary education and with a randomly assigned alternative college study, and we ask similar expectations for an average graduate of the specified education. We also differentiate between starting wages and wages when 45 years old. In Table 1 we give the ranges for violation of probability rules across the specifications. In each cell we show the minimum and maximum percentage of respondents that violate the probability rules.

Table 1. Percentage of answers violating probability rules, across education scenarios

                  P(0.75M)+P(1.25M)>1   P(0.75M)>0.5   P(1.25M)>0.5   P(0.75M)>0.5 and P(1.25M)>0.5
Amsterdam         9.8 - 18.2            14.2 - 28.3    7.8 - 24.0     1.7 - 8.9
UIB, Baleares     7.3 - 9.2             17.5 - 31.8    14.1 - 18.4    2.1 - 3.3
Peking            23.1 - 42.7           39.3 - 48.3    24.8 - 55.1    12.8 - 27.8
URV, Catalonia    11.9 - 22.0           34.6 - 42.5    20.7 - 24.3    10.0 - 18.8

The conclusion from these results is quite clearly that without feedback, stated probabilities often do not obey the restrictions that define them.[10]

[10] Dominitz and Manski (1997A) ask households for the probability that their income over the next 12 months will fall below each of four increasing thresholds. 22 of 415 respondents violate the requirement that cumulative probabilities increase with the threshold, a favourable score they ascribe to feedback on earlier probability questions.

We checked for symmetry in the anticipated distribution of earnings (DM assume log-normality); under mild assumptions this test is not invalidated by violations of range restrictions, and we have taken the stated probabilities at face value.
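The four checks tabulated in Table 1 can be expressed compactly. The sketch below is an illustration only: the function names and the example answers are hypothetical and are not taken from the survey data, and probabilities are assumed to be coded on a 0 to 1 scale.

```python
def probability_violations(p_low, p_high):
    """Flag the four rule violations of Table 1 for one respondent.

    p_low  : stated probability of earnings below 0.75M
    p_high : stated probability of earnings above 1.25M
    """
    return {
        "sum_exceeds_one": p_low + p_high > 1.0,
        "low_exceeds_half": p_low > 0.5,
        "high_exceeds_half": p_high > 0.5,
        "both_exceed_half": p_low > 0.5 and p_high > 0.5,
    }


def violation_share(answers, rule):
    """Percentage of (p_low, p_high) answers that violate the given rule."""
    flags = [probability_violations(lo, hi)[rule] for lo, hi in answers]
    return 100.0 * sum(flags) / len(flags)


# Hypothetical answers of four respondents for one education scenario.
answers = [(0.2, 0.3), (0.6, 0.3), (0.6, 0.6), (0.4, 0.7)]
shares = {
    rule: violation_share(answers, rule)
    for rule in probability_violations(0.0, 0.0)
}
```

The same flags can serve as a violation dummy in a regression when stated probabilities are otherwise taken at face value.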
The distributions we have encountered are definitely not symmetric. In Amsterdam, the probabilities in the upper tail (above 1.25M) and in the lower tail (below 0.75M) are only equal in some 30% of the cases, with the lower tail mostly fatter than the upper tail, although this differs among subgroups. In Peking, symmetry occurs in less than 22% of the cases, with the lower tail generally less fat than the upper tail. At the Baleares, symmetry occurs in about 13% of the cases, while in Reus symmetry ranges from 14% to 18% depending on the scenario.

Imposing or not imposing the restrictions of probability theory, by feedback on violations or otherwise, is a choice of methodology. We have opted for an open approach to student information, to find out what they know. Our conclusion is that the information that students hold does not by itself obey the requirements of probability theory. It requires giving feedback or structuring possible answers to meet the requirements (e.g. by asking to divide 100 points over possible answers). To deal with violations in our analyses, we have usually taken probabilities as stated, and introduced a dummy to indicate a violation.

4. Are expectations anchored on perceptions?

In Peking, for a Master degree in the respondent’s studies, the correlation of the median earnings reported for an average person at age 45 and reported for the respondent her/himself right after graduating is 0.37. Lower tail probabilities (below 75% of the median) correlate at 0.28 and upper tail probabilities at 0.23 for own expectations and perception on earnings for an average person at age 45.

In Amsterdam, the correlation is much higher. For the respondent’s study, own median earnings right after graduation and median earnings for an average graduate at age 45 correlate at 0.67.

At the UIB in Baleares, the correlation between own median starting wage and wage for an average graduate at age 45 is 0.55.
Many students answered the wage expectations and perceptions as if the wage refers to monthly wage although we asked for annual wage. For the group with both wages answered as “monthly wage” the correlation is 0.55. For those with both wages answered as an “annual wage” the correlation is 0.68.

At the URV in Catalonia, the correlation between own wage expectations after graduation and the wage of the average graduate at age 45 is 0.52. Lower tail probabilities (below 75% of the median) correlate at 0.38, while upper tail probabilities (above 125% of the median) correlate at 0.53.

We can conclude that personal earnings expectations are indeed substantially anchored on perceptions for an average graduate. In Amsterdam, Baleares and Reus, anticipated medians correlate above 0.5; in Peking, the correlation is lower, at 0.37. The lower correlation in Peking may reflect that the Chinese economy is in transition, with wages adjusting to more liberal market conditions.

5. Variables to measure heterogeneity

We consider three dependent variables on anticipated results of education: median earnings, earnings dispersion and probability to graduate. Earnings dispersion is measured as the sum of the tail probabilities: the probability of earnings below 75% of the median plus the probability of earnings above 125% of the median.

Explanatory variables can be grouped in three categories: demography and family background, abilities, and attitudes. The demographic variables are easily observable for anyone and the effect on earnings should be transparent to students. We test whether a student is well informed by checking if standard results from regressions for market wages are mirrored in expectations. Thus, we predict lower anticipated median earnings for women, and higher medians for students from wealthier and better educated families and for students growing up in more urbanised areas. We have no prediction of the effect of student age on anticipated median earnings.
Women might have lower earnings dispersion and students from wealthier backgrounds may have higher dispersion on account of stronger and weaker risk aversion, respectively, but this effect can be tested directly with measured risk attitude. Higher college completion rates for female students compared to males are well documented (see e.g. Bailey and Dynarski 2011), but neither the literature nor common observation suggests clear predictions of the effects of other variables on perceived graduation probabilities. Perhaps students from wealthier and better educated families and students raised in more urban areas have more confidence and perceive higher probabilities.

In the regressions, gender is measured with a dummy for female. Urban high school is a set of dummies for the level of urbanisation of the environment while in secondary education, with countryside as default. Father’s education is essentially distinguished as primary, secondary and tertiary level, with some local differentiation. Parental income is specified with dummies for quintiles, where we have asked the respondents to locate family income within the national income distribution. Foreign student is a dummy for respondents who had their secondary education in a country other than that of the university, usually their homeland. Several survey questions ask for answers related to their homeland. This relates in particular to their earnings expectations when they indicate the intention to return there after graduation.

For abilities, we have self-reported grades in secondary education (overall average grades, or math grades if available) or university admission tests, and we have self-assessed abilities. School grades and math grades have been transformed to the grading system of the university attended if that differs from the system used in the student’s secondary school (e.g. from an A-F scale to a 1-10 scale for foreign students in Amsterdam).
School grades and test scores are objective external data known to the individual and should inform the individual about opportunities in school and beyond.

The very nature of unobserved heterogeneity entails that students have perceptions on their own abilities that go beyond grades and test scores. These will not be condensed in a single number like grades and test scores, and have a more nebulous nature. Retrieving them is not straightforward and measurement with a full-fledged psychometric test battery on latent variables is neither feasible nor desirable. We want to use concepts that individuals themselves are (or could be) aware of; it is not plausible that students’ perceptions of their abilities are formulated as psychometric test scores. We decided to ask students to rate themselves on a percentile scale for four abilities: mathematical, verbal, commercial and social. The instruction was to rate themselves among students with the same secondary education. Mathematical ability was left to common understanding; verbal ability was described as the ability to express yourself articulately in your own language and the ability to learn foreign languages. Commercial ability has been defined as the ability to convince someone (to buy a product, or accept an opinion), social ability as the ease of making contacts, making friends and feeling at home with other people.

Our choice was meant to measure variables that can easily be explained to and recognized by respondents, as they should provide the grading themselves. It should also be reasonable to suppose that these abilities would normally be part of their “unobserved heterogeneity” and of course, they should have relevance for labour market performance. There exists a large literature on individual abilities that are relevant for labour market performance (see e.g. the survey in Hartog, 2001).
Mathematical ability, correlating highly with general intelligence, has been found in earnings functions to be one of the best ability variables for predicting earnings. Verbal ability, next to mathematical ability, has been included to reflect an often claimed dichotomy between science and humanities orientation. In Hartog (1980) we have shown that intellectual, social and commercial abilities are important to explain wage differentials.

We believe that these four variables can cover much of the heterogeneity of the labour market in terms of the mental abilities required for success.[11] Our variables have a link to the American Dictionary of Occupational Titles (DOT) that specifies job requirements for jobs across the entire range of the labour market. The three variables in Hartog (1980) - intellectual, social, commercial - have been extracted by factor analysis from the job requirements specified in that database. In 1998, the DOT was succeeded by O*NET. Required worker abilities are now distinguished in four main groups, one of which is cognitive (next to physical, psychomotor and sensory). The cognitive group covers seven abilities including verbal and quantitative. O*NET also includes five so-called cross-functional skills: developed capacities that facilitate performance of activities that occur across jobs. One of these five is social skills. The distinction between verbal and mathematical, or quantitative, is also made by Heckman (2006): he uses test scores on five measures of cognitive skill: arithmetic reasoning, word knowledge, paragraph comprehension, mathematical knowledge, and coding speed, derived from the Armed Services Vocational Aptitude Battery (ASVAB).

[11] As we deal with university students we have ignored manual abilities, mostly though not exclusively relevant for the more practically orientated occupations of the labour market. Relevant fields, like dentistry, were not included.
Hence, while we cannot copy the full breadth of O*NET, nor use all the information collected in the literature on occupational psychology, our choice of variables certainly finds support in the literature. To assess the nature of our ability measures we present correlation matrices in Table 2.

In Amsterdam, among self-assessed abilities, verbal, social and commercial abilities have substantial inter-correlations (between 0.4 and 0.5), but math stands apart. Overall school grades and math correlate well. The correlations between school grades and self-assessed abilities are modest, with the highest value for self-assessed math and school grade in math.

In Peking, the overall secondary school grade correlates highly with the math grade (0.54) but also with English (0.59). The self-assessed abilities have remarkably low correlations with high school grades. Self-assessed math ability correlates only at 0.20 with the math grade, and the correlations between verbal ability and the English and Chinese grades are even lower. Among themselves, the self-assessed abilities correlate fairly highly, and all positively (between 0.4 and 0.6). Thus, overall school grades, school grades in math and in languages correlate well, self-assessed abilities correlate well, but school grades and self-assessed abilities correlate poorly.

At the UIB in Baleares, we find low correlation between high school grades and the self-assessed abilities (below 0.16). At values between 0.12 and 0.36, the correlations among the self-assessed abilities are mostly higher, but still on the low side.

At the URV in Catalonia, the data contain three grades: overall high school, high school math grades and university access test scores. High school grades correlate well with the access test grades (0.56) but not that much with math grades (0.34). Math grades correlate only 0.26 with self-assessed math ability; however, correlation with the remaining self-assessed abilities is practically zero.
By contrast, high school grades correlate positively with self-assessed abilities: math (0.19), verbal (0.20), social (0.13) and commercial (0.14). Correlations among self-assessed abilities are quite heterogeneous across the board. For instance, commercial ability correlates almost 0.4 with verbal and social ability, while only 0.12 with math ability. Verbal and social abilities correlate 0.4, while math ability correlates practically zero with verbal and social ability.

These results point to a clear conclusion. School grades are not very informative on the self-assessed abilities, and self-assessed abilities may thus be expected to have added value for predicting performance. The correlation between self-reported math grade and self-assessed math ability is far from perfect. In line with literature and common observation, we would predict that median earnings are positively related to average school grades (and admission test scores) and to mathematical ability. Grades and math ability can be taken as measures of general intelligence and this should have value in any occupation. We also predict a positive effect on probability to graduate (cf. Light and Strayer 2000). Verbal, social and commercial abilities are more specific aptitudes, and may have different value in different occupations. Hence, in an earnings regression across educations (and anticipated occupations) one would estimate an average effect across educations, but the abilities are so broad that it is hard to imagine anything but a positive effect. On the effects of abilities on earnings dispersion we are not aware of any predictions. An intuitive guess would be that higher perceived ability levels reduce the uncertainty in the earnings prediction. Higher perceived ability may imply more self-confidence and a stronger sense of control over outcomes, and conversely, a less vulnerable position in the labour market.
A stronger anticipated market position may stimulate more risky choices, but as we will control for risk attitude, this should not obscure the negative effect. Effects of verbal, social and commercial abilities on probability to graduate are probably modest at best. Among attitudes, we measure risk aversion and reason for education choice. Risk aversion is measured on the now fairly common 1-10 scale for preparedness to take risk, from unwilling to fully prepared. The most straightforward prediction is a positive effect of willingness to take risk on earnings dispersion. With lower risk aversion, individuals will be inclined to consider jobs with greater earnings risk and presumably higher median earnings, as a risk premium. Occupational sorting by risk attitude into jobs with greater financial risk has been established empirically by Bonin et al. (2007) and by Skrabikova (2014), compensation in market wages for earnings risk has been established in many studies surveyed in Hartog (2011). The interesting test is whether these patterns will also be observed in expected wages and dispersions. The probability to graduate may be predicted to be positively affected by risk aversion. It would be consistent with the observation that children from lower social background choose less risky educations and occupations, and shy away from more “difficult” studies, often explained from their higher risk aversion. We measure preferences by asking for the weights of four arguments to choose an education: “I like the type of work”, “It fits my abilities”, “Earnings”, “Other”, with weights forced to add up to 100. The straightforward prediction here is that a higher weight for earnings leads to higher median earnings in the chosen education. “I like the type of work” might have a negative effect on median earnings as it indicates preference for other considerations than high earnings. 
A higher weight for “fits my abilities” would presumably lead to a higher probability of success in the chosen education and possibly also a positive effect on median earnings.

6. Regressions

We have analysed the expectations for median earnings, dispersion of earnings and probability to graduate (with OLS). We have estimated regressions for the individual’s present study, for completing high school only, for an alternative university education, and for the individual’s perception of the results for an average individual completing the study that the respondent is presently engaged in. For completing high school only, we have no results for the probability to graduate, as individuals already have graduated. For the respondent’s present study, we aim at information about the first time of possibly entering the labour market; this is after graduating with a bachelor degree in Reus, Baleares and Peking, and after graduating with a Master in Amsterdam, as virtually no one starts working with only a Bachelor degree. For the average person, we ask the respondent to focus on a person graduated with their own major, at age 45, as this is likely to be the reference age for considering the (lifetime) benefits of an education, more so than just the situation upon labour market entrance. Perceptions on an average person and anticipations for oneself are asked separately, as anticipations may be anchored on perceptions of average outcomes and because individuals may locate themselves deliberately in segments of the perceived distribution of outcomes (e.g., in the top, or in the bottom segment). To highlight the effects of choice, we have asked students also to consider an alternative, not chosen study. By comparing chosen and rejected study, the systematic effect of choice should become more visible. For example, a student with low risk aversion should have a greater difference in earnings dispersion between chosen and rejected study than a more risk averse student.
Similarly, a student choosing on the basis of earnings should have a larger gap in median earnings between chosen and rejected study than a student putting high weight on the nature of the occupation. To test these predictions, these alternative studies should be randomly selected. In Amsterdam, Reus and Baleares, we indeed choose randomly from a selected set of alternative studies. In Peking, with restricted admission to educations, we take the student’s second-best option as the alternative. In the latter case, we might assume that the students are better informed about the properties of the alternative, as it is an option that presumably they have seriously considered. The baseline specification includes all our core variables. Regressions with extra dummies for violations of probability rules and several alternative specifications have also been estimated. The regressions include some locally relevant variables, such as type of secondary education and class year (first year, second year) but these results are not reported. For missing observations on a regressor we use the STATA option to delete the entire record, except for the cases where we explicitly included a dummy variable for a missing observation on that variable.

7. Results

The key finding is the general weakness of our results: we find very few statistically significant effects and we have not uncovered a systematic structure of unobserved heterogeneity. We will therefore present only basic results, to illustrate our claim: only results for probability to graduate, anticipated earnings and earnings dispersion for the present study of the respondent. We will not present any results for an alternative study, for an average student rather than for the individual herself, for differences between alternative and present study or for difference between self and average: such specifications were clearly too ambitious.
The lack of systematic significant effects in these more advanced specifications simply underscores our basic conclusion12. We started our analyses with data collected in the first year, 2013. To check the robustness of our results, we have repeated our surveys in the next academic year, with adjustments on some aspects we considered problematic.

• In 2014 we ask for mean earnings rather than median, as we found that the concept of median is not well understood without feedback. We might have opted for providing feedback on the definition of a median, but as noted, we prefer to abstain from imposing structure a priori: we want to tap as much as possible the individual’s actual information set. The mean is a well understood concept.

• In 2014, dispersion is measured from an explicit routine: after specifying mean earnings, calculate half that value, add to and subtract from the mean and then provide probabilities for the four segments (below half the mean, between half the mean and the mean, between the mean and 1.5 times the mean, and above 1.5 times the mean). Dispersion is measured as the sum of the probabilities in the two extreme segments13.

• We reformulated some questions, to sharpen concepts and to reduce the risk of misunderstandings. In particular, in 2014 we asked for self-assessed abilities relative to high school class mates.

• In 2014, we changed the specification of self-assessed abilities in the regressions. In 2013 we used a quantitative measure taken from the percentile ranking scale in the questionnaire; in 2014 the abilities are measured with a dummy variable indicating a value above the median.

12 In Appendix A we reproduce the questionnaire and in Appendix B we give some details on data collection.

13 Kaufmann (forthcoming) asks for maximum and minimum expected wages and for the probability of surpassing the midpoint of the two. Assuming a triangular distribution allows one to calculate the desired measures of location and dispersion.
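As a minimal sketch of the 2014 dispersion routine, the following Python fragment derives the three cut points from a stated mean and sums the probabilities in the two extreme segments. The function names, the assumption that probabilities are recorded on a 0-1 scale, and the tolerance used to flag answers that violate the probability rules are illustrative assumptions, not taken from the questionnaire.

```python
def segment_thresholds(mean_earnings):
    """Cut points of the four segments: half the mean, the mean, 1.5 times the mean."""
    half = mean_earnings / 2.0
    return (mean_earnings - half, mean_earnings, mean_earnings + half)


def dispersion_2014(p_below_half, p_half_to_mean, p_mean_to_1_5, p_above_1_5, tol=1e-6):
    """Return (dispersion, violation_dummy) for one respondent.

    Dispersion is the sum of the probabilities in the two extreme segments.
    The dummy equals 1 when a probability lies outside [0, 1] or when the four
    probabilities do not sum to 1 (the tolerance is an assumed value).
    """
    probs = (p_below_half, p_half_to_mean, p_mean_to_1_5, p_above_1_5)
    out_of_range = any(p < 0.0 or p > 1.0 for p in probs)
    bad_sum = abs(sum(probs) - 1.0) > tol
    violation = 1 if (out_of_range or bad_sum) else 0
    return p_below_half + p_above_1_5, violation
```

Returning a violation dummy instead of discarding the answer mirrors the regressions that include extra dummies for violations of probability rules.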
7.1 Peking: Table 2

For most of the regressions in 2013, the number of observations varies between 195 and 202. The total number of observations is 234. So the number of missing observations is between 32 and 39, which represents 14-17% of the total. Almost all variables have valid answers for more than 200 observations. The variables with more missing observations include age (209 valid observations), self-assessed ability (215), and reason for choosing major (215). Hence, it is not a particular variable that is responsible for the large loss of observations in the regressions. With a dummy variable called sample1, all observations have been divided in two groups: the first group has no missing observations for any independent variable, and the other group has a missing value on at least one variable. Separate t-tests on the averages of the explanatory variables reveal that the hypothesis that the population average is equal for the two groups is only rejected for female and for self-assessed math ability. In the May-June 2014 survey, we received 827 questionnaires (31% freshman, 30% senior and the remainder sophomore or junior). 9% are students of Economics and Business, 32% of Natural Sciences and Math, and the rest are divided over Computers and Engineering, Humanities, Medicine and Other. 34% of the respondents are female. The regressions show a few plausible results, and only one is robust over time. In 2014, girls expect significantly lower mean earnings than boys; the effect is of similar magnitude in 2013 but not significant. In 2014, the probability to graduate within 4 years is higher for individuals who rate their mathematical ability higher and who have chosen their field of study because it fits their capabilities, but these effects are not significant in 2013. Students from families with family income at least in the 8th decile have higher expected (median/mean) earnings in 2013, but not in 2014.
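The comparison of respondents with complete and incomplete records can be sketched as follows. This is a hedged illustration only: the record layout (dictionaries with None for a missing answer), the function names and the use of Welch's unequal-variance t statistic are assumptions; the text only states that separate t-tests on the group averages were performed.

```python
import math
from statistics import mean, variance


def split_by_completeness(records, regressors):
    """Split records into those with all regressors observed and the rest."""
    complete, incomplete = [], []
    for rec in records:
        if all(rec.get(name) is not None for name in regressors):
            complete.append(rec)
        else:
            incomplete.append(rec)
    return complete, incomplete


def welch_t(sample_a, sample_b):
    """Welch t statistic and approximate degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = variance(sample_a) / n_a  # squared standard error of mean A
    var_b = variance(sample_b) / n_b  # squared standard error of mean B
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, df
```

For each explanatory variable, the observed (non-missing) values in the two groups would be passed to `welch_t`; the resulting statistic is then compared with the critical value of the t distribution at the reported degrees of freedom.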
7.2 Amsterdam: Table 3

In Amsterdam, we have just over 300 observations in 2013 and just over 230 observations in 2014. In 2014 we start out from 413 respondents, but there are many missing observations on specific variables. For high school grades we have 325 observations, for abilities 220, for risk attitude 222 and for choice motivations 214. In particular from foreign students we have some suspect responses on earnings. Earnings below 5000 euro were considered as monthly earnings and we multiplied them by 12. Earnings below 10 000 euro (including cases where this holds after multiplying by 12) have been identified with a dummy. In the regressions we find a few significant plausible effects, but only one effect is robust: self-assessed mathematical ability has a positive effect on probability to graduate in 2013 and 2014, at the same magnitude. The negative effect of gender on earnings is only significant in 2014, the positive effects of high school grade and self-assessed mathematical ability only in 2013, and the positive effect of a desirable job on probability to graduate only in 2014.

7.3 Baleares (UIB): Table 4

The data cover 431 students, but the regressions contain only slightly more than 300 students. As the descriptive statistics of both explanatory variables and dependent variables show, almost all variables have valid answers for more than 400 observations. Hence, as in Peking, it is not a particular variable that is responsible for the loss of observations in the regressions. Based on the regression of own expectations on wage from present study, two groups were created: the one used in the regression (which had answered all questions) and another that was not used in the regression (because it had a missing value on at least one variable). Performing (separate) t-tests on the averages of the explanatory variables (and also the wage expectation), the hypothesis that the population average is equal for the two groups was never rejected.
Of course, this exercise does not rule out that those who decided not to answer a particular question have systematically high or low values on that question, but as a group, no particular differences are found. Earnings expectations below 4000 euro seem far too low to refer to annual earnings and, accordingly, we rescaled those cases by multiplying the stated expectation by 12. A dummy variable was created indicating whether the earnings expectation was rescaled. Data collection in 2014 is identical to 2013, now with about 300 valid observations. A problem with confusion of monthly and annual earnings in 2013 has been solved by clearly asking for monthly earnings. In the Baleares, we find the highest proportion of significant effects among our four universities, but only one effect is robust: high school grades have a significantly positive effect on probability to graduate in both 2013 and 2014. By contrast, high family income has a significant negative effect on probability to graduate in 2013 and a significant positive effect in 2014. In 2013, girls expect 7% lower median earnings, which could well be a realistic anticipation, but in 2014 the effect is no longer significant. There are other cases of significant effects that support our hypotheses (like lower dispersion for high urban density and highly educated fathers in 2013), but none of these is robust over time.

7.4 Catalonia (URV): Table 5

For earnings, we find some plausible effects in 2013 (higher earnings in urban areas, for higher secondary school grades, if good earnings was a motive for choice) but they are not confirmed in 2014. Similarly, the regressions for earnings dispersion provide some confirming results in 2013 (higher dispersion for low income background), but they are not robust. For probability to graduate there is one robust effect: grades in secondary education have a positive effect in both years, at a magnitude that is about equal to the magnitude at the Baleares.
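The rescaling rules for implausibly low earnings reports, described above for Amsterdam and the Baleares, amount to the following sketch. The function names are hypothetical; the thresholds (5000 and 10 000 euro in Amsterdam, 4000 euro in the Baleares) and the factor 12 are taken from the text.

```python
def rescale_baleares(reported, threshold=4000.0):
    """Treat reports below the threshold as monthly; return (annual, rescaled_dummy)."""
    if reported < threshold:
        return reported * 12, 1
    return reported, 0


def rescale_amsterdam(reported, monthly_threshold=5000.0, low_threshold=10000.0):
    """Return (annual, low_dummy) following the Amsterdam rule.

    Reports below monthly_threshold are multiplied by 12. The dummy flags
    earnings that remain below low_threshold, also after multiplication.
    """
    annual = reported * 12 if reported < monthly_threshold else reported
    low_dummy = 1 if annual < low_threshold else 0
    return annual, low_dummy
```

Keeping the dummy next to the rescaled value lets the regression absorb any systematic difference between rescaled or suspect reports and the other observations.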
7.5 Conclusions from our analyses

On the Manski method we can draw a firm conclusion: it works poorly without feedback on statistical concepts. The concept of the median and the range restriction on probabilities are not standard knowledge. Our results have clearly shown that anticipated earnings distributions are generally not symmetric, and asking for information on both the upper tail and the lower tail is needed. Based on the survey in 2014 we also found that asking for probabilities in four areas of the distribution is clearly superior to only asking about the tails of the distribution. The reason is that students find it much easier to keep in mind that the probabilities must sum to 1 when they actually have to fill in all probabilities. Our prime interest is in uncovering the structure of anticipations that students have on schooling scenarios, on probability of success and on labour market benefits. Our key conclusion is that the anticipations are far away from the neat rational structure that is often implied in econometric modelling and estimated on realized outcomes. Detailed analysis of our estimation results in four different settings reveals barely any significant findings that are plausible and consistent with a priori predictions. The probability to graduate is frequently higher for students with higher math grade in high school and with higher school grades. Expected earnings are lower for alternative studies that were never considered as a potential choice. However, the overwhelming impression is the absence of well-structured anticipations: essentially, there are too many statistically insignificant effects to look for economic significance in a consistent interpretation. Even basic results are so weak that searching for more subtle effects in differences between chosen and alternative studies is pointless. Standard features like a gender effect on earnings, or effects of family background, are not robustly found.
We have asked for self-assessed levels of four abilities that may be considered relevant for anticipated future success in the labour market. The correlations of these abilities with school grades are very low, which makes them an interesting addition to commonly available personal characteristics. One might see such self-assessed abilities as core variables to reflect unobserved heterogeneity. However, we cannot detect a coherent pattern of relationships of these variables with anticipated success in future careers. Anticipated earnings dispersion is not related to risk attitude. We consider the failure to detect a coherent structure of expectations as the key result of our studies. Thus, we found essentially only one robust result that supports predictions: the perceived probability to graduate from the present study is significantly related to high school grades, and, slightly less robust, to self-assessed mathematical ability. This conveys a very strong suggestion that variation in other expectations across individuals largely reflects noise.

8. What may be wrong?

Our key conclusion is that students do not hold a set of well-articulated consistent expectations on the effects of schooling. Below, we will argue that this conclusion is essentially supported in the literature. But we will first face the question whether our negative findings can be explained from flaws in our study. Possible weaknesses of our analyses are:

1. Non-response. Non-response is not a relevant issue in Baleares; yet, we find no different result there. If selective response is an issue in the other locations, it would seem to work in the wrong direction, as one may assume the least interested and the least informed students not to participate. That would leave us with the most interested and the best informed students. It would even strengthen the conclusion.

2. Sample size. Low levels of statistical significance may be due to small sample size.
It is worth noting, however, that comparable studies usually also have small sample sizes. The seminal Dominitz and Manski (1996) paper is based on 110 observations. Arcidiacono et al. (2012 and 2014) use data on 173 students. Wiswall and Zafar (2013 WP) use 488 observations, Stinebrickner and Stinebrickner (2012) have 653. In that comparison, our samples with some 200 - 300 valid observations (and one sample around 550) are certainly not exceptional.

3. Lack of incentives for respondents to answer carefully. This problem is hard to tackle and standard in this type of survey. The easy answer is that if students have strong opinions and perceptions, it will be easy to answer the survey: why would they need incentives at all? If they don’t have clear perceptions and expectations, we may indeed expect the fuzzy answers we get, representing their true state of information. The analysis by Botelho and Costa Pinto (2004) runs counter to this argument. They show that there is “no significant difference between beliefs using hypothetical surveys and real financial incentives”.

4. Low reliability and validity of our variables on abilities and attitudes. The question on risk attitude has been successfully used elsewhere (see Bonin et al. 2007). Likert scales for self-assessed abilities are quite common in educational research, with a wide range of specifications and results (Pajares 1996). Further reflection on useful specification may be helpful, but with so many issues unresolved, there is no simple well established recipe to copy. For example, Spinath, Spinath, Herlaar and Plomin (2006) report success among grade school pupils in predicting school achievement from self-assessed abilities, but abilities are taken as competence in specific tasks, an approach not easily applied for our purpose.

5. Measurement error may bias our estimation results. Students could have difficulties evaluating their own skills; about 25% answer that they are top 10% students.
Since the question refers to “graduated from secondary education” this is in principle possible, since university students could be a favorable selection. But overestimation of abilities and overconfidence are commonly observed features. Also, when students specify their expectations they may tend to give more consideration to skills where they perform well, and play down the importance of skills where they perform badly. On the other hand, we should point out that our results do not differ between different settings: plain classroom settings in Amsterdam and Reus, internet surveys without compensation in Amsterdam and with modest compensation in Peking, participation mandatory for further course work at the Baleares. The similarity of results in different environments certainly supports a claim of robustness.

9. How do our results relate to the literature?

Standard econometric models of schooling choices and the benefits of education stress the role of unobserved heterogeneity and private information: economic agents have more information on their attitudes and abilities that condition their expectations and choices than the researcher can observe. For a long time, unobserved heterogeneity was accounted for in the structure of residual errors and then measured from observed outcomes. Correction for selectivity bias in estimated returns to education and correlated errors in choice equations and earnings are routine procedures in empirical analyses of education. More advanced econometric models seek to disentangle unobserved heterogeneity and risk in anticipated benefits of education from observed ex post outcomes, by estimating an imposed econometric structure.
In Chen (2008), schooling choice is modelled with an ordered probit for “unobserved ability, motivation and taste for education”; the unobserved component in wages is allowed to be correlated with the unobservable ability/taste factor, with the correlations shifting the earnings distributions by level of education that individuals face when making up their minds about extending their education. Chen claims to have uncovered the true risk by level of schooling. Heckman and associates use a similar but more elaborate procedure, by replacing individual fixed effects in schooling choice and wage equations by a set of independent latent factors. Chen concludes that risk does not increase with level of education; her results also indicate that the contribution of heterogeneity to wage inequality is negligible relative to the contribution of risk. Mazza, van Ophem and Hartog (2013) replicate Chen’s estimates on three countries and confirm that risk dominates heterogeneity, but they also note “that the relationship between schooling and risk has not yet been reliably exposed”. By contrast, Cunha, Heckman and Navarro (2005) conclude that most ex post earnings variance is not risk but was foreseen ex ante when the choice of education was made. These are strong inferences, essentially based only on the interpretation of correlations. Arcidiacono (2004) applies the same ex post methodology to differentiate ex ante and ex post returns to education, in a model of sequential decisions on participation in additional education with gradual unfolding of information on abilities. His model leads to the conclusion that 50-60% of the variance of an ability indicator related to major in university is noise. One may be sceptical about the reliability of estimating ex ante conditions from ex post data and indeed, scepticism has increased markedly in the last decade or so.
As Manski noted in 2004: “Researchers performing econometric analysis of choice data often have enormous difficulty defending the expectations assumptions they maintain and, as a consequence, have similar difficulty justifying the findings they report.” (Manski, 2004, 1330). Instead of peering backwards into a black box, one may try to light up the situation by collecting information from agents in their ex ante position. This direct approach has steadily gained ground. There are now many studies that collect anticipated earnings from different schooling scenarios in interviews14. A common conclusion is that these expectations on average match quite well with averages observed for similar groups (e.g. graduates with the same major from the same university) and with some observed market regularities, such as effects of gender, experience, major or specialty.

14 Williams and Gordon (1981), Kodde (1985), Smith and Powell (1990), Blau and Ferber (1991), Dominitz and Manski (1996), Menon (1997), Caravajal et al. (2000), Nicholson and Souleles (2001, 2002), Brunello, Lucifora and Winter-Ebmer (2004), Botelho and Costa Pinto (2004), Webbink and Hartog (2004), Schweri, Hartog and Wolter (2011), Mazza (2012), Arcidiacono, Hotz and Kang (2012), Diaz Serrano (2013), Wiswall and Zafar (2013), Kaufmann (forthcoming).

The variance in anticipated earnings among similar individuals is substantial. To what extent this reflects heterogeneity (i.e. conditioning on individuals’ private information) or variation in the quality of information has not been reliably established. There is quite some evidence that an individual’s expectations for own future earnings are related to the individual’s perception of actual earnings of graduates. There is also evidence that students update their beliefs on schooling outcomes when they progress through school and receive new information (Zafar, 2011). The dispersion of the perceptions is large.
Wiswall and Zafar (2013) observe much heterogeneity and large errors in beliefs about market wages. Jensen (2010) notes poor information among youth in the Dominican Republic. In a rare combination of expectations and realisations, Webbink and Hartog (2004) found that students’ expectations about starting wages when entering university and their own actual starting wages after graduation correlated at only 0.06. There is some evidence that students may be very stubborn about the expectation of their own wage. Wiswall and Zafar (2013) first ask for expected own earnings after graduation and then provide accurate information on realised market values. Updates are significantly related to perception errors, in the right direction, but at the very low elasticity of about 0.08: a 1% error in perception leads only to a 0.08% revision in the self-belief. There is no direct evidence that this stubbornness is justified by individuals’ information or perceptions on their own characteristics. Evidence on robust structural relationships between personal expectations and personal characteristics is mixed at best. Parental education, parental income, school grades and other indicators of ability often have no effect (Hartog and Diaz Serrano, 2014). Brunello, Lucifora and Winter-Ebmer (2004), surveying students in 26 economics and business faculties in 10 European countries, find that the personally expected wage premium of university over high school education is unrelated to any variable except age: not to parental background, not to channel of information about future earnings (university publication, career center, special reports, press, personal communication), not to reason for choosing their selected university, not to self-assessed relative ability.
Schweri, Hartog and Wolter (2011) find that deviations between own wage expectations and perceptions of realised market outcomes for graduates are not systematically related to variables that could reveal private information: neither family background nor ability (secondary school grade for math) explains the differences. Arcidiacono, Hotz and Kang (2012) note (p 6), “there is no clear pattern to the differences [in expected earnings] in majors versus non-majors”. In Webbink and Hartog (2004) some of the heterogeneity in the expectations did make sense. In regression analysis, expected earnings were negatively related to gender (female) and positively to parental income, extrinsic motivation and average grade in science. Students who had repeated class in high school expected higher earnings, which on average does not seem very likely but still was borne out in the regression for actual starting salaries (at about the same regression coefficient!). The structure of earnings by academic discipline was more or less foreseen. Nicholson and Souleles (2001) find, in their data on students in a medical college, that performance on a test of knowledge taught in the first two years has an impact on expected income after graduation. Scoring in the top quartile of this test increases expected peak income by close to 6 percent; including intended choice of medical specialty reduces this effect by 20%. Scoring in the bottom quartile has no significant effect on earnings. Kaufmann (forthcoming, Online Appendix, Table 4) finds that personally expected earnings with a high school diploma are unrelated to family background (parental education, income) and GPA in junior high school; expected earnings after college relate significantly positive to GPA15 but family background has no effect. A change of 1 standard deviation in GPA changes expected earnings by 2.8%, about 5 percent of the standard deviation of expected log high school earnings. 
A few studies seek to measure perceived earnings risk of education. Kodde (1985) asked for the highest and lowest income that secondary school students expected from university education, and related this to intention to enter university, with mixed results. Kaufmann (forthcoming) imposes a triangular distribution on anticipated earnings and estimates the parameters from asking highest and lowest expected earnings and the probability to surpass the midpoint. Her approach follows Guiso, Jappelli and Pistaferri (2002); in an Italian survey they find that individuals’ coefficient of variation of expected earnings is negatively related to level of education, to health status and to risk aversion. The authors note (o.c., p. 250): “In general we find the expectation variables difficult to predict on the basis of observable characteristics, as witnessed by the low R²…”. This study does not focus on education, however, but on the earnings risk faced in the coming year, conditional on not being unemployed16. Dominitz and Manski (1996) asked for median earnings and tail probabilities (with thresholds of 25% below and 25% above the median) to estimate the parameters of a lognormal distribution. The variation in dispersion across students is substantial. Respondents were also asked for their perceptions of actual distributions, and generally overestimated the interquartile range. Schweri, Hartog and Wolter (2008) apply the Dominitz and Manski method to Swiss students and find that dispersions in personally anticipated wage distributions are related to perceived actual dispersion, but barely relate to family background or school grades. Mazza (2012) also elicits individual-specific earnings anticipations in the Dominitz and Manski way and finds that individual dispersions are not related to family background or self-reported ability measured as the probability to finish the attended type of secondary education.
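To make the Dominitz and Manski type of elicitation concrete, the following is a minimal sketch of how a stated median and two tail probabilities pin down a lognormal distribution. It is an illustration under stated assumptions, not the estimation procedure of any of the cited studies: the function name `lognormal_from_survey` is hypothetical, and averaging the two implied dispersion values is a simplification chosen here for brevity. With ln Y ~ N(mu, sigma²), the median gives mu = ln(median), and each tail answer implies its own value of sigma.

```python
from math import log
from statistics import NormalDist


def lognormal_from_survey(median, p_below, p_above, lo=0.75, hi=1.25):
    """Recover lognormal parameters from one respondent's answers.

    median  : stated median earnings (must be positive)
    p_below : stated probability of earning less than lo * median
    p_above : stated probability of earning more than hi * median
    Returns (mu, sigma_lo, sigma_hi, sigma_avg).
    The averaging step is an assumption made for this sketch only.
    """
    if median <= 0:
        raise ValueError("median must be positive")
    if not (0 < p_below < 0.5 and 0 < p_above < 0.5):
        raise ValueError("tail probabilities must lie strictly between 0 and 0.5")

    std_normal = NormalDist()
    mu = log(median)  # the median of a lognormal is exp(mu)

    # P(Y < lo * median) = Phi(ln(lo) / sigma) = p_below
    sigma_lo = log(lo) / std_normal.inv_cdf(p_below)
    # P(Y > hi * median) = 1 - Phi(ln(hi) / sigma) = p_above
    sigma_hi = log(hi) / std_normal.inv_cdf(1 - p_above)

    return mu, sigma_lo, sigma_hi, 0.5 * (sigma_lo + sigma_hi)


if __name__ == "__main__":
    # Hypothetical respondent: median 30000, 25% chance in each tail.
    mu, s_lo, s_hi, s_avg = lognormal_from_survey(30000, 0.25, 0.25)
    print(mu, s_lo, s_hi, s_avg)
```

Two tail answers and one dispersion parameter mean the system is overidentified: a respondent whose answers are not exactly consistent with a lognormal yields two different values of sigma, and the gap between them is itself a crude indicator of how well the imposed distribution fits the stated beliefs.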
16 Dominitz and Manski (1997 JASA) ask households for the probability that their income over the next 12 months will fall below each of four increasing thresholds. The medians of constructed individual density functions respond as expected to common “observables” (realised income, unemployment) but the data contain no common “unobservables” (ability, effort, perseverance, etc).

What seems to emerge from the literature is the notion that individuals’ earnings expectations reflect easily observable variations, such as effects of gender, age/experience, occupation, and features of education like level, discipline and major. For example, Betts (1996) finds that beliefs about wages at the national level mirror effects of gender, family background and major. It is the sort of information that is easily picked up and usually does not have to be collected by deliberate effort. On average, such expectations are not far off from the values realised in the labour market. It is reminiscent of “The Wisdom of the Crowd”17: the distribution of estimates of some parameter from many individuals often centers on the true value. But variations among individuals are not systematic, do not indicate unobserved heterogeneity, are not related to individual characteristics, and presumably are mostly measurement errors.

Admittedly, there is some evidence or claimed evidence of supporting or confirming outcomes for coherent expectations. Arcidiacono, Hotz and Kang (2012) ask students for their abilities in 6 majors as their performance relative to their peers (at Duke) and note that students on average are found in majors where their ability is highest. One may suspect this to reflect cognitive dissonance but other studies argue that cognitive dissonance is not an important factor (Zafar, 2011, see below). They also ask for expected earnings for the chosen major, and 5 counterfactual majors.
They claim conformity of choice of major with comparative advantage in earnings, as the fraction of students observed in a chosen major is highest for students for whom earnings in the chosen major are highest, compared to other majors. But support for this claim is very modest, as diagonal elements in the earnings matrix (actual by potential major) are not always the highest and differences with off-diagonal fractions are mostly quite small.

17 This is the “wisdom” found in the estimation of some parameter by a large number of people, e.g. the height of the Eiffel Tower, or the distance between Paris and Berlin. The mean in the distribution of estimates is often very close to the actual value. James Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations, Doubleday, ISBN 0-385-50386-5.

Zafar (2011b) argues that students’ expectations and assessments about their studies are internally consistent and not plagued by cognitive dissonance. Students’ assessments of their expected relative performance in different majors reflect the common opinion on the relative difficulty of these majors. Cognitive dissonance is tested by relating changes in assessments to actual behaviour between two surveys (i.e. reporting an increase in predicted success for an actually chosen major and a decrease for a rejected major) and by considering patterns of rounding of stated probabilities (towards more favourable for chosen and towards less favourable for rejected options). Stinebrickner and Stinebrickner (2014) focus on studying science as a major. Initially, when entering college, students are poorly informed and have far too optimistic beliefs about their potential grade point averages and their ability to complete a major in science. During their studies they accumulate better information and adjust their behaviour accordingly.
Stinebrickner and Stinebrickner even conclude that the ultimate fraction of students ending up with a major in science is in line with ability for science in the entering student population.

Our results also tie in with a literature expressing doubt on individuals’ skills in dealing with probability concepts. Being informed on probabilities of uncertain events is one thing, being able to process probability information is another, and in fact, the two are related. Psychologists have developed a literature on numeracy, or probability numeracy. Pachur and Galesic (2013, p. 261) refer to “the “numeracy” construct, which encompasses knowing how to perform elementary calculations with percentages as well as an understanding of stochastic processes (e.g. the concept of a random coin toss)”. Numeracy or probabilistic numeracy varies greatly among the population and, what is particularly relevant for our study, also among the educated population. Lipkus, Samsa and Rimer (2001) analyse three samples of mostly highly educated American individuals. Among the three samples combined, only 12% have high school education or less (6, 12 and 16% in the separate samples). Yet these highly educated individuals had substantial difficulty in correctly answering simple questions on the concept of probability. “Rolling a fair dice 1000 times, how many times will it come up even?” is correctly answered by less than 60%. Switching between percentages and frequency numbers is not done flawlessly. If A has a risk of 1% in 10 years and B has double that risk, still 10% of the respondents come up with the wrong risk for B. Pachur and Galesic (2013) used such simple questions to measure numeracy and found that less numerate participants chose the normatively better option (higher expected value) less often, guessed more and relied more on a simple risk-minimizing strategy.
They also summarise results from the literature that differences in numeracy have differential effect in choices that are “affect rich” (getting an electric shock) or “affect poor” (getting a 20 dollar fine): in the affect-poor context, individuals were very sensitive to probability information, in the affect-rich context they were not, showing even probability neglect (raising the question whether education is affect-rich or affect-poor). Peters et al. (2006) find that highly numerate individuals are less susceptible to framing effects and draw more precise affective meaning from numerical information than less numerate individuals. Interestingly, numeracy correlates weakly with SAT (r=0.26) but purging numeracy from SAT scores (by using residuals) has no effect on the conclusion. Dillingh, Kooreman and Potters (2016) find a nonmonotonic relationship between probability numeracy and expenditures on health insurance in The Netherlands.

We conclude that our results are in line with the dominant picture in the literature. Our study no doubt has its weaknesses, and details are open to critique. But we are convinced that the key conclusion is not a consequence of particular features of our approach: differences in students’ expectations on their benefits from education cannot be explained from a consistent structure of unobserved heterogeneity.

10. Conclusions

The prime result of this paper is the lack of systematic variation in individuals’ expectations on the outcomes of education. This is in conformity with the general picture that emerges from the literature. Our working hypothesis for further work on students’ expectations will be that their basic information derives from their perceptions of actual outcomes realised in the labour market. Qualitatively these perceptions often correctly reflect easily observed regularities (e.g.
ranking by gender or discipline) and the means of the distributions across individuals may be fairly close to realised means, but variations among individuals are hard to explain. Individual perceptions to a large extent are just noise.

There can be no doubt that ex ante students are very uncertain about the outcomes of education. But it is not obvious how to measure that uncertainty. Forcing the anticipations to fit the mold of probability distributions requires good guidance, as many respondents are insufficiently familiar with definitions and requirements of the probability measure. We think that the best way to proceed is to focus on good documentation of student perceptions on effects of education, both in terms of average and dispersion, and to acknowledge that student expectations will be mostly qualitative rather than quantitative (as in rankings). Students do not entertain anticipations in the form of probability distributions that can readily be uncovered in surveys.

Once we accept that there is no consistent structure in the unobserved heterogeneity, we may reflect on what information students do have. Expectations may be crude, rather qualitative (e.g. earnings rankings by education type may be well known) and heterogeneity in accuracy and consistency may be important. Market reactions to wages will be determined by the marginal individual, and infra-marginal individuals, enjoying sufficient rents, may not care about precise earnings: the core distinction may be between sufficient and insufficient returns, with individuals differing in critical returns; only with market returns close to the critical returns will individuals collect precise information. We may at some point reflect on these questions. And we may reflect on the findings in Stinebrickner and Stinebrickner (2014) that students set out on an education in a mist of misinformation.
Precisely as formulated by Manski (1989): “students contemplating college entrance do not know whether completion will be feasible or desirable. Hence, enrollment is a decision to initiate an experiment, one of whose possible outcomes is dropout”.

We also arrive at an old methodological divide, revived by the recent rise to prominence of behavioural economics. The concept of unobserved heterogeneity, as used in econometric estimates on ex post realised outcomes, may be useful for prediction (in the sense of conditional statements), but it does not match the information set that individuals are aware of. Thus, the relevance and meaning of our results also depend on the methodological stance one prefers18.

18 See the discussions on the Lester-Machlup debate of 1946 (Blaug, 1980, ch 7).

References

Arcidiacono, P. (2004), Ability sorting and the returns to college major, Journal of Econometrics, 121 (1-2), 343-375
Arcidiacono, P., J. Hotz and S. Kang (2012), Modelling college major choice using elicited measures of expectations and counterfactuals, Journal of Econometrics, 166 (1), 3–16
Arcidiacono, P., J. Hotz, A. Maurel and T. Romano (2014), Recovering ex ante returns and preferences for occupations using subjective expectations data, Cambridge, Mass: NBER Working Paper 20626
Bailey, M. and S. Dynarski (2011), Gains and gaps: changing inequality in US college entry and completion, Cambridge, Mass: NBER Working Paper 17633
Betts, J. (1996), What do students know about wages? Evidence from an undergraduate survey, Journal of Human Resources, 31 (1), 27-56
Blaug, M. (1980), The methodology of economics, Cambridge: Cambridge University Press
Bleemer, Z. and B. Zafar (2015), Intended college attendance: Evidence from an experiment on college returns and cost, Bonn: IZA Discussion Papers 9445
Bonin, H., T. Dohmen, A. Falk, D. Huffman and U. Sunde (2007).
Cross-Sectional Earnings Risk and Occupational Sorting: The Role of Risk Attitudes, Labour Economics, 14 (6), 926-937
Botelho, A. and L. Costa Pinto (2004), Students’ expectations of the economic returns to college education: results of a controlled experiment, Economics of Education Review, 23 (6), 645–653
Brunello, G., C. Lucifora and R. Winter-Ebmer (2004), The wage expectations of European business and economics students, Journal of Human Resources, 39 (4), 1116-1142
Chen, S.H. (2008), Estimating the Variance of Wages in the Presence of Selection and Unobserved Heterogeneity, The Review of Economics and Statistics, 90 (2), 275-289
Cunha, F., J. Heckman and S. Navarro (2005), Separating uncertainty from heterogeneity in lifecycle earnings, Boston: NBER Working Paper 11024
Dominitz, J. and C. Manski (1996), Eliciting student expectations of the return to schooling, Journal of Human Resources, 31, 1-26
Dominitz, J. and C. Manski (1997), Perceptions of Economic Insecurity: Evidence from the Survey of Economic Expectations, The Public Opinion Quarterly, 61 (2), 261–287
Dillingh, R., P. Kooreman and J. Potters (2016), Probability numeracy and health insurance purchase, De Economist, 164 (1), 19-39
Guiso, L., T. Jappelli and L. Pistaferri (2002), An Empirical Analysis of Earnings and Employment Risk, Journal of Business and Economic Statistics, 20 (2), 241-253
Hartog, J. (1980), Earnings and capability requirements, Review of Economics and Statistics, LXII (2), 230-240
Hartog, J. (2001), On human capital and individual capabilities, Review of Income and Wealth, 47 (4), 515-540
Hartog, J. (2011), A Risk Augmented Mincer earnings equation? Taking stock, Research in Labor Economics, 33, 129-173
Hartog, J. and L. Diaz Serrano (2014), Schooling as a Risky Investment: A Survey of Theory and Evidence, Foundations and Trends in Microeconomics (www.nowpublishers.com/mic), 9 (3-4)
Hartog, J., H. van Ophem and S.
Bajdechi (2007), Simulating the risk of investment in human capital, Education Economics, 15 (3), 259-275
Heckman, J. (2006), The Effects of Cognitive and Noncognitive Abilities on Labor Market Outcomes and Social Behavior, Journal of Labor Economics, 24 (3), 411-482
Jensen, R. (2010), The (Perceived) Returns to Education and the Demand for Schooling, The Quarterly Journal of Economics, 125 (2), 515-548
Kaufmann, K. (forthcoming), Understanding the Income Gradient in College Attendance in Mexico: The Role of Heterogeneity in Expected Returns, forthcoming at Quantitative Economics, The Econometric Society, Appendix
Light, A. and W. Strayer (2000), Determinants of College Completion: School Quality or Student Ability?, The Journal of Human Resources, 35 (2), 299-332
Lipkus, I., G. Samsa and B. Rimer (2001), General performance on a numeracy scale among highly educated samples, Medical Decision Making, 21 (1), 37-44
Manski, C. (1989), Schooling as experimentation: a reappraisal of the postsecondary dropout phenomenon, Madison: University of Wisconsin, Institute for Research on Poverty Discussion Paper no. 865-88
Manski, C. (2004), Measuring Expectations, Econometrica, 72 (5), 1329-1376
Mazza, J., H. van Ophem and J. Hartog (2013), Unobserved Heterogeneity and Risk in Wage Variance: Does More Schooling Reduce Earnings Risk?, Labour Economics, 24, 323-338
Nicholson, S. and N. Souleles (2001), Physician income expectations and specialty choice, Cambridge, Mass: NBER Working Paper 8536
O*NET: http://www.onetcenter.org/content.html/1.A#cm1
Pachur, T. and M. Galesic (2013), Strategy selection in risky choice: the impact of numeracy, affect and cross-cultural differences, Journal of Behavioral Decision Making, 26 (3), 260-271
Pajares, F. (1996), Self-Efficacy Beliefs in Academic Settings, Review of Educational Research, 66 (4), 543-578
Peters, E., D. Västfjäll, P. Slovic, C. Mertz, K. Mazzocco and S.
Dickert (2006), Numeracy and Decision Making, Psychological Science, 17 (5), 407-413
Schweri, J., J. Hartog and S. Wolter (2008), Do students expect compensation for wage risk?, Universität Zürich ISU, Leading House Working Paper No. 11
Schweri, J., J. Hartog and S. Wolter (2011), Do students expect compensation for wage risk?, Economics of Education Review, 30 (2), 215-227
Skriabikova, O. (2014), Preferences, institutions and economic outcomes: an empirical investigation, PhD Dissertation Universiteit Maastricht, Department of Economics
Spinath, B., F. Spinath, N. Herlaar and R. Plomin (2006), Predicting school achievement from general cognitive ability, self-perceived ability and intrinsic value, Intelligence, 34 (4), 363–374
Stinebrickner, R. and T. Stinebrickner (2014), A Major in Science? Initial Beliefs and Final Outcomes for College Major and Dropout, The Review of Economic Studies, 81 (1), 426-472
U.S. Department of Labor (1965), Dictionary of Occupational Titles, Washington: US Government Printing Office
Webbink, D. and J. Hartog (2004), Can students predict their starting salaries? Yes!, Economics of Education Review, 23 (2), 103-114
Wiswall, M. and B. Zafar (2013), Determinants of college major choice: identification using an information experiment, Review of Economic Studies, 82 (2), 791-824
Zafar, B. (2011), How do college students form expectations?, Journal of Labor Economics, 29 (2), 301-348
Zafar, B. (2011b), Can subjective expectations data be used in choice models? Evidence on cognitive biases, Journal of Applied Econometrics, 26, 520-544

Appendix A. Amsterdam survey

This is a survey related to your decision to choose economics as your field of study. Your answers will be treated anonymously, and will only be used in academic research. It will take between 5 and 10 minutes to answer.

1. Do you have the Dutch nationality?
   Yes / No
2. Do you study Economics or Business Economics at the University of Amsterdam?
   Economics / Business Economics

A. We first would like to ask you something about yourself and your background.

3. What is your gender?
   Male / Female
4. What is your year of birth?
5. The country where you went to secondary school:
6. What is the type of city you went to secondary school?
   Rural town / Low density urban area / High density urban area
7. The year that you first entered the University of Amsterdam:
8. What is the highest educational level of your father?
   Primary school / High School / College/University
9. What is the highest educational level of your mother?
   Primary school / High school / College/University
10. Where does your parental annual family income approximately locate in your country's income distribution?
   Lowest 20% / Between 20-40% / Median 20% / 60-80% / Highest 20%
11. What type of grading system was used for your final exams?
   A-F / 1-10 / 1-6 / Other, ….
12. What was your average grade at secondary school for your final exams?
13. If you had no mathematics in your final exam from secondary education: What was your last grade for math?
   Grade: .. / Non-applicable
14. If you did have mathematics in your final exam from secondary education: What was your exam grade for math?
   Grade: .. / Non-applicable

B. You have chosen to study Economics. / B. You have chosen to study Business economics. (depending on what the respondent answered before)

15. What do you think is the probability to graduate with a Bachelor degree in three years time for an average person who started economics?
15a. What do you think is the probability to graduate with a Bachelor degree in three years time for an average person who started business economics?
16. Where would you look for a job after graduation?
   The Netherlands / Your home country: … / Elsewhere, namely: ..
17. What do you think is the median gross annual income for an economist with a Master degree, or, formerly a drs title, around age 45? Please, refer to the country where you would look for a job.
(Remember: the median income is the income with 50% of the people earning more and 50% earning less than that amount.)
   Euro: .. / If not in euro, which currency: .. / If not in euro, what amount: ..
17a. What do you think is the median gross annual income for a business economist with a Master degree, or, formerly a drs title, around age 45? Please, refer to the country where you would look for a job. (Remember: the median income is the income with 50% of the people earning more and 50% earning less than that amount.)
   Euro: .. / If not in euro, which currency: .. / If not in euro, what amount: ..
18. Of course, not everyone will earn the median income. What fraction of those economists do you think would earn less than ¾ of that amount?
18a. Of course, not everyone will earn the median income. What fraction of those business economists do you think would earn less than ¾ of that amount?
19. What fraction of those economists do you think would earn more than ¼ above that amount?
19a. What fraction of those business economists do you think would earn more than ¼ above that amount?
20. How do you rate your own probability to graduate with a Bachelor degree in three years time?
21. And how do you rate your own probability to graduate with a Master in economics in 5 years time? [If the Master is followed elsewhere: the probability to graduate in nominal (formal) program time]
21a. And how do you rate your own probability to graduate with a Master in business economics in 5 years time? [If the Master is followed elsewhere: the probability to graduate in nominal (formal) program time]
22. Let’s assume that you will continue for a Master degree in economics. Right after graduating with that master, what do you think you will earn? As we assume that you will not know for sure what you might earn, we ask you for the median: at the median, there is 50% probability that you would earn less, and 50% probability that you would earn more.
Please refer to the country where you will look for a job. My median gross annual salary would be:
   Euro: .. / If not in euro, which currency: .. / If not in euro, what amount: ..
22a. Let’s assume that you will continue for a Master degree in business economics. Right after graduating with that master, what do you think you will earn? As we assume that you will not know for sure what you might earn, we ask you for the median: at the median, there is 50% probability that you would earn less, and 50% probability that you would earn more. Please refer to the country where you will look for a job. My median gross annual salary would be:
   Euro: .. / If not in euro, which currency: .. / If not in euro, what amount: ..
23. What do you think is the probability that you would earn less than ¾ of the salary you just indicated in the last question?
24. What do you think is the probability that you would earn more than ¼ above this salary?

C. Now suppose you had not chosen economics/ business economics. Imagine you had chosen
   Medicine / Biology / Mathematics / Psychology / Sociology / History / Law / European Studies / (Dutch) Language and Literature / Art History
   (survey chooses randomly one of the above)

25. Have you actually considered this study as an option?
   Yes / No
26. How would you rate your own probability to graduate in three years with a Bachelor from that (rejected) study?
27. How would you rate your own probability to graduate in five years time with a Master of that same study?
28. Let’s suppose you had graduated with a Master from that study. Right after graduation from that study with a Master degree, what do you think your median gross annual salary would be? As before, please refer to the country where you would look for a job.
   Euro: .. / If not in euro, which currency: .. / If not in euro, what amount: ..
29. What do you think is the probability that you would earn less than ¾ of the salary you just indicated?
30.
What do you think is the probability that you would earn more than ¼ above this salary?

D. Now suppose you had not started a university education at all, but had gone to work right after graduating from secondary school.

31. Right after graduation from secondary education, what do you think your median gross annual salary would have been?
   Euro: .. / If not in euro, which currency: .. / If not in euro, what amount: ..
32. What do you think is the probability that you would earn less than ¾ of this salary you just indicated?
33. What do you think is the probability that you would earn more than ¼ above this salary?
34. We now ask you to rank yourself relative to other persons of your age who have graduated from secondary education. We ask for your percentile position on four abilities: mathematical ability, verbal ability (ability to express yourself articulately in your own language, ability to learn foreign languages), social ability (feel yourself at home with other people, in groups, make contact, make friends), commercial ability (the ability to convince other people to “buy something from you”, whether a product, a conviction, a political stand, etc).
   Columns: Mathematical ability / Verbal ability / Social ability / Commercial ability
   Rows: Top 5% / Top 10% / Top 25% / Top-Half / Right in the middle / Lower 25 - 50% / Lowest 25% / Lowest 10% / Lowest 5%
35. How do you see yourself: Are you generally a person who is fully prepared to take risks or do you try to avoid taking risks? Please tick a box on the scale.
   Risk preparedness: 1 (Unwilling to take any risk) 2 3 4 5 6 7 8 9 10 (Fully prepared to take risks)
36. Among 10 persons, a lottery is organized where one person can win 100 euro. It's a fair lottery and each person has 10% probability to win the prize. How much would you be willing to pay at most for a ticket in this lottery?
37. You have chosen to study economics. A choice like that can be based on several arguments.
Please specify the weight that each argument below had in your decision to study economics. Please make sure that your weights add up to 100.

            I like the type of work   It fits my abilities   Earnings   Other   Total
   Weight

37a. You have chosen to study business economics. A choice like that can be based on several arguments. Please specify the weight that each argument below had in your decision to study economics. Please make sure that your weights add up to 100.

            I like the type of work   It fits my abilities   Earnings   Other   Total
   Weight

38. For our academic research it will be interesting to relate your answers to your exam results. We repeat: analysis of data will be anonymous and only serve the purpose of pure academic research. Do you give permission to link your UvA study records to this survey? If so: please state your student number:

Thank you very much for your cooperation

APPENDIX B. Sample compositions

A: the Faculty of Economics and Business (FEB), University of Amsterdam

In Amsterdam, surveys were administered in class, at different points in the academic year 2013-2014, among three groups of students:

ECS3: Course Econometrics, third year BSc Economics and Business. Students of the specialisations “Economics” and “Business” participated. This bachelor is taught in English and non-Dutch students participated.
ECTR1: Course Introductory Econometrics, first year BSc Econometrics and Actuarial Science, Dutch students. It’s a population with better quantitative skills.
ECS1: Research Project, first year BSc Economics and Business, Dutch and international students.

Sample size and composition Amsterdam (A)

          Class size   Response   Response rate (%)   Foreign
ECS3      350          232        66.2                55
ECTR1     100          81         81.0                0
ECS1      600          89         14.8                22
TOTAL                  402        38.2

“Foreign” means that the student had secondary education outside The Netherlands. The University of Amsterdam makes a deliberate effort to attract foreign students, and most courses are also offered in English.
B: University of the Balearic Islands, Faculty of Economics and Business

The UIB data were collected at the University of the Balearic Islands, Faculty of Economics and Business, and the degrees studied are “Degree in Economics” and “Degree in Business Administration”. The survey was administered in February 2013, in the first week of the course Analysis of Economic Data, among 431 first year students from the Economics program and the Business Administration program. The survey was done in the computer lab during class hours. Answering the survey was mandatory to later participate in an individual assignment which yields 15% of the assessment of the course, so non-response is not an issue.

P: Peking University, School of Economics and Guanghua School of Management

The PKU survey was conducted at Peking University, School of Economics and Guanghua School of Management. The degrees studied include “Economics”, “Public Finance”, “Finance”, “International Trade”, “Management Science and Engineering”, “Business Administration” and “Public Management”. The survey was administered in the last four weeks of the second semester in 2013. Students completed online surveys by replying to electronic invitations sent to their email account. PKU is considered a top university in China, with highly selective admission, based on scores in a national university admission exam (with some provincial variations). 234 students participated, 161 in their first year and 72 in their fourth year. Participation was voluntary, with a lottery for a small reward (a phone card worth 7 Euros), won by 15% of the participants. The response was 28.8% of the invitations19.
Sample composition Peking (P)

Major                     Numbers
Economics                 107
Business Administration   59
Law                       35
Science                   24
Humanities                8

We also asked a couple of questions about students’ career choices after graduation and their preferences for financial aid packages during graduate study. We added some questions about students’ performance in college entrance examination tests and standardized English Proficiency Tests administered by the Ministry of Education, in order to verify students’ self-reported mathematical and verbal ability.

19 We repeated the on-line version of the same survey as a paper-and-pencil version in another 6 universities at different selectivity levels, and retrieved 1986 questionnaires. Since these universities are not as selective as Peking University, we analyze in the present paper only students from Peking University. The other data will be analysed in a separate paper.

R: Faculty of Economics and Business, Rovira i Virgili University, Reus

The URV survey was carried out at the Rovira i Virgili University, Faculty of Economics and Business. The Faculty is located in Reus, a middle sized city with about 120,000 inhabitants, located 100 km to the south of Barcelona. The degrees studied are “Economics”, “Business Administration” and “Accounting and Finance”. The survey was done two weeks before the end of the first quarter of the academic year, August 2013, and the students filled in the questionnaires in the class of Statistics I and II, with all students present participating. We have 445 responses, divided over four study years: 1 (179), 2 (190), 3 (47) and 4 (17) (for 12 students, study year is not known). The students work for three different degrees:
Sample composition Reus (R)

Degree        Spaniards   Foreign   Total
Economics     28          8         37
Management    248         29        280
Finance       108         19        128
Total         385         56        445

NB: Numbers do not add up because of missing information.

In Amsterdam, an alternative study was randomly assigned from a list we had drawn up (see the Amsterdam list in the survey, Appendix A). In Peking, students were asked to choose their preferred alternative major from the list of 44 majors offered at PKU; the list was restricted to studies offered at the same university, to maintain admission to the same highly selective university20. In Spain, students who want to enrol in a study at a public university are asked to list 8 studies ranked by preference. The Spanish government ranks students by the average of their university admission test grade and their overall high school grade, so that the best students get priority for their first choice. In many cases, students who performed poorly may end up in the study they ranked in fourth or fifth position. In Reus and Baleares, the alternative study is a random draw from the 7 remaining options. At URV the alternative studies were assigned according to the first letter of the student’s surname. We allocated the alternative studies to the letters in such a way that each study had the same probability of being assigned. The list of alternative studies is the following:

Surname first letter   Study
A–B                    Medicine
C–D                    Biology
E–F                    Law
G–L                    Psychology
M                      Sociology
N–Q                    History
R                      Mathematics
S                      Philology
T–Z                    Art

20 Two students chose their present major instead of an alternative one.

Table 2.
Peking regressions

                                        2013                                   2014
                                        Mean         Dispersion   Probability  Mean         Dispersion   Probability
                                        earnings                  graduate in  earnings                  graduate in
                                        expectation               <=4 years    expectation               <=4 years
Female                                  -0.091       0.564        -0.768       -0.154*      -3.719*      2.119
                                        (0.105)      (2.752)      (2.144)      (0.060)      (1.471)      (1.190)
Low urban density (<50000)              0.214        -2.262       1.045        -0.006       -1.051       -0.078
                                        (0.127)      (3.289)      (2.561)      (0.072)      (1.776)      (1.436)
High urban density (>50000)             0.212        0.733        1.303        -0.002       0.064        -0.732
                                        (0.135)      (3.483)      (2.709)      (0.074)      (1.818)      (1.470)
Father secondary education              0.350        20.263**     -6.136       0.131        2.883        -0.645
                                        (0.288)      (7.598)      (5.900)      (0.128)      (3.158)      (2.557)
Father higher education                 0.343        18.691*      -6.186       0.126        0.626        -0.299
                                        (0.301)      (7.926)      (6.169)      (0.135)      (3.340)      (2.704)
Family income (<=decile 2)              -0.015       -8.081       -2.164       -0.112       1.754        1.234
                                        (0.224)      (5.745)      (4.474)      (0.110)      (2.707)      (2.190)
Family income (>decile 2 <=decile 4)    0.111        -3.97        0.246        -0.097       0.135        0.314
                                        (0.215)      (5.498)      (4.280)      (0.108)      (2.657)      (2.150)
Family income (>decile 6 <=decile 8)    0.384        -9.107       -0.383       0.009        -1.708       -1.814
                                        (0.226)      (5.829)      (4.535)      (0.132)      (3.242)      (2.624)
Family income (>decile 8)               1.148*       -16.181      3.333        0.339        3.465        -10.662**
                                        (0.523)      (11.724)     (9.121)      (0.196)      (4.881)      (3.923)
Grades in high school                   0.010        0.019        0.088        0.219        8.098        -2.290
                                        (0.007)      (0.187)      (0.146)      (0.277)      (6.854)      (5.547)
Maths grade in high school              -0.070       -1.515       -0.087       -0.055***    -0.505       1.960***
                                        (0.036)      (0.942)      (0.733)      (0.016)      (0.401)      (0.322)
Maths ability                           0.058        3.126**      -2.241*      0.015        -0.864       0.498
                                        (0.044)      (1.128)      (0.877)      (0.021)      (0.514)      (0.413)
Verbal ability                          0.055        0.473        0.198        -0.006       -0.396       1.127*
                                        (0.045)      (1.126)      (0.877)      (0.022)      (0.541)      (0.438)
Social ability                          -0.013       -2.057*      0.877        0.050*       0.209        -0.082
                                        (0.039)      (1.024)      (0.797)      (0.020)      (0.490)      (0.397)
Commercial ability                      -0.039       -0.563       -0.039       0.020        0.215        -0.388
                                        (0.025)      (0.620)      (0.482)      (0.014)      (0.335)      (0.271)
Willingness to take risks               0.001        0.002        0.076        -0.001       0.029        0.069*
                                        (0.003)      (0.083)      (0.064)      (0.002)      (0.037)      (0.030)
I like type of job I can get            -0.001       0.008        0.056        0.001        0.027        0.112*
                                        (0.004)      (0.094)      (0.073)      (0.002)      (0.054)      (0.044)
Fits my capabilities                    0.002        0.201*       -0.069       0.004        0.113        0.037
                                        (0.003)      (0.078)      (0.060)      (0.003)      (0.061)      (0.049)
Earnings I can have after graduation    -0.091       0.564        -0.768       -0.154*      -3.719*      2.119
                                        (0.105)      (2.752)      (2.144)      (0.060)      (1.471)      (1.190)
Dummy tail probabilities                             39.155***                              74.752***
                                                     (2.890)                                (5.925)
Constant                                10.110***    48.832*      94.043***    11.215***    32.147***    64.859***
                                        (0.718)      (18.856)     (14.668)     (0.210)      (5.155)      (4.174)
R2                                      0.140        0.554        0.100        0.079        0.273        0.206
Observations                            195          202          202          553          560          560

Notes: The earnings expectations refer to the students’ current studies. The excluded category for urban density is rural. In 2013 abilities are measured as a quantitative variable, while in 2014 the abilities are measured with a dummy variable indicating above median. The following variable was also included, but coefficients are not included in the table: Age. Standard errors in parentheses; * p<0.05, ** p<0.01, *** p<0.001.

Table 3.
Amsterdam regressions

                                        2013                                   2014
                                        Mean         Dispersion   Probability  Mean         Dispersion   Probability
                                        earnings                  graduate in  earnings                  graduate in
                                        expectation               <=4 years    expectation               <=4 years
Female                                  0.018        0.011        0.047        -0.163*      0.034        0.062
                                        (0.045)      (0.079)      (0.031)      (0.029)      (0.069)      (0.040)
Low urban density (<50000)              0.070        -0.008       -0.010       0.013        -0.040       0.014
                                        (0.111)      (0.045)      (0.042)      (0.101)      (0.059)      (0.067)
High urban density (>50000)             -0.015       0.007        0.031        0.004        0.002        -0.055
                                        (0.079)      (0.032)      (0.029)      (0.071)      (0.042)      (0.047)
Father secondary education              0.044        0.010        -0.097       -0.019       0.013        -0.008
                                        (0.238)      (0.095)      (0.089)      (0.184)      (0.108)      (0.121)
Father higher education                 -0.029       -0.003       -0.063       0.014        0.047        0.027
                                        (0.238)      (0.095)      (0.088)      (0.178)      (0.104)      (0.117)
Family income (<=decile 2)              -0.041       0.113        -0.061       0.008        -0.032       -0.084
                                        (0.263)      (0.105)      (0.098)      (0.155)      (0.090)      (0.102)
Family income (>decile 2 <=decile 4)    -0.030       -0.093       0.082        -0.005       -0.022       0.038
                                        (0.179)      (0.072)      (0.067)      (0.139)      (0.082)      (0.092)
Family income (>decile 6 <=decile 8)    -0.054       0.041        0.014        -0.017       0.020        -0.056
                                        (0.092)      (0.037)      (0.034)      (0.075)      (0.044)      (0.050)
Family income (>decile 8)               0.128        0.015        0.019        -0.034       -0.049       -0.062
                                        (0.101)      (0.041)      (0.037)      (0.087)      (0.051)      (0.057)
Grades in high school                   0.197**      -0.005       0.004        -0.025       -0.009       0.022
                                        (0.065)      (0.026)      (0.024)      (0.050)      (0.029)      (0.033)
Maths grade in high school              -0.083       0.016        0.010        -0.020       0.006        0.003
                                        (0.042)      (0.017)      (0.016)      (0.035)      (0.020)      (0.023)
Maths ability                           0.066*       -0.003       0.032**      0.009        -0.010       0.045**
                                        (0.031)      (0.012)      (0.012)      (0.021)      (0.013)      (0.014)
Verbal ability                          -0.021       -0.001       0.021        0.031        -0.010       -0.001
                                        (0.033)      (0.013)      (0.012)      (0.021)      (0.012)      (0.014)
Social ability                          -0.035       -0.002       0.005        0.026        -0.001       -0.002
                                        (0.032)      (0.013)      (0.012)      (0.020)      (0.011)      (0.013)
Commercial ability                      0.037        0.014        -0.019       0.004        -0.008       0.021
                                        (0.028)      (0.011)      (0.010)      (0.021)      (0.012)      (0.014)
Willingness to take risks               0.044        0.019        0.000        0.013        0.010        -0.005
                                        (0.027)      (0.011)      (0.010)      (0.021)      (0.013)      (0.014)
I like type of job I can get            0.002        0.001        0.001        0.001        0.002        0.004*
                                        (0.003)      (0.001)      (0.001)      (0.002)      (0.001)      (0.002)
Fits my capabilities                    -0.000       0.001        0.001        -0.003       0.002        0.003
                                        (0.003)      (0.001)      (0.001)      (0.003)      (0.002)      (0.002)
Earnings I can have after graduation    -0.001       0.001        0.002        0.004        0.002        0.002
                                        (0.003)      (0.001)      (0.001)      (0.003)      (0.002)      (0.002)
Dummy tail probabilities                             0.483***                               0.546***
                                                     (0.079)                                (0.110)
Constant                                8.977***     0.288        0.330        9.584***     0.739*       -0.483
                                        (0.608)      (0.244)      (0.228)      (0.603)      (0.352)      (0.397)
R2                                      0.785        0.183        0.165        0.364        0.219        0.176
Observations                            304          301          302          233          232          233

Notes: The earnings expectations refer to the students’ current studies. The excluded category for urban density is rural. In 2013 abilities are measured as a quantitative variable, while in 2014 the abilities are measured with a dummy variable indicating above median. The following variables were also included, but coefficients are not included in the table: Non-native born, age. Standard errors in parentheses; *** p<0.01, ** p<0.05, * p<0.1.

Table 4.
Baleares regressions

                                        2013                                   2014
                                        Median       Dispersion   Probability  Mean         Dispersion   Probability
                                        earnings                  graduate in  earnings                  graduate in
                                        expectation               <=4 years    expectation               <=4 years
Female                                  -0.074*      -2.960       -1.342       -0.069       1.044        -3.795
                                        (0.043)      (2.296)      (3.143)      (0.052)      (1.866)      (3.538)
Low urban density (<50000)              0.026        -4.676       0.916        -0.056       3.996        -2.170
                                        (0.056)      (2.991)      (4.100)      (0.070)      (2.489)      (4.711)
High urban density (>50000)             0.050        -5.264*      -2.308       0.029        0.684        0.031
                                        (0.056)      (2.961)      (4.065)      (0.067)      (2.393)      (4.531)
Father secondary education              0.047        -4.759*      5.375        0.044        -0.745       -4.943
                                        (0.051)      (2.772)      (3.774)      (0.062)      (2.218)      (4.199)
Father higher education                 0.032        -7.743**     5.221        -0.030       4.093        -1.209
                                        (0.060)      (3.241)      (4.449)      (0.074)      (2.648)      (5.028)
Family income (<=decile 2)              -0.048       -4.197       -3.301       0.079        -0.271       -1.434
                                        (0.065)      (3.490)      (4.717)      (0.075)      (2.637)      (4.991)
Family income (>decile 2 <=decile 4)    -0.077       0.000        -3.600       0.007        4.742**      -2.404
                                        (0.051)      (2.767)      (3.762)      (0.062)      (2.214)      (4.204)
Family income (>decile 6 <=decile 8)    0.025        -1.048       1.570        0.094        0.268        3.115
                                        (0.069)      (3.624)      (4.940)      (0.071)      (2.542)      (4.813)
Family income (>decile 8)               0.040        -5.928       -24.710**    -0.115       1.809        18.596*
                                        (0.161)      (8.761)      (11.904)     (0.160)      (5.767)      (10.915)
Grades in high school                   0.034        -0.616       5.108***     0.002        1.342        4.766***
                                        (0.021)      (1.122)      (1.528)      (0.024)      (0.868)      (1.642)
Maths ability                           -0.023*      -0.822       2.338***     -0.060       -3.539*      3.466
                                        (0.012)      (0.635)      (0.866)      (0.052)      (1.831)      (3.479)
Verbal ability                          -0.008       0.205        0.935        0.095*       -1.956       4.213
                                        (0.013)      (0.709)      (0.967)      (0.057)      (2.022)      (3.829)
Social ability                          0.020        -0.224       0.033        0.064        1.795        3.051
                                        (0.012)      (0.660)      (0.900)      (0.058)      (2.058)      (3.895)
Commercial ability                      0.024**      -0.209       -0.604       0.082        0.121        0.281
                                        (0.011)      (0.627)      (0.839)      (0.062)      (2.203)      (4.173)
Willingness to take risks               0.029**      0.733        2.283**      0.003        0.072        -0.650
                                        (0.013)      (0.676)      (0.915)      (0.015)      (0.530)      (1.004)
I like type of job I can get            -0.003       -0.013       -0.076       -0.003       -0.068       0.127
                                        (0.002)      (0.115)      (0.157)      (0.002)      (0.070)      (0.135)
Fits my capabilities                    -0.001       0.069        0.056        -0.004       0.154*       0.161
                                        (0.002)      (0.125)      (0.170)      (0.002)      (0.081)      (0.156)
Earnings I can have after graduation    0.000        0.078        -0.288       0.001        0.002        0.112
                                        (0.002)      (0.134)      (0.182)      (0.002)      (0.085)      (0.163)
Dummy tail probabilities                             51.326***
                                                     (2.455)
Constant                                9.448***     74.706***    36.379*      9.559***     20.850*      28.852
                                        (0.287)      (15.013)     (20.604)     (0.334)      (11.933)     (22.689)
R2                                      0.200        0.648        0.303        0.144        0.101        0.174
Observations                            303          312          313          294          300          299

Notes: The earnings expectations refer to the students’ current studies. The excluded category for urban density is rural. In 2013 abilities are measured as a quantitative variable, while in 2014 the abilities are measured with a dummy variable indicating above median. The following variables were also included, but coefficients are not included in the table: Non-native born, age, field in high school (three dummy variables), repeating course. In 2013, first column, a dummy for rescaled monthly to annual earnings expectation was also included. Standard errors in parentheses; *** p<0.01, ** p<0.05, * p<0.1.

Table 5.
Catalonia (URV) regressions 2013 2014 Median Dispersio Probability Mean Dispersion Probability earnings n graduate in earnings graduate in expectation <=4 years expectation <=4 years Female Low urban density (<50000) High urban density (>50000) -0.025 (0.044) 0.043 (0.061) 0.914 (3.318) -1.964 (4.635) -3.264 (4.629) -2.593 (3.864) -7.395 (4.968) -5.611 (7.699) -4.339 (4.120) -0.769 (4.179) 3.816 (14.290) 4.998*** (1.521) 9.615** (3.712) -3.521 (3.801) 0.168 (4.027) 6.480 (4.082) 0.304 (1.004) -0.077 (0.102) 0.009 (0.095) 0.013 (0.101) 0.066 (0.066) 0.020 (0.094) 0.127 (0.092) 0.004 (0.076) -0.007 (0.090) -0.039 (0.146) -0.015 (0.088) 0.108 (0.095) -0.021 (0.238) 0.016 (0.031) 0.179** (0.073) -0.103 (0.076) 0.040 (0.081) 0.065 (0.078) -0.014 (0.020) 0.001 (0.002) 0.001 (0.002) 0.003 (0.002) 6.813*** (0.321) 0.135 1.793 (2.714) 2.176 (3.812) 4.314 (3.804) 3.031 (3.155) 2.952 (4.079) -22.440*** (6.453) 5.556* (3.181) 0.273 (3.862) -3.471 (12.440) 2.244* (1.254) 2.271 (3.040) -4.634 (3.116) 2.397 (3.314) -2.615 (3.371) -0.512 (0.819) 0.021 (0.085) 0.076 (0.077) -0.050 (0.083) 52.360*** (4.187) 88.450*** (19.810) 0.421 37.700 (24.010) 0.173 332 338 338 Father higher education 0.163*** (0.061) -0.035 (0.051) 0.027 (0.066) Family income (<=decile 2) 0.084 (0.904) Family income (>decile 2 <=decile 4) Family income (>decile 6 <=decile 8) Family income (>decile 8) -0.077 (0.056) 0.072 (0.638) Father secondary education Maths ability -0.141 (0.193) 0.038* (0.020) 0.001 (0.001) Verbal ability 0.000 (0.001) Social ability -0.001 (0.001) 0.001 (0.001) Grades in high school Commercial ability Willingness to take risks I like type of job I can get Fits my capabilities Earnings I can have after graduation -0.001 (0.013) -0.001 (0.001) 0.002* (0.001) 0.003** (0.001) Dummy tail probabilities Constant R2 Observations 10.220** (4.295) 4.391 (6.124) 4.051 (5.979) -2.065 (4.947) 1.111 (5.859) -3.218 (9.462) -7.588 (5.947) 2.491 (6.230) 7.150 (15.510) 4.181** (1.998) 5.644 (4.751) -1.789 
(4.934) -8.467 (5.260) 0.043 (5.049) 2.882** (1.322) -0.080 (0.148) -0.102 (0.124) 0.057 (0.158) 9.388*** (0.419) 0.126 -1.670 (2.796) 0.526 (3.920) -0.581 (3.842) 1.732 (3.195) 2.083 (3.931) 6.854 (6.101) -4.854 (3.744) -5.290 (4.612) -2.894 (9.100) 0.967 (1.279) -1.693 (3.078) -3.142 (3.160) 7.053** (3.429) 3.414 (3.262) 0.416 (0.871) 0.023 (0.096) 0.069 (0.081) 0.039 (0.101) 64.290*** (18.630) -1.093 (20.480) 0.206 221 208 222 32.290 (27.230) 0.224

Notes: The earnings expectations refer to the students’ current studies. The excluded category for urban density is rural. In 2013 abilities are measured as a quantitative variable, while in 2014 the abilities are measured with a dummy variable indicating above median. The following variables were also included, but coefficients are not included in the table: Not Spanish, age, field of study, year of study, field in high school. Earnings are annual. Standard errors in parentheses; *** p<0.01, ** p<0.05, * p<0.1.
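The table notes use two different star conventions: Table 2 marks p<0.05, p<0.01 and p<0.001, while Tables 3 to 5 mark p<0.1, p<0.05 and p<0.01. The sketch below shows how a coefficient and its standard error map into stars under either convention. It is only an illustration: the two-sided normal approximation for the test statistic is our assumption (the notes do not state which reference distribution was used), and the names `two_sided_p`, `stars`, `TABLE2_STARS` and `TABLES3_5_STARS` are hypothetical helpers, not part of the original analysis.

```python
import math

# Star cut-offs as stated in the table notes, ordered from strictest to loosest.
TABLE2_STARS = [(0.001, "***"), (0.01, "**"), (0.05, "*")]    # Peking (Table 2)
TABLES3_5_STARS = [(0.01, "***"), (0.05, "**"), (0.1, "*")]   # Tables 3 to 5


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for coef/se under a normal approximation (assumption)."""
    z = abs(coef / se)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def stars(coef: float, se: float, thresholds) -> str:
    """Return the star label of the strictest cut-off that the p-value passes."""
    p = two_sided_p(coef, se)
    for cutoff, label in thresholds:
        if p < cutoff:
            return label
    return ""


if __name__ == "__main__":
    # Table 4, 2013, probability to graduate: grades in high school.
    print(stars(5.108, 1.528, TABLES3_5_STARS))
    # Table 4, 2013, median earnings expectation: female.
    print(stars(-0.074, 0.043, TABLES3_5_STARS))
    # The same female coefficient under the stricter Table 2 convention.
    print(repr(stars(-0.074, 0.043, TABLE2_STARS)))
```

Under these assumptions the first two calls reproduce the stars printed in Table 4, while the third shows that a coefficient carrying one star in Tables 3 to 5 would carry none under the Table 2 convention, so stars should not be compared across the tables without checking the notes.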