UNIVERSITAT ROVIRA I VIRGILI
DEPARTAMENT D’ECONOMIA

WORKING PAPERS
Col·lecció “DOCUMENTS DE TREBALL DEL DEPARTAMENT D’ECONOMIA - CREIP”

Judgement and Ranking: Living with Hidden Bias

António Osório
Universitat Rovira i Virgili (Department of Economics) and CREIP (antonio.osoriodacosta@urv.cat)

Working paper no. 24-2016
Preprint dated September 28, 2016

DEPARTAMENT D’ECONOMIA – CREIP
Facultat d’Economia i Empresa

Published by:

Departament d’Economia
www.fcee.urv.es/departaments/economia/public_html/index.html
Universitat Rovira i Virgili, Facultat d’Economia i Empresa
Av. de la Universitat, 1, 43204 Reus
Tel.: +34 977 759 811, Fax: +34 977 758 907
Email: sde@urv.cat

CREIP
www.urv.cat/creip
Universitat Rovira i Virgili, Departament d’Economia
Av. de la Universitat, 1, 43204 Reus
Tel.: +34 977 758 936
Email: creip@urv.cat

Address comments to the Departament d’Economia / CREIP.

ISSN (print edition): 1576 - 3382
ISSN (electronic edition): 1988 - 0820

Abstract

The complexity and subjectivity of the judgement task conceal the existence of biases that undermine the quality of the process. This paper presents a weighted aggregation function that attempts to reduce the influence of biased judgements on the final score. We also discuss a set of desirable properties. The proposed weighted aggregation function is able to correct the “nationalism bias” found by Emerson et al. (2009) in the 2000 Olympic Games diving competition and suggests the possibility of a “reputation bias”. Our results can be applied to judgement sports and other activities that require the aggregation of several personal evaluations.

Keywords: Weighted aggregation function; Judgement by Grading; Nationalism Bias; Reputation Bias; Bias Correction; Olympic Games.

JEL classification: D72, D81, D03.

1. Introduction

The functioning of our society frequently relies on evaluations and rankings of objects, performances or issues. These are made by qualified but potentially [consciously or unconsciously] biased judges. This is possible because the task is complex and inherently open to subjectivity and manipulation. The solution usually involves the aggregation of several evaluations into a single score.

The list of situations that require rankings based on personal evaluations is endless. It includes wines, books, films and music contests, scientific refereeing or any kind of talent competition, which seem to be subject to different sources of bias.[1] Nowadays, the internet is making these evaluation procedures even more common. However, recent controversies have spurred researchers to explore bias in subjective judgments in more detail.[2]

[1] These considerations are particularly relevant in judgement sports such as gymnastics, diving, skating, boxing, surfing or dressage, among others. On many websites users are asked to rate, anonymously or not, every kind of item, from tourist places to blog comments.

Similarly, the list of behavioral biases is enormous. For instance, “nationalism bias” indicates a [usually conscious] tendency for judges to favor athletes from their own country (Emerson et al., 2009; Zitzewitz, 2006). On the other hand, “reputation bias” corresponds to a general [usually unconscious] tendency for judges to be influenced by the athlete’s reputation (Kingstrom and Mainstone, 1985; Findlay et al., 2004; among others).[3] These biases are not merely an issue prevalent in subjective personal evaluations but are inherent to every dimension of life (Buchanan et al., 1998). Tversky and Kahneman (1974) and Kahneman and Tversky (1996) describe them as shortcut strategies for processing complex information.

In addition to the limitations associated with subjective judgments, difficulties also arise at the aggregation stage.
These issues date back to Borda (1784), Condorcet (1785) and Laplace (1820); see Balinski and Laraki (2010). Range voting aggregation schemes are frequently used: voters rate each candidate with a grade within a specified interval, and the candidate with the highest sum or average is the winner. The method has interesting properties and passes certain generalizations of the Arrow (1950) impossibility theorem, but it is still open to strategic manipulation. This is actually its main limitation.

Often a truncation is used to remove extreme scores and mitigate potential bias. In this respect, majority judgment (Balinski and Laraki, 2007; Balinski and Laraki, 2011) ranks candidates by the median score.[4] They show that among the existing aggregation mechanisms it is the one that best resists manipulation and reduces the incentives to exaggerate. However, excessive truncation may lead to a loss of the information and expressiveness that characterize these score voting schemes. For instance, in a typical panel of three to five appraisals, the removal of the highest and the lowest scores corresponds to an important loss of information, particularly when bias is only a possibility.

[2] In this respect, some statistically based rating procedures have shown better results than expert opinions, which are affected by several types of behavioral limitations (Dawes et al., 1989; Meehl, 1954). Other inconsistencies and paradoxical observations are reported in Ashenfelter and Quandt (1999), Fritz et al. (2012) and Hodgson (2008). A further development of these behavioral factors is beyond the scope of the present paper.

[3] Other examples are “rank order bias” (Ginsburgh and Van Ours, 2003; among others) or “outlier aversion bias”, in which judges avoid rating far from the mean of the other judges (Lee, 2008).

[4] A particular case is the Brams and Fishburn (1978) approval voting method, in which only two different votes may be submitted.
The well-known Gibbard (1973) and Satterthwaite (1975) theorems point to the impossibility of designing one system that prevents all forms of bias.[5] The present paper acknowledges this limitation, but simultaneously proposes a practical aggregation mechanism that attempts to reduce the influence of bias on the final ranking.

In brief, our argument is the following. If a judge [consciously or unconsciously] favors (respectively, penalizes) a particular candidate, she must be grading above (respectively, below) the mean score of the other judges. Therefore, the more (respectively, less) a score deviates from the other judges' choices, the lower (respectively, higher) its influence on the final score must be. We do not remove any particular score; instead, we use the information contained in the other judges' grades to reduce the relevance of this particular score. Technically, the weight given to a particular judge is proportional to the other judges’ distances from the arithmetic mean grade. The formalization is done in Section 2.

Our proposal is a refinement of the existing procedures. It does not dispense with the complementary and simultaneous use of transparency policies, such as the public disclosure of each judge's score, which are particularly powerful and simple anti-bias mechanisms. Note that even if judges have reputation concerns or outside monitoring exists, conscious bias does not disappear. Instead, it becomes more strategic and subtle, moving inside narrower intervals and making detection even more difficult. The aggregation mechanism in the present paper is a solution to these cases.

In Section 3 we state and prove a set of desirable properties. Finally, in Section 4 we conclude with an illustrative application to the 2000 Olympic Games diving competition. We show that the weighted aggregation function is able to correct the “nationalism bias” found by Emerson et al. (2009).
We also point to the possibility that the athletes in question simultaneously benefited from “reputation bias”.

[5] Another strand of the literature (Miller et al., 2005; Prelec, 2004; among others) proposes methods for eliciting truthful scoring in situations where objective truth is unknowable; see Schlag et al. (2015) for a survey. However, the operational applicability of these methods is limited by the prior informational requirements, assumptions and knowledge about individuals’ true objective functions.

2. The weighted aggregation function

In general there is no evidence to claim whether a particular judgement is biased or not. Therefore, we would like to have an aggregation mechanism that benefits from the expressiveness of the ranking type schemes, but simultaneously deals with the impossibility of being strategy-proof-in-ranking and the existence of effective manipulation or bias. In other words, we want a mechanism that reduces the impact of extreme scores while minimizing the loss of information.

Another difficulty is the fact that judges’ preferences are private information and impossible to determine. Therefore, the aggregation function must depend only on what can be known [in practice], which in many cases is very little. With these objectives in mind, in what follows we describe the aggregation mechanism suggested in the present manuscript.

Let $s_{i,j} \in [S^-, S^+] \subset \mathbb{R}$ be the grade or score awarded by judge $j \in J = \{1, ..., n\}$ to competitor or candidate $i \in I = \{1, ..., m\}$. Let $s_i = (s_{i,1}, ..., s_{i,n})$ denote the vector of grades awarded to individual $i \in I$. The arithmetic mean grade of competitor $i \in I$ (over all judges $j \in J$) is denoted and defined as $\bar{s}_i = \frac{1}{n} \sum_{j=1}^{n} s_{i,j}$.
The weight given to the grade awarded by judge $j \in J$ to competitor $i \in I$ is a function, denoted as $w_{i,j} : [S^-, S^+]^n \to [0, 1]$, and defined as

$$w_{i,j}(s_i) = \frac{\sum_{l \neq j} |s_{i,l} - \bar{s}_i|^{\alpha}}{(n-1) \sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}}, \qquad (1)$$

where $\sum_{j=1}^{n} w_{i,j} = 1$, and $\alpha \geq 0$ controls how heavily the grades that are far from the mean are penalized. In other words, the weight given to the grade of judge $j \in J$ depends on the grades of all other judges $l \neq j \in J$ and on their distances from the arithmetic mean.

Our objective is to penalize more the largest grade deviations from the mean, the ones that are most likely to be biased. However, a value of $\alpha$ that is too large can be problematic because bias is only a possibility and, in some cases, can be dissimulated away from the most extreme values. On the other hand, a low value of $\alpha$ may not penalize extreme values enough.[6]

[6] Our notion of distance does not correspond to a metric as it does not satisfy the triangle inequality, except for $\alpha \leq 1$.

Given a competitor $i \in I$, the Weighted Aggregation Function (WAF) weights by $w_{i,j}$ the grade $s_{i,j}$ of judge $j \in J$. We denote this function as $s^w_i : [S^-, S^+]^n \to [S^-, S^+]$, and define it as

$$s^w_i(s_i) = \sum_{j=1}^{n} w_{i,j} s_{i,j}, \qquad (2)$$

for all $i \in I$. In case of a tie between competitors the winner is the one with less dispersion around $s^w_i$. Otherwise, ties are broken through a lottery.

Note that the particular cases $n = 1$ and $n = 2$ are consistent. In the latter, we always have $w_{i,1} = w_{i,2} = 1/2$, i.e., there is not enough information to make considerations regarding whether one judge is more or less correct than the other. The case $n = 1$ is trivial, i.e., $s^w_i = s_{i,1}$. Another consistent particular case is when all grades coincide; in this case we have equal weighting, i.e., $w_{i,j} = 1/n$ for all $j \in J$, and $s^w_i = \bar{s}_i = s_{i,j}$.[7]

In practice, it is natural to expect that the WAF correction mechanism is common knowledge among the judges. This knowledge allows consciously biased judges to strategically readjust their grades.
However, in order to introduce bias in the WAF, the judges are forced to engage in more extreme behavior, i.e., they must award grades that are more extreme than they would if the aggregation function were, for instance, the arithmetic mean. This behavior exposes them to easier detection by third-party monitoring. For that reason, the WAF in the present paper does not dispense with public monitoring and the use of transparency policies such as the public disclosure of the grade given by each judge.[8]

[7] In this case, if $s_{i,j} = \bar{s}_i$ for all $j \in J$, we obtain an indeterminate form 0/0. Since the numerator and the denominator of $w_{i,j}$ are differentiable in some open interval around $\bar{s}_i$, equal weighting emerges after applying L’Hôpital’s rule.

[8] The WAF defined in (2) with weights given by (1) is equivalent to the aggregation function $s^w_i = \sum_{j=1}^{n} w'_{i,j} \sum_{l \neq j} s_{i,l} / (n-1)$, with weights given by

$$w'_{i,j} = \frac{|s_{i,j} - \bar{s}_i|^{\alpha}}{\sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}}, \qquad (3)$$

for all $i \in I$ and $j \in J$. Under this interpretation the intuition is reversed; the grade of a biased judge increases the relevance given to the grades of the other $n - 1$ non-biased judges. The reader is free to consider this alternative but equivalent formulation.

3. Properties

The proposed WAF satisfies the six axioms of the basic model, as defined by Balinski and Laraki (2007).[9] Table 2 below summarizes this information. In brief, neutrality and anonymity mean that any permutation of competitors and judges, respectively, does not affect the final grade. Unanimity means that if all judges award the same grade the final grade must be that grade. The symmetry property presented below is a more demanding generalization of unanimity; the implied equal treatment of every grade implies neutrality and anonymity. Monotonicity means that a higher grade cannot make the competitor in question worse off. We will show that monotonicity holds in most cases. The exception occurs for large deviations from the mean and when these deviations are heavily punished, i.e., through a large value of $\alpha$. This property is discussed in more detail below. Independence of irrelevant alternatives means that the grades awarded to other competitors cannot directly affect the final grade of a given competitor. Continuity has the usual mathematical meaning. The proof of some of these properties is simple and follows from the definitions of the weights in (1) and the WAF in (2).

[9] Felsenthal and Machover (2008) provide a critical discussion of some additional properties of majority judgement and range voting; see also Balinski and Laraki (2014). We note that in some cases a direct transposition of some classical properties into ranking aggregation schemes is of interest but may ignore the contextual reality (Chebotarev and Shamis, 1998).

In this section, we state, prove and comment on an additional set of desirable properties.

The aggregation function must weight equally grades that are at the same distance below and above the mean. Monitoring is achieved by considering deviations from the mean. Moreover, the WAF must return the arithmetic mean if there exists full symmetry between grades. A particular case is when all scores coincide.

Property 1 (symmetry). If $s_{i,j} > \bar{s}_i$ and $s_{i,k} < \bar{s}_i$ such that $s_{i,j} - \bar{s}_i = \bar{s}_i - s_{i,k}$, then $w_{i,j} = w_{i,k}$. Moreover, if this is true for all $j, k \in J$, then $s^w_i = \bar{s}_i$.

This property implies an equal treatment of the grades that are below and above the mean. The satisfaction of this property is crucial because in general we cannot tell whether a particular grade is affected by bias, or whether this bias is below or above the mean. Moreover, a biased judge may benefit a particular competitor in several ways: directly through positive bias, indirectly through negative bias against the opponents of this competitor, or a combination of both. Therefore, we must be equally attentive to both sides of the mean.

The equal treatment of grades stated in the first part of Property 1 implies neutrality and anonymity.
The second part of Property 1 implies unanimity. Note that these are three of the axioms of the basic model (Balinski and Laraki, 2007).

Once we have numerical grades we should be able to perform basic arithmetic transformations without changing their meaning. For instance, the distance between two points on a number line does not change when the same quantity is added to both numbers. Multiplying both numbers by the same factor does not have this property; distance is not invariant under multiplication. However, since the proposed weight function (1) is a ratio between sums of distances, it is invariant under multiplication. Consequently, the weights become invariant to linear operations on the grades.

Definition (scaling and translation invariant). The weights are scaling and translation invariant if $w_{i,j}(\phi(s_i)) = w_{i,j}(s_i)$ for all $i \in I$ and $j \in J$, where $\phi(s_i) = a + b s_i$ with $a \in \mathbb{R}$, $b \in \mathbb{R}_+$ and $s_i = (s_{i,1}, ..., s_{i,n})$.

Property 2 (scaling and translation invariant). The weights are scaling and translation invariant.

In other words, the weights are invariant to affine transformations of all grades. This property is passed on to the WAF. Consequently, changes of scale or origin in the grades carry over linearly to the WAF.

Definition (scale-consistent). The WAF is scale-consistent if $s^w_i(\phi(s_i)) = \phi(s^w_i(s_i))$ for all $i \in I$, where $\phi(s_i) = a + b s_i$ with $a \in \mathbb{R}$, $b \in \mathbb{R}_+$ and $s_i = (s_{i,1}, ..., s_{i,n})$.

Property 3 (scale-consistent). The WAF is scale-consistent.

Scale-consistency is a less demanding property than language-consistency as defined in Balinski and Laraki (2010). The latter property refers not only to numerical grades but also to grades based on letters or descriptive phrases. This issue is related to the meaningfulness problem of measurement theory in the context of jury decisions (see Krantz et al. (1971) for an early reference).
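Properties 2 and 3 can be checked directly from definitions (1) and (2). The derivation below is only a sketch of that argument, under the added assumptions that $b > 0$ and that at least two grades differ (so the denominator of (1) is nonzero); it uses $\sum_{j} w_{i,j} = 1$ in the last line.

```latex
% Sketch: apply \phi(s) = a + b s to every grade of competitor i, assuming b > 0.
\begin{align*}
\overline{\phi(s_i)}
  &= \frac{1}{n}\sum_{j=1}^{n} (a + b\, s_{i,j}) = a + b\,\bar{s}_i, \\
\left|\phi(s_{i,l}) - \overline{\phi(s_i)}\right|^{\alpha}
  &= b^{\alpha}\, \left|s_{i,l} - \bar{s}_i\right|^{\alpha}, \\
w_{i,j}(\phi(s_i))
  &= \frac{b^{\alpha} \sum_{l \neq j} |s_{i,l} - \bar{s}_i|^{\alpha}}
          {(n-1)\, b^{\alpha} \sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}}
   = w_{i,j}(s_i), \\
s^{w}_i(\phi(s_i))
  &= \sum_{j=1}^{n} w_{i,j}(s_i)\,(a + b\, s_{i,j})
   = a + b \sum_{j=1}^{n} w_{i,j}(s_i)\, s_{i,j}
   = \phi\!\left(s^{w}_i(s_i)\right).
\end{align*}
```

The factor $b^{\alpha}$ cancels because the weights are a ratio of sums of distances, which is the intuition given in the text.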
In the present manuscript we focus on the aggregation of well-defined grades on numerical scales with no language-consistency issues among the judges.

A related but different property requires that the weight given to a judge's grade decreases with the distance of that grade from the mean.

Property 4 (decreasing weight). Suppose that there exist at least two different grades other than $s_{i,j}$. Then $\partial w_{i,j} / \partial s_{i,j} > 0$ if $s_{i,j} < \bar{s}_i$ and $\partial w_{i,j} / \partial s_{i,j} < 0$ if $s_{i,j} \geq \bar{s}_i$, for all $i \in I$ and $j \in J$. Otherwise, $w_{i,j}$ is constant in $s_{i,j}$.

The WAF decreases the impact of the most extreme values on the final grade by weighting them less than other grades that are closer to the mean. In our context, since the objective of the WAF is to correct for the possibility of bias, this property is natural and the intuition is immediate. The more judge $j$ moves away from the mean, the less weight is given to her grade and the more weight is given to the other judges' grades. This is the first statement of Property 4 and holds when there exist at least two different grades other than $s_{i,j}$.[10]

The second statement of Property 4 states that weights are constant when there is only one grade (or less) different from $s_{i,j}$.[11] In this case we may have two different scenarios:

(i) In one scenario, we have $n - 1$ judges coinciding but judge $j$ proposing something different. Then, it is natural to think that judge $j$ is more likely to be biased than the other $n - 1$ judges because she is departing from the others. Consequently, the weight given to judge $j$ becomes constant at some minimal value. Judge $j$ is not ignored (because $w_{i,j}$ is strictly positive, see the proof of Property 4) but her importance is reduced to a minimum (judge $j$ is only ignored if $\alpha \to \infty$, see Property 5 below). This property guarantees representativeness; some information may be given less weight but is not destroyed.
In other words, since the weights are bounded from above and below, the WAF neither excludes any grade (i.e., $0 < w_{i,j}$) nor is based on a single grade (i.e., $w_{i,j} < 1$), with the exception of the trivial case $n = 1$.

(ii) In the other scenario, we have $n - 2$ judges coinciding with judge $j$ but some judge $k$ proposing something different. In this case judge $j$ is part of the group of judges that is more likely to be correct. Consequently, the grade of judge $j$ receives a larger weight than in the previous scenario (but also constant).

However, the fact that $w_{i,j}$ might be constant in some particular cases does not imply that the WAF is constant. This issue is related to the monotonicity property discussed below.

[10] This statement could have been written without recourse to derivatives as follows: ”(...) If $s_{i,j} > s_{i,k} > \bar{s}_i$, then $w_{i,j} < w_{i,k}$, while if $s_{i,j} < s_{i,k} \leq \bar{s}_i$, then $w_{i,j} < w_{i,k}$, for all $i \in I$ and $j, k \in J$.” In both cases the grade of judge $j$ is the one farther from the mean. The reader may find this alternative but equivalent statement more intuitive.

[11] The case in which all judges coincide is trivial; all weights are equal and the WAF delivers the common grade.

In order to discuss monotonicity we consider two asymptotic properties of the WAF. These properties allow us to better understand the relation between the WAF and the value of $\alpha$. Moreover, they provide a basis to understand why monotonicity sometimes fails.

Before we present these properties, we define an extreme grade as a grade at the largest distance from the mean. Since extreme grades are not necessarily unique and equal, we also define the average extreme grade as the arithmetic mean over all extreme grades.

Property 5 (asymptotic cases). If $\alpha \to 0$ then $s^w_i \to \bar{s}_i$. If $\alpha \to \infty$ then $s^w_i \to \left( \sum_{l=1}^{n} s_{i,l} - s_{i,e} \right) / (n - 1)$, where $s_{i,e}$ denotes the average extreme grade.

In the limit case $\alpha \to 0$, the WAF converges to the mean grade. The WAF weights all grades equally, independently of their distance from the mean (i.e., $w_{i,j} \to 1/n$ for all $j \in J$).
In the limit case $\alpha \to \infty$, the WAF ignores the influence of the average extreme grade. Intuitively, if there is only one extreme grade, the WAF ignores this grade, the one that is most likely to be biased, and averages with equal weight the other $n - 1$ grades.

In the most general case, we may consider the possibility of several extreme grades. If there are $r$ extreme grades, the WAF weights each of these $r$ extreme grades by $(r - 1) / ((n - 1) r)$ and each of the other $n - r$ non-extreme grades by $1 / (n - 1)$. The intuition is the same as in the $r = 1$ case, but we have two different situations. For instance, in the profile of grades $(6, 6, 7, 9, 9)$, since $\bar{s}_i = 7.4$, we have two equal extreme grades, both on the same side of the mean, i.e., $s_{i,4} = s_{i,5} = 9$. In the profile of grades $(5, 6, 7, 8, 9)$, since $\bar{s}_i = 7$, we have two unequal extreme grades, one on each side of the mean, i.e., $s_{i,1} = 5$ and $s_{i,5} = 9$.

These observations imply that the WAF cannot be monotonic in a single judge's grade for all values of $\alpha$. In order to see this, consider the following example.

Example 1. Consider the profile of grades $(5, 6, 6.9)$ with $\bar{s}_i = 5.97$. Suppose that $\alpha$ is very large (say $\alpha = \infty$) such that the most extreme grade is removed from $s^w_i$ according to Property 5. In this case, the grade $s_{i,1} = 5$ has the largest distance from the mean and for that reason is removed, so that $s^w_i \to (6 + 6.9)/2 = 6.45$. However, if $s_{i,3}$ increases from 6.9 to 7.1, then $s_{i,3}$ becomes the most extreme grade and for that reason is removed, so that $s^w_i \to (5 + 6)/2 = 5.5$. In this example, after the increase of $s_{i,3}$, the WAF falls abruptly from $s^w_i = 6.45$ to $s^w_i = 5.5$, which shows the failure of monotonicity with respect to $s_{i,3}$.

The example was constructed specially to show the failure of monotonicity in a simple way and to motivate the discussion about the meaning and importance of this property in a bias correction mechanism.
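The limit stated in Property 5 and used in Example 1 can be sketched in a few lines of Python. This is an illustration only: the function name `waf_limit_large_alpha` and the tolerance `tol` used to decide when two distances tie are assumptions of this sketch, not part of the paper, and at least two grades are assumed.

```python
def waf_limit_large_alpha(grades, tol=1e-12):
    """Limit of the WAF as alpha -> infinity (Property 5).

    The average extreme grade is subtracted from the sum of all grades
    and the result is divided by n - 1. Assumes len(grades) >= 2.
    """
    n = len(grades)
    mean = sum(grades) / n
    # Largest distance from the arithmetic mean.
    max_dev = max(abs(g - mean) for g in grades)
    # Extreme grades: all grades at (numerically) that largest distance.
    extremes = [g for g in grades if abs(abs(g - mean) - max_dev) <= tol]
    avg_extreme = sum(extremes) / len(extremes)
    return (sum(grades) - avg_extreme) / (n - 1)


# Example 1: raising the third grade from 6.9 to 7.1 lowers the limit WAF.
before = waf_limit_large_alpha([5, 6, 6.9])
after = waf_limit_large_alpha([5, 6, 7.1])
print(round(before, 2), round(after, 2))  # prints: 6.45 5.5
```

With several extreme grades the same function reproduces the weights $(r-1)/((n-1)r)$ and $1/(n-1)$ implicitly; for the profile $(5, 6, 7, 8, 9)$ it returns 7.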
The role of monotonicity in a bias correction mechanism is a crucial aspect: in some sense we allow a judge to have full control over her own grade, but only partial and restricted influence over the final grade. In this context, we do not exclude the possibility that the bias correction mechanism reacts to extreme grades with a detrimental effect on the final score. In other words, if a judge awards a competitor a grade that is too large (respectively, small) with respect to the other judges, the final grade may decrease (respectively, increase). Consequently, the existence of a monotonic relation between a particular grade and the final score is not always guaranteed.

The following property states under which circumstances the WAF is monotonic.

Property 6 (monotonicity). $s^w_i$ is monotonic in $s_{i,j}$ if $\alpha$ is sufficiently small. $s^w_i$ is monotonic in $s_{i,j}$ if $s_{i,j}$ is sufficiently close to the other judges' mean.

The numerical example in Table 1 provides an illustration of Property 6 for $\alpha = 3$. Column (7) of Table 1 shows that while the grade of the first judge rises from $s_{i,1} = 6$ to $s_{i,1} = 7$ or from $s_{i,1} = 7$ to $s_{i,1} = 8$ we observe a monotonic relation between $s_{i,1}$ and $s^w_i$. In this case, $s_{i,1}$ remains sufficiently close to the mean. However, when the grade of the first judge rises from $s_{i,1} = 8$ to $s_{i,1} = 9$ the WAF falls from $s^w_i = 7.314$ to $s^w_i = 7.303$ and monotonicity fails. This movement is due to the fall in the weight given to the first judge relative to the increase in the grade (the weight decreases from $w_{i,1} = 0.204$ to $w_{i,1} = 0.107$, see Column (1) of Table 1).

The failure of monotonicity occurs because the WAF reacts to extreme grades (the ones that are most likely to have been subject to bias) through a correction in the opposite direction. In such a case an extreme positive (respectively, negative) grade reduces (respectively, raises) the final grade instead of increasing (respectively, decreasing) it.
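The figures quoted from Table 1 can be recomputed from definitions (1) and (2). The following Python sketch is one possible implementation; the function names `waf_weights` and `waf` and the handling of the degenerate cases (a single judge, or all grades equal) are assumptions of this sketch that follow the special cases described in Section 2.

```python
def waf_weights(grades, alpha):
    """Weights of equation (1) for one competitor's vector of grades."""
    n = len(grades)
    if n == 1:
        return [1.0]  # trivial case: the single grade is the final grade
    mean = sum(grades) / n
    # Distance of each grade from the mean, raised to the power alpha.
    dist = [abs(g - mean) ** alpha for g in grades]
    total = sum(dist)
    if total == 0:
        return [1.0 / n] * n  # all grades coincide: equal weighting
    # Sum over l != j equals the total minus judge j's own distance.
    return [(total - d) / ((n - 1) * total) for d in dist]


def waf(grades, alpha):
    """Weighted Aggregation Function of equation (2)."""
    weights = waf_weights(grades, alpha)
    return sum(w * g for w, g in zip(weights, grades))


# Vary the first judge's grade while the other four stay at (6, 7, 7, 8).
for first in (6, 7, 8, 9):
    profile = [first, 6, 7, 7, 8]
    weights = [round(w, 3) for w in waf_weights(profile, 3)]
    print(profile, weights, round(waf(profile, 3), 3))
```

Under these assumptions the rounded weights and WAF values agree with Table 1, including the drop from 7.314 to 7.303 when the first grade rises from 8 to 9.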
In our context, since the objective is to propose a bias correction mechanism, the failure of monotonicity can be seen as a desirable property. This observation gains even more strength if we consider the circumstances under which it occurs. In particular, monotonicity fails if $\alpha$ is large enough or $s_{i,j}$ is sufficiently distant from the mean.

Note that the choice of $\alpha$ reflects an explicit intention of punishing deviations from the mean. In our context, this parameter is controlled by the social planner or the competition designer. The larger the value of $\alpha$, the stronger is the reduction in the weight given to the grades that are distant from the mean. However, if we are more tolerant regarding the grades that are distant from the mean, the monotonic relation between $s^w_i$ and $s_{i,j}$ is likely to be satisfied.

Note also that a necessary condition for monotonicity to fail is that $s_{i,j}$ is an extreme grade. Otherwise, inside the interval bounded by the largest and the lowest grade, the monotonic relation between $s^w_i$ and $s_{i,j}$ is satisfied. In some sense, the WAF has a self-correction mechanism against extreme grades. Column (7) of Table 1 shows that monotonicity fails when the first judge raises her grade above the other judges' largest grade, which in this case is $s_{i,5} = 8$.

Although the failure of monotonicity is convenient, because it occurs under extreme circumstances, there are also negative aspects associated with it. For instance, if a given judge ranks individual A above B then the bias correction mechanism will not necessarily respect this ordering. Consequently, if that judge is removed from the average then individual A can improve over individual B. This effect is known in the literature as the no-show paradox (Felsenthal and Machover, 2008; Fishburn and Brams, 1983; Smith, 1973).
profile        w_{i,1}  w_{i,2}  w_{i,3}  w_{i,4}  w_{i,5}  \bar{s}_i  s^w_i   skew
               (1)      (2)      (3)      (4)      (5)      (6)        (7)
(6,6,7,7,8)    0.204    0.204    0.249    0.249    0.094    6.800      6.686   +
(7,6,7,7,8)    0.250    0.125    0.250    0.250    0.125    7.000      7.000   0
(8,6,7,7,8)    0.204    0.094    0.249    0.249    0.204    7.200      7.314   -
(9,6,7,7,8)    0.107    0.155    0.248    0.248    0.242    7.400      7.303   +

Table 1: Weights, WAF, Mean, Median and Skewness (case $\alpha = 3$): The effect of variations in the first judge's grade when the grades of the other four judges are constant.

Now, we consider another property of the WAF that is also related to Property 5. Recall that the proposed aggregation function has the objective of controlling for individual bias. In this context, the correction mechanism gives more weight to the opinion of the judges that are assumed not to be biased. The question is, which are the most likely unbiased judges? Without further information, the answer points to the judges whose grades show higher prevalence and similarity. This idea has motivated the WAF proposed in the present paper. An implication is the following property.

Property 7 (homogeneity stickiness). If the distribution of grades has positive (respectively, negative) skewness then $s^w_i < \bar{s}_i$ (respectively, $s^w_i > \bar{s}_i$).

The WAF compensates for positive skewness by lowering $s^w_i$ below $\bar{s}_i$, and does the opposite otherwise. This property expresses a movement of the final grade towards the most similar and frequent grades of the distribution. The most homogeneous majority has a greater decisiveness in the final grade. This property is important because a large group of individuals with more even grades is more likely to be correct and more difficult to manipulate than a small group.

As a rule of thumb, positive skewness implies median $< \bar{s}_i$ and negative skewness implies $\bar{s}_i <$ median. However, this rule sometimes fails (e.g., for multimodal distributions or distributions where one tail is long but the other is heavy).
In our case, this rule fails for discrete distributions in which the areas to the left and the right of the median are not equal. This observation highlights situations in which the departure of the WAF from the median is immediate, because in those cases the median does not adjust in the correct direction.

In order to better understand this point, consider the numerical example in Table 1. Part 1 highlights the difference between the median and the WAF. Part 2 shows how the homogeneity stickiness Property 7 works and establishes the connection with Property 5.

Example 2. (Part 1) Suppose that $n = 5$ and $s_i = (6, 6, 7, 7, 8)$. In this case the distribution of grades is positively skewed (last column of Table 1). The WAF gives more weight to the most homogeneous set of grades $(6, 6, 7, 7, \cdot)$ (Columns (1)-(5) of Table 1) and we have $s^w_i = 6.686 < \bar{s}_i = 6.800$ (Columns (6) and (7) of Table 1). However, the median moves in the opposite direction, i.e., $\bar{s}_i = 6.800 <$ median $= 7$.

(Part 2) Continuing from Part 1, if the first judge awards $s_{i,1} = 8$, the distribution of grades becomes negatively skewed. The WAF gives more weight to the largest grades $(8, \cdot, 7, 7, 8)$ because they form the most homogeneous block. However, if the first judge awards $s_{i,1} = 9$, the distribution of grades becomes positively skewed again. The WAF gives more weight to the lowest four grades $(\cdot, 6, 7, 7, 8)$. Finally, note that according to Property 5, if $\alpha \to \infty$ then $s^w_i$ is the mean of the most homogeneous block of four grades.

This example is not contrived. With five judges, the profiles of grades presented in Example 2 are extremely common in judgement sports such as gymnastics, diving, skating, boxing, surfing or dressage, among others.

Finally, Properties 1-7 cannot uniquely characterize the proposed aggregation method, mostly because in our setting the weights are not constant but depend in a non-trivial way on the grades that they are weighting; see expressions (1) and (2). Beliakov et al. (2007) and Grabisch et al.
(2011a,b) survey this literature. However, in general Properties 1-7 are not easy to satisfy by other aggregation methods because of nonlinearities and cross effects.

Property                                  Weighted Aggregation Function
Neutrality                                Yes
Anonymity                                 Yes
Unanimity                                 Yes
Monotonicity                              Yes b)
Independence of irrelevant alternatives   Yes
Continuity                                Yes
Symmetry                                  Yes
Scaling and translation invariance a)     Yes
Scale-consistency                         Yes
Decreasing weight a)                      Yes
Homogeneity stickiness                    Yes

Table 2: General properties of the Weighted Aggregation Function: a) these properties refer to the weights, b) this property is true under certain conditions.

4. An Application

The diver Fernando Platas of Mexico lost the 2000 Olympic gold medal in the 3-meter springboard diving competition by an extremely narrow margin to Xiong Ni of China (Column (1) of Table 3). The result generated controversy among fans and press because, of the eleven dives counting for the final ranking, three were graded by the Chinese judge Facheng Wang during the semi-final stage.[12] Some years later, Emerson et al. (2009) could not statistically reject the hypothesis that Xiong Ni benefited from nationalistic judging bias.

[12] Judges from competitors’ countries were not assigned to the final, but they could be in earlier rounds, as in this case, in the semi-final stage.

The method used by the International Olympic Committee to compute the final score was the following. The judging panel was composed of seven judges. The final grade of each dive is calculated by summing the middle five awards (the lowest and highest scores are removed) and then multiplying the obtained number by the degree of difficulty $DD_i$ and by 3/5, according to the following formula:

$$\text{points}_i = DD_i \times \frac{3}{5} \times \left( \sum_{j=1}^{7} s_{i,j} - \min_j \{ s_{i,j} \} - \max_j \{ s_{i,j} \} \right),$$

for all $i \in I$.

Our results corroborate Emerson et al. (2009).
The application of the WAF (with $\alpha = 2$) to the grades awarded in the eleven dives shows that Fernando Platas would have won the gold medal with 709.61 points against Xiong Ni's 709.39 points (Column (2) in Table 3).

| Diver | Rank | Points (1) | WAF (2) | MJ (3) | (4) = (2)-(1) |
|---|---|---|---|---|---|
| Xiong Ni | 1 | 708.72 | 709.39 | 713.25 | 0.67 |
| Fernando Platas | 2 | 708.42 | 709.61 | 712.20 | 1.19 |
| Dmitri Sautin | 3 | 703.02 | 704.29 | 699.90 | 1.09 |
| Xiao Hailiang | 4 | 671.04 | 671.06 | 674.10 | 0.02 |
| Dean Pullar | 5 | 647.40 | 647.94 | 645.30 | 0.54 |
| Troy Dumais | 6 | 642.72 | 641.62 | 641.85 | -1.10 |
| Mark Ruiz | 7 | 638.22 | 636.84 | 631.50 | -1.38 |
| Ken Terauchi | 8 | 634.47 | 633.26 | 630.60 | -1.22 |
| Stefan Ahrens | 9 | 619.17 | 616.94 | 617.10 | -2.23 |
| Andreas Wels | 10 | 616.53 | 613.92 | 616.20 | -2.61 |
| Imre Lengyel | 11 | 613.47 | 613.67 | 616.20 | 0.20 |
| Tony Ally | 12 | 583.80 | 585.00 | 588.60 | 1.20 |

Table 3: The Olympic Committee versus the Weighted Aggregation Function. Source: Emerson et al. (2009).

Note that the International Olympic Committee and the WAF scoring rules are very similar (Columns (1) and (2) of Table 3, respectively). This aspect is important because we want the WAF to correct for potential bias but not to affect the results indiscriminately, because bias is only a possibility. In some sense our proposal is a refinement of the International Olympic Committee procedure, and does not dispense with transparency and public disclosure policies, which are particularly simple and powerful anti-bias monitoring mechanisms.

Note that majority judgement (Column (3) of Table 3) preserves the International Olympic Committee ranking, and therefore fails to capture the nationalistic judging bias in favor of the diver Xiong Ni found by Emerson et al. (2009). This is notable because majority judgement is generally a particularly powerful aggregation method against bias and manipulation. The failure in this particular case is due to the asymmetry of the median method; see Part 1 of Example 2 above. Note that when we remove one grade from the left and one grade from the right of the median these two grades can be very asymmetric.
In the case of great asymmetry there is a loss of information, which in this particular case explains why majority judgement was not able to remove the existing bias. The situation considered in Table 3 is just a particular example; in other situations majority judgement may effectively remove bias.

Finally, the comparison of the difference between the WAF and the International Olympic Committee scores (Column (4) of Table 3) suggests the possibility of another source of bias.[13] The divers ranked from the first to the fifth position present a positive difference while the following five divers show a negative difference. This pattern seems to suggest a tendency for the judges to benefit the better positioned divers - reputation bias. If this was the case, both athletes (Fernando Platas and Xiong Ni) have benefited in this dimension. The joint consideration of these two forms of bias (i.e., nationalistic and reputation bias) may return the first place back to Xiong Ni.

The statistical validation of this claim requires information that is not easily available. Nonetheless, further research on these issues is of great relevance to understand the behavioral forces that shape judges' decisions. The study of multiple and simultaneous sources of bias seems to be a particularly interesting and unexplored subject.

Acknowledgments: I wish to thank Ricardo Ribeiro, Juan Pablo Rincón-Zapatero, the Associate Editor and two anonymous referees, as well as several seminar and congress participants for helpful comments and discussions. Financial support from the Spanish Ministerio of Ciencia y Innovación, GRODE and the Barcelona GSE is gratefully acknowledged. The usual caveat applies.

[13] It is interesting to note that the aggregation function may have a dual interpretation. If $s^w_i < \bar{s}_i$ (respectively, $s^w_i > \bar{s}_i$) the mechanism corrects for the possibility that candidate $i$ is benefited (respectively, penalized) by a single judge bias.
However, we can reverse the argument and assume that the minority is correct and look for the existence of a majority bias. In this case, $s^w_i < \bar{s}_i$ (respectively, $s^w_i > \bar{s}_i$) suggests that candidate $i$ might have been penalized (respectively, benefited) by a generalized bias (e.g., "reputation bias" - judges are influenced by the athlete's reputation). This feature of the WAF is particularly interesting. However, the WAF defined in (2) with weights as in (1) does not perform the correction. Such a correction could be done by substituting the weight function (3) into (2). In this case, the hypothesis is that there are $n - 1$ biased judges. Therefore, their grades should lose relevance relative to the grade of the unbiased judge $j \in J$. Similar reasoning can be applied to any other number of suspected biased judges.

Appendix

Proof of Property 1. In order to show the first statement just note that if $s_{i,j} - \bar{s}_i = \bar{s}_i - s_{i,k}$, by the definition of the absolute value function we must have $|s_{i,j} - \bar{s}_i|^{\alpha} = |s_{i,k} - \bar{s}_i|^{\alpha}$. Since all other terms in $w_{i,j}$ and $w_{i,k}$ are similar, we must have $w_{i,j} = w_{i,k}$. The second statement implies that if for every $s_{i,j}$ there exists another $s_{i,k}$ at the same distance from the mean, say $s_{i,j} - \bar{s}_i = \bar{s}_i - s_{i,n+1-j}$, which can be rewritten as $s_{i,j} + s_{i,n+1-j} = 2\bar{s}_i$, then by the first statement $w_{i,j} = w_{i,n+1-j}$, and we have:

$$s^w_i = \sum_{j=1}^{n} w_{i,j} s_{i,j} = \sum_{j=1}^{n/2} w_{i,j} \left( s_{i,j} + s_{i,n+1-j} \right) = \sum_{j=1}^{n/2} 2 w_{i,j} \bar{s}_i = \bar{s}_i,$$

because $\sum_{j=1}^{n/2} w_{i,j} = 1/2$. In the case that the total number of grades is odd, there must exist an odd number of grades equal to the mean in order for symmetry to hold.

Proof of Property 2. In order to show scaling and translation invariance, apply the affine transformation function $\phi(s_{i,j}) = a + b s_{i,j}$, where $a \in \mathbb{R}$ and $b \in \mathbb{R}_{++}$, to each grade $s_{i,j}$ in the weight function $w_{i,j}$, given by expression (1), to obtain the following expression for $w_{i,j}(\phi(s_i))$, for all $i \in I$ and $j \in J$:
$$\begin{aligned} w_{i,j}(\phi(s_i)) &= \frac{\sum_{l \neq j}^{n} \left| a + b s_{i,l} - \frac{1}{n} \sum_{k=1}^{n} (a + b s_{i,k}) \right|^{\alpha}}{(n-1) \sum_{l=1}^{n} \left| a + b s_{i,l} - \frac{1}{n} \sum_{k=1}^{n} (a + b s_{i,k}) \right|^{\alpha}} \\ &= \frac{\sum_{l \neq j}^{n} \left| a + b s_{i,l} - a - b \frac{1}{n} \sum_{k=1}^{n} s_{i,k} \right|^{\alpha}}{(n-1) \sum_{l=1}^{n} \left| a + b s_{i,l} - a - b \frac{1}{n} \sum_{k=1}^{n} s_{i,k} \right|^{\alpha}} \\ &= \frac{\sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}}{(n-1) \sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}} = w_{i,j}(s_i), \end{aligned}$$

where the common factor $b^{\alpha}$ cancels in the last step.

Proof of Property 3. In order to show scale-consistency, apply the affine transformation function $\phi(s_{i,j}) = a + b s_{i,j}$, where $a \in \mathbb{R}$ and $b \in \mathbb{R}_{++}$, to each grade $s_{i,j}$ in the WAF $s^w_i(s_i)$, given in expression (2). Since by Property 2 weights are scaling and translation invariant it is enough to show that:

$$s^w_i(\phi(s_i)) = \sum_{j=1}^{n} w_{i,j} (a + b s_{i,j}) = a + b \sum_{j=1}^{n} w_{i,j} s_{i,j} = \phi(s^w_i(s_i)),$$

because $\sum_{j=1}^{n} w_{i,j} = 1$ for all $i \in I$.

Proof of Property 4. We start by considering the second statement of Property 4, i.e., the case in which there is only one grade different from $s_{i,j}$.[14] We have two different scenarios.

(i) In the first scenario, $n - 1$ judges coincide but judge $j$ proposes something different. Let $s_{i,k} = s_{i,m}$ for all $k \neq j \in J$; then we can write $\sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} = (n-1) |s_{i,m} - \bar{s}_i|^{\alpha}$. After having replaced it in expression (1) we obtain:

$$w_{i,j} = \frac{(n-1) |s_{i,m} - \bar{s}_i|^{\alpha}}{(n-1) \left( (n-1) |s_{i,m} - \bar{s}_i|^{\alpha} + |s_{i,j} - \bar{s}_i|^{\alpha} \right)} = \frac{1}{(n-1) + (n-1)^{\alpha}} = w_{min}, \qquad (4)$$

where in the last equality we made use of the fact that in this case $|s_{i,j} - \bar{s}_i| = (n-1) |s_{i,m} - \bar{s}_i|$ for $n > 1$. Consequently, the weight given to judge $j$ is constant at some minimal value $w_{min}$.

(ii) In the second scenario, $n - 2$ judges coincide with judge $j$ but some other judge $k$ proposes something different. Consequently, judge $j$'s grade receives a larger weight than $w_{min}$ because she is in the group of judges that is likely to be correct, i.e., we have $w_{i,j} = (1 - w_{i,k})/(n-1) = (1 - w_{min}) w_{max}$.[15]

In both cases $w_{i,j}$ is constant in $s_{i,j}$. Now, we consider the case in which there are at least two grades different from $s_{i,j}$, i.e., the first statement of Property 4. In this case $w_{i,j}$ is given by expression (1).
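Scenario (i) can be checked numerically. The sketch below assumes that the weight in expression (1) has the form $w_{i,j} = \sum_{l \neq j} |s_{i,l} - \bar{s}_i|^{\alpha} / ((n-1) \sum_{l} |s_{i,l} - \bar{s}_i|^{\alpha})$, which is algebraically equivalent to the rewriting used in the Proof of Property 5. The function names and the grade profile are illustrative choices, not part of the paper.

```python
def waf_weights(grades, alpha):
    """Weights under the assumed form of expression (1)."""
    n = len(grades)
    mean = sum(grades) / n
    dev = [abs(s - mean) ** alpha for s in grades]
    total = sum(dev)
    if total == 0:
        # All judges agree: every grade is equally weighted by 1/n.
        return [1 / n] * n
    # total - d equals the sum of the other judges' deviations.
    return [(total - d) / ((n - 1) * total) for d in dev]


def w_min(n, alpha):
    """Closed form in expression (4)."""
    return 1 / ((n - 1) + (n - 1) ** alpha)


# Illustrative profile: four judges coincide, the first judge deviates.
weights = waf_weights([9.0, 7.0, 7.0, 7.0, 7.0], alpha=2)
print(weights[0], w_min(5, 2))
```

In this sketch the deviating judge's weight does not depend on how far her grade is from the others, only on $n$ and $\alpha$, which is the sense in which the weight is constant in scenario (i).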
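In order to reduce the size of the expression of the derivative $\partial w_{i,j} / \partial s_{i,j}$, let $z_{i,j} = \operatorname{sgn}[s_{i,j} - \bar{s}_i] \in \{-1, 0, 1\}$ denote the sign function, where $s_{i,j} > \bar{s}_i$ implies $z_{i,j} = 1$, $s_{i,j} = \bar{s}_i$ implies $z_{i,j} = 0$, while $s_{i,j} < \bar{s}_i$ implies $z_{i,j} = -1$. Then, differentiate $w_{i,j}$ with respect to $s_{i,j}$ to obtain:

$$\frac{\partial w_{i,j}}{\partial s_{i,j}} = \frac{-\alpha \frac{1}{n} |s_{i,j} - \bar{s}_i|^{\alpha} \sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1} - \alpha \left( 1 - \frac{1}{n} \right) z_{i,j} |s_{i,j} - \bar{s}_i|^{\alpha - 1} \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}}{(n-1) \left( \sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \right)^2}, \qquad (5)$$

for all $i \in I$ and $j \in J$, where we made use of the fact that $\sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} = |s_{i,j} - \bar{s}_i|^{\alpha} + \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}$ and that the derivative of $\sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}$ with respect to $s_{i,j}$ is given by $\alpha z_{i,j} (1 - 1/n) |s_{i,j} - \bar{s}_i|^{\alpha - 1} - \alpha (1/n) \sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1}$. We want to show that $\partial w_{i,j} / \partial s_{i,j} > 0$ if $s_{i,j} < \bar{s}_i$ ($z_{i,j} < 0$) and $\partial w_{i,j} / \partial s_{i,j} < 0$ otherwise. Since the denominator is strictly positive the sign of the derivative is given by the numerator.

[14] The case in which all judges coincide is trivial because all weights are equal.

[15] We can show that the maximum weight $w_{max} = 1/(n-1)$, for $n > 2$, is given to the judge or judges awarding grades that match the arithmetic mean. If all judges award equal grades they are all equally weighted. If $n = 2$ we always have $w_{i,j} = w_{i,k} = 1/2$ because both judges are equally likely to be correct.

We have four cases to consider that depend on whether $z_{i,j}$ and $\sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1}$ are negative or positive. If $z_{i,j} > 0$ and $\sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1} > 0$, it is immediate that $\partial w_{i,j} / \partial s_{i,j} < 0$, while if $z_{i,j} < 0$ and $\sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1} < 0$, it is immediate that $\partial w_{i,j} / \partial s_{i,j} > 0$. In the other two cases, if $z_{i,j} > 0$ and $\sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1} < 0$, in order to show that $\partial w_{i,j} / \partial s_{i,j} < 0$ we must consider the scenario that makes it more difficult to satisfy, i.e., when $z_{i,l} = -1$ for all $l \neq j$.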
Similarly, if $z_{i,j} < 0$ and $\sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1} > 0$, in order to show that $\partial w_{i,j} / \partial s_{i,j} > 0$ we must consider the scenario that makes it more difficult to satisfy, i.e., when $z_{i,l} = 1$ for all $l \neq j$. In both cases we are left to show that the same inequality is true:

$$(n-1) \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} > |s_{i,j} - \bar{s}_i| \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha - 1},$$

for all $i \in I$ and $j \in J$. Note that for the case $s_{i,j} = \bar{s}_i$ the inequality holds trivially. Since the WAF is scale-consistent (Property 3) we can normalize $\bar{s}_i = 0$ without loss of generality and for simplicity consider the case of two different grades other than $s_{i,j}$, i.e., $s_{i,j} < \bar{s}_i = 0 < s_{i,k}, s_{i,l}$ (the other case $s_{i,j} > \bar{s}_i = 0 > s_{i,k}, s_{i,l}$ follows the same argument). Then, the above inequality becomes:

$$2 \left( |s_{i,k}|^{\alpha} + |s_{i,l}|^{\alpha} \right) > |s_{i,k} + s_{i,l}| \left( |s_{i,k}|^{\alpha - 1} + |s_{i,l}|^{\alpha - 1} \right),$$

where we have used the fact that $\bar{s}_i = 0$ implies that $-s_{i,j} = s_{i,k} + s_{i,l}$. Since all quantities are non-negative we can remove the absolute value function from the previous inequality to obtain $(s_{i,k}^{\alpha - 1} - s_{i,l}^{\alpha - 1})(s_{i,k} - s_{i,l}) > 0$, which is strictly positive for all $s_{i,k} \neq s_{i,l}$ with $k \neq l \in J$.

Proof of Property 5. We start by considering the limit $\alpha \to 0$. Suppose that there are $t = 0, 1, ..., n-1$ grades such that $s_{i,k} = \bar{s}_i$, denoted with the subindex $k \in K \subset J$ (the case $t = n$ is trivially true since $s^w_i = \bar{s}_i$), and $n - t$ grades such that $s_{i,j} \neq \bar{s}_i$, denoted with the subindex $j \in J \setminus K$. Then, if $\alpha \to 0$ we have $|s_{i,j} - \bar{s}_i|^{\alpha} \to 1$ for all $j \in J \setminus K$, and $|s_{i,k} - \bar{s}_i|^{\alpha} \to 0$ for all $k \in K$. Consequently, $\sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \to n - 1 - t$, $\sum_{l \neq k}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \to n - t$ and $\sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \to n - t$, which implies that $w_{i,j} \to (n - 1 - t)/((n - t)(n - 1))$ for all $j \in J \setminus K$ and $w_{i,k} \to 1/(n-1)$ for all $k \in K$. Therefore, we can write $s^w_i \to \sum_{j \notin K} (n - 1 - t) s_{i,j} / ((n - t)(n - 1)) + t s_{i,k} / (n - 1)$, where $s_{i,k} = \bar{s}_i = \sum_{j=1}^{n} s_{i,j} / n$, which implies that $s_{i,k} = \bar{s}_i = \sum_{j \notin K} s_{i,j} / (n - t)$. Replacing the latter two equalities into $s^w_i$, after some algebra, we obtain that $s^w_i \to \bar{s}_i$.
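The two limits in Property 5 can be illustrated numerically with the grade profile of Example 2. The sketch below assumes the weight form $w_{i,j} = \sum_{l \neq j} |s_{i,l} - \bar{s}_i|^{\alpha} / ((n-1) \sum_{l} |s_{i,l} - \bar{s}_i|^{\alpha})$, which is equivalent to the rewriting used in this proof; the function name `waf` and the particular values of $\alpha$ are illustrative choices.

```python
def waf(grades, alpha):
    """Weighted aggregation of one grade profile (assumed weight form)."""
    n = len(grades)
    mean = sum(grades) / n
    dev = [abs(s - mean) ** alpha for s in grades]
    total = sum(dev)
    if total == 0:
        # All judges agree, so the aggregate is the common grade.
        return mean
    weights = [(total - d) / ((n - 1) * total) for d in dev]
    return sum(w * s for w, s in zip(weights, grades))


grades = [6, 6, 7, 7, 8]            # profile from Example 2, Part 1
near_zero = waf(grades, alpha=1e-6)  # approaches the arithmetic mean, 6.8
very_large = waf(grades, alpha=200)  # approaches the mean of (6, 6, 7, 7)
print(near_zero, very_large)
```

With a single extreme grade ($r = 1$), the large-$\alpha$ value coincides with the mean of the four remaining grades, in line with the remark at the end of Example 2.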
Now, consider the limit $\alpha \to \infty$. In this case we can rewrite $w_{i,j}$ as $w_{i,j} = 1 / \left( (n-1) \left( 1 + |s_{i,j} - \bar{s}_i|^{\alpha} / \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \right) \right)$, for all $j \in J$. Suppose that there are $r = 1, ..., \lfloor (n-1)/2 \rfloor$ extreme grades (not necessarily equal but at the same distance from the mean), where $\lfloor x \rfloor$ denotes the floor function - the largest integer less than or equal to $x$ (note: we cannot have more than $\lfloor (n-1)/2 \rfloor$ extreme grades, otherwise they will not be the extreme grades). There are two different situations to consider.

(i) If $|s_{i,j} - \bar{s}_i|$ is one of the largest grade differences with respect to the mean (i.e., $s_{i,j}$ is an extreme grade) and $\alpha \to \infty$, then $|s_{i,j} - \bar{s}_i|^{\alpha} / \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \to 1/(r-1)$ and $w_{i,j} \to 1/((n-1)(1 + 1/(r-1))) = (r-1)/((n-1)r)$. Note that in the particular case $r = 1$ we have $w_{i,j} \to 0$.

(ii) If $|s_{i,j} - \bar{s}_i|$ is not one of the largest grade differences with respect to the mean (i.e., $s_{i,j}$ is not an extreme grade) and $\alpha \to \infty$, then $|s_{i,j} - \bar{s}_i|^{\alpha} / \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \to 0$ and $w_{i,j} \to 1/(n-1)$ for all $j \neq k \in J$.

Altogether, we have $n - r$ non-extreme grades, each weighted by $1/(n-1)$, and $r$ extreme grades, each weighted by $(r-1)/((n-1)r)$. Let the $r$ extreme grades be indexed as $j = n - r + 1, ..., n$; then $s^w_i \to \left( \sum_{j=1}^{n-r} s_{i,j} + (r-1) \sum_{j=n-r+1}^{n} s_{i,j} / r \right) / (n-1)$. After adding and subtracting $\sum_{j=n-r+1}^{n} s_{i,j}$ in the numerator we obtain that $s^w_i \to \left( \sum_{j=1}^{n} s_{i,j} - \sum_{j=n-r+1}^{n} s_{i,j} / r \right) / (n-1)$, where $\sum_{j=n-r+1}^{n} s_{i,j} / r$ is the average extreme grade $s_{i,e}$.

Proof of Property 6. We start by showing the first statement of Property 6, i.e., $s^w_i$ is monotonic in $s_{i,j}$ for small $\alpha$. In order to do it we proceed as follows. Since $\alpha \in [0, \infty)$ we will show that $s^w_i$ is monotonic increasing in $s_{i,j}$ for $\alpha$ small, i.e., in the zero neighborhood. Then, through a numerical example, we show that for sufficiently large $\alpha$ the monotonic relation between $s^w_i$ and $s_{i,j}$ is not guaranteed.
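The numerical example invoked here (Column (6) of Table 1, with $\alpha = 3$) can be reproduced with a short script. This is a sketch under the assumption that the weights take the form $w_{i,j} = \sum_{l \neq j} |s_{i,l} - \bar{s}_i|^{\alpha} / ((n-1) \sum_{l} |s_{i,l} - \bar{s}_i|^{\alpha})$, equivalent to the rewriting of $w_{i,j}$ in the Proof of Property 5; the function name `waf` is an illustrative choice.

```python
def waf(grades, alpha):
    """Weighted aggregation of one grade profile (assumed weight form)."""
    n = len(grades)
    mean = sum(grades) / n
    dev = [abs(s - mean) ** alpha for s in grades]
    total = sum(dev)
    if total == 0:
        # All judges agree, so the aggregate is the common grade.
        return mean
    weights = [(total - d) / ((n - 1) * total) for d in dev]
    return sum(w * s for w, s in zip(weights, grades))


# Profiles from Example 2, Part 2: only the first judge's grade changes.
before = waf([8, 6, 7, 7, 8], alpha=3)
after = waf([9, 6, 7, 7, 8], alpha=3)

# The first judge raised her grade, yet the aggregate score decreased.
print(round(before, 3), round(after, 3), after < before)
```

Under this weight form the script returns the values 7.314 and 7.303 reported in the text, so raising $s_{i,1}$ from 8 to 9 lowers the aggregate when $\alpha = 3$.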
Note that the WAF defined in (2) can be written as (see Footnote 8):

$$s^w_i = \frac{\sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \sum_{k \neq l}^{n} s_{i,k}}{(n-1) \sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}} = \frac{N}{D},$$

for all $i \in I$ and $j \in J$, where $N$ and $D$ denote the expressions in the numerator and denominator, respectively. Recall that the derivative of $s^w_i$ with respect to $s_{i,j}$ is given by $\partial s^w_i / \partial s_{i,j} = (N' D - N D') / D^2$. In the case that $s_{i,j} \neq \bar{s}_i$ for all $j \in J$ (the case that $s_{i,k} = \bar{s}_i$ for some $k \in J$ follows the same argument) the derivative of the expression in the denominator with respect to $s_{i,j}$ is equal to the derivative of $\sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}$ reported in the Proof of Property 4 multiplied by $n - 1$, that is:

$$D' = (n-1) \alpha z_{i,j} \left( 1 - \frac{1}{n} \right) |s_{i,j} - \bar{s}_i|^{\alpha - 1} - (n-1) \alpha \frac{1}{n} \sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1},$$

where we use the same notation as in the Proof of Property 4. The expression in the numerator can be written as $N = |s_{i,j} - \bar{s}_i|^{\alpha} \sum_{k \neq j}^{n} s_{i,k} + \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \sum_{k \neq l}^{n} s_{i,k}$. The derivative of this expression with respect to $s_{i,j}$ is given by:

$$N' = \alpha z_{i,j} \left( 1 - \frac{1}{n} \right) |s_{i,j} - \bar{s}_i|^{\alpha - 1} \sum_{k \neq j}^{n} s_{i,k} - \alpha \frac{1}{n} \sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1} \sum_{k \neq l}^{n} s_{i,k} + \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}.$$

In the limit $\alpha \to 0$, we have $N \to (n-1) \sum_{l=1}^{n} s_{i,l}$, $D \to (n-1) n$, $N' \to n - 1$ and $D' \to 0$. Therefore, $\partial s^w_i / \partial s_{i,j} \to 1/n$ for $\alpha \to 0$, which is strictly positive for any profile of grades. For large values of $\alpha$ we show through a numerical example the failure of monotonicity. Column (6) of Table 1 shows that for $\alpha = 3$ when the grade of the first judge rises from $s_{i,1} = 8$ to $s_{i,1} = 9$ the WAF falls from $s^w_i = 7.314$ to $s^w_i = 7.303$, showing that monotonicity is not guaranteed for large $\alpha$.

Now, we consider the second statement of Property 6. The proof of monotonicity for $s_{i,j}$ around the mean follows a similar strategy. We evaluate $\partial s^w_i / \partial s_{i,j}$ in the neighborhood of $\bar{s}_i$, i.e., for $s_{i,j} \to \bar{s}_i$, which is equivalent to $s_{i,j} \to \sum_{l \neq j}^{n} s_{i,l} / (n-1)$ (the other judges' mean).
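In this case we have $|s_{i,j} - \bar{s}_i|^{\alpha} \to 0$, and consequently, $N \to \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \sum_{k \neq l}^{n} s_{i,k}$, $D \to (n-1) \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}$,

$$N' \to -\alpha \frac{1}{n} \sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1} \sum_{k \neq l}^{n} s_{i,k} + \sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha},$$

and $D' \to -(n-1) \alpha (1/n) \sum_{l \neq j}^{n} z_{i,l} |s_{i,l} - \bar{s}_i|^{\alpha - 1}$. At this stage the expression for $\partial s^w_i / \partial s_{i,j}$ is particularly large. Since the WAF is scale-consistent (Property 3) we can normalize $\bar{s}_i = 0$ without loss of generality, which implies that $s_{i,j} \to 0$ and we can write $\sum_{k \neq l}^{n} s_{i,k} = -s_{i,l}$. Then, the expression for $\partial s^w_i / \partial s_{i,j}$ converges to:

$$\frac{\left( -\alpha \frac{1}{n} \sum_{l \neq j}^{n} z_{i,l} |s_{i,l}|^{\alpha - 1} (-s_{i,l}) + \sum_{l \neq j}^{n} |s_{i,l}|^{\alpha} \right) \sum_{l \neq j}^{n} |s_{i,l}|^{\alpha} + \alpha \frac{1}{n} \sum_{l \neq j}^{n} |s_{i,l}|^{\alpha} (-s_{i,l}) \sum_{l \neq j}^{n} z_{i,l} |s_{i,l}|^{\alpha - 1}}{(n-1) \left( \sum_{l \neq j}^{n} |s_{i,l}|^{\alpha} \right)^2}, \qquad (6)$$

for all $i \in I$ and $j \in J$. Note that $s_{i,l} > 0$ is equivalent to $z_{i,l} > 0$ (positive and above the mean) and implies that $z_{i,l} |s_{i,l}|^{\alpha - 1} (-s_{i,l}) = -|s_{i,l}|^{\alpha}$, $|s_{i,l}|^{\alpha} (-s_{i,l}) = -|s_{i,l}|^{\alpha + 1}$ and $z_{i,l} |s_{i,l}|^{\alpha - 1} = |s_{i,l}|^{\alpha - 1}$, while $s_{i,l} < 0$ is equivalent to $z_{i,l} < 0$ (negative and below the mean), which implies that $z_{i,l} |s_{i,l}|^{\alpha - 1} (-s_{i,l}) = -|s_{i,l}|^{\alpha}$, $|s_{i,l}|^{\alpha} (-s_{i,l}) = |s_{i,l}|^{\alpha + 1}$ and $z_{i,l} |s_{i,l}|^{\alpha - 1} = -|s_{i,l}|^{\alpha - 1}$. Therefore, we have $\sum_{l \neq j}^{n} z_{i,l} |s_{i,l}|^{\alpha - 1} (-s_{i,l}) = -\sum_{l \neq j}^{n} |s_{i,l}|^{\alpha}$, $\sum_{l \neq j}^{n} |s_{i,l}|^{\alpha} (-s_{i,l}) = -\sum_{l \neq j \wedge +}^{n} |s_{i,l}|^{\alpha + 1} + \sum_{l \neq j \wedge -}^{n} |s_{i,l}|^{\alpha + 1}$ and $\sum_{l \neq j}^{n} z_{i,l} |s_{i,l}|^{\alpha - 1} = \sum_{l \neq j \wedge +}^{n} |s_{i,l}|^{\alpha - 1} - \sum_{l \neq j \wedge -}^{n} |s_{i,l}|^{\alpha - 1}$, where the indices of summation $l \neq j \wedge +$ and $l \neq j \wedge -$ denote the summation over the positive and the negative terms different from $j$, respectively. Since the denominator of expression (6) is strictly positive the numerator determines the sign of $\partial s^w_i / \partial s_{i,j}$. After having replaced these equalities, the numerator of expression (6) becomes:

$$\begin{aligned} & \left( \alpha \frac{1}{n} \sum_{l \neq j}^{n} |s_{i,l}|^{\alpha} + \sum_{l \neq j}^{n} |s_{i,l}|^{\alpha} \right) \sum_{l \neq j}^{n} |s_{i,l}|^{\alpha} \\ & + \alpha \frac{1}{n} \left( -\sum_{l \neq j \wedge +}^{n} |s_{i,l}|^{\alpha + 1} + \sum_{l \neq j \wedge -}^{n} |s_{i,l}|^{\alpha + 1} \right) \left( \sum_{l \neq j \wedge +}^{n} |s_{i,l}|^{\alpha - 1} - \sum_{l \neq j \wedge -}^{n} |s_{i,l}|^{\alpha - 1} \right), \end{aligned}$$

for all $i \in I$ and $j \in J$. The component in the first line is strictly positive.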
Therefore, the only way for this expression to be negative is when the term in the second line is sufficiently negative. We show that this is impossible. Consider the worst case scenario in which there is only one large positive term and many small negative terms such that $\sum_{l \neq j \wedge +}^{n} |s_{i,l}|^{\alpha + 1} = |s_{i,l}|^{\alpha + 1} > 0$ and $\sum_{l \neq j \wedge -}^{n} |s_{i,l}|^{\alpha + 1} = 0$, respectively. The objective of this assumption is to obtain the largest negative term in the second line. In this case we obtain that:

$$\frac{\partial s^w_i}{\partial s_{i,j}} \to \frac{\left( \alpha \frac{1}{n} |s_{i,l}|^{\alpha} + |s_{i,l}|^{\alpha} \right) |s_{i,l}|^{\alpha} - \alpha \frac{1}{n} |s_{i,l}|^{\alpha + 1} |s_{i,l}|^{\alpha - 1}}{(n-1) \left( |s_{i,l}|^{\alpha} \right)^2} = \frac{\left( \alpha \frac{1}{n} |s_{i,l}|^{2\alpha} + |s_{i,l}|^{2\alpha} \right) - \alpha \frac{1}{n} |s_{i,l}|^{2\alpha}}{(n-1) \left( |s_{i,l}|^{\alpha} \right)^2} = \frac{1}{n-1},$$

for all $i \in I$ and $j \in J$, which is strictly positive for any profile of grades. The symmetric worst case scenario with a single negative term and many small positive terms follows the same argument. For values of $s_{i,j}$ distant from the other judges' mean we show through a numerical example the failure of monotonicity. Column (6) of Table 1 shows that while the grade of the first judge moves from $s_{i,1} = 8$ to $s_{i,1} = 9$, i.e., it gets more distant from the other judges' mean $\sum_{l=2}^{n} s_{i,l} / 4 = (6 + 7 + 7 + 8)/4 = 7$, the WAF falls from $s^w_i = 7.314$ to $s^w_i = 7.303$, showing that monotonicity is not guaranteed for $s_{i,1}$ sufficiently far from the other judges' mean.

Finally, following Property 4, if there exists only one grade different from $s_{i,j}$ then we can construct a situation in which a downward discontinuity occurs in the weight of judge $j$ after an increase in her grade by an infinitesimal amount. In order to complete the proof we must verify this case. To construct such a case assume that initially all judges (including judge $j$) are awarding the mean grade $s_{i,j} = \bar{s}_i = s_{i,m}$. Consequently, all grades are equally weighted by $1/n$ and $s^w_i = \bar{s}_i = s_{i,m}$. Now, suppose that judge $j$ increases her grade from $s_{i,m}$ to $s_{i,j} > s_{i,m}$.
In this case she departs from the other $n - 1$ judges and the weight $w_{i,j}$ given to her grade falls discontinuously from $1/n$ to $w_{min} = 1/((n-1) + (n-1)^{\alpha})$ (see scenario (i) in the proof of Property 4) while the weight of the other $n - 1$ judges jumps from $1/n$ to $(1 - w_{min})/(n-1)$. Therefore, in the discontinuous case, the difference between the new WAF $s^w_i$ and the initial WAF $s^w_i = s_{i,m}$ is given by:

$$s^w_i - s_{i,m} = \frac{1}{(n-1) + (n-1)^{\alpha}} s_{i,j} + \left( 1 - \frac{1}{(n-1) + (n-1)^{\alpha}} \right) s_{i,m} - s_{i,m} = \frac{s_{i,j} - s_{i,m}}{(n-1) + (n-1)^{\alpha}},$$

which is strictly positive for all $s_{i,j} > s_{i,m}$. The proof of the case $s_{i,j} < \bar{s}_i$ follows from symmetry (Property 1).

Proof of Property 7. If the distribution of the grades has a positive (negative, respectively) $(\alpha + 1)$-th absolute central moment, defined as $\sum_{k=1}^{n} |s_{i,k} - \bar{s}_i|^{\alpha} (s_{i,k} - \bar{s}_i) / n$ (where if $\alpha > 0$ is an even integer we have the usual $(\alpha + 1)$-th central moment), then $s^w_i < \bar{s}_i$ ($s^w_i > \bar{s}_i$, respectively). In order to show the relation between the $(\alpha + 1)$-th absolute central moment, the skewness and the WAF, manipulate the inequality $s^w_i < \bar{s}_i$ to obtain:

$$\begin{aligned} s^w_i < \bar{s}_i &\Leftrightarrow \bar{s}_i > \sum_{j=1}^{n} \frac{\sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}}{(n-1) \sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}} s_{i,j} \\ &\Leftrightarrow 0 > \sum_{j=1}^{n} (s_{i,j} - \bar{s}_i) \frac{\sum_{l \neq j}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} + |s_{i,j} - \bar{s}_i|^{\alpha} - |s_{i,j} - \bar{s}_i|^{\alpha}}{(n-1) \sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha}} \\ &\Leftrightarrow \sum_{j=1}^{n} |s_{i,j} - \bar{s}_i|^{\alpha} (s_{i,j} - \bar{s}_i) \Big/ \left( (n-1) \sum_{l=1}^{n} |s_{i,l} - \bar{s}_i|^{\alpha} \right) > 0, \end{aligned}$$

for all $i \in I$. The sign of the expression in the numerator (i.e., the $(\alpha + 1)$-th absolute central moment) determines the skewness, which is positive if $s^w_i < \bar{s}_i$. The case $\bar{s}_i < s^w_i$ follows the same argument.

References

Arrow, K. J., 1950. A difficulty in the concept of social welfare. The Journal of Political Economy 58 (4), 328–346.
Ashenfelter, O., Quandt, R., 1999. Analyzing a wine tasting statistically. Chance 12 (3), 16–20.
Balinski, M., Laraki, R., 2007. A theory of measuring, electing, and ranking. Proceedings of the National Academy of Sciences 104 (21), 8720–8725.
Balinski, M., Laraki, R., 2011.
Election by majority judgment: experimental evidence. In: In situ and laboratory experiments on electoral law reform. Springer, pp. 13–54.
Balinski, M., Laraki, R., 2014. Judge: Don't vote! Operations Research 62 (3), 483–511.
Balinski, M. L., Laraki, R., 2010. Majority judgment: measuring, ranking, and electing. MIT Press.
Beliakov, G., Pradera, A., Calvo, T., 2007. Aggregation functions: a guide for practitioners. Vol. 221. Springer.
Borda, J., 1784. Histoire de l'Académie Royale des Sciences, 657–665.
Brams, S. J., Fishburn, P. C., 1978. Approval voting. American Political Science Review 72 (03), 831–847.
Buchanan, J. T., Henig, E. J., Henig, M. I., 1998. Objectivity and subjectivity in the decision making process. Annals of Operations Research 80 (0), 333–345.
Chebotarev, P. Y., Shamis, E., 1998. Characterizations of scoring methods for preference aggregation. Annals of Operations Research 80, 299–332.
Condorcet, J., 1785. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. l'Imprimerie Royale, Paris.
Dawes, R. M., Faust, D., Meehl, P. E., 1989. Clinical versus actuarial judgment. Science 243 (4899), 1668–1674.
Emerson, J. W., Seltzer, M., Lin, D., 2009. Assessing judging bias: An example from the 2000 Olympic Games. The American Statistician 63 (2), 124–131.
Felsenthal, D. S., Machover, M., 2008. The majority judgement voting procedure: a critical evaluation. Homo Oeconomicus 25 (3/4), 319–334.
Findlay, L. C., Ste-Marie, D. M., et al., 2004. A reputation bias in figure skating judging. Journal of Sport and Exercise Psychology 26 (1), 154–166.
Fishburn, P. C., Brams, S. J., 1983. Paradoxes of preferential voting. Mathematics Magazine 56 (4), 207–214.
Fritz, C., Curtin, J., Poitevineau, J., Morrel-Samuels, P., Tao, F.-C., 2012. Player preferences among new and old violins. Proceedings of the National Academy of Sciences 109 (3), 760–763.
Gibbard, A., 1973.
Manipulation of voting schemes: a general result. Econometrica 41 (4), 587–601.
Ginsburgh, V. A., Van Ours, J. C., 2003. Expert opinion and compensation: Evidence from a musical competition. The American Economic Review 93 (1), 289–296.
Grabisch, M., Marichal, J.-L., Mesiar, R., Pap, E., 2011a. Aggregation functions: construction methods, conjunctive, disjunctive and mixed classes. Information Sciences 181 (1), 23–43.
Grabisch, M., Marichal, J.-L., Mesiar, R., Pap, E., 2011b. Aggregation functions: means. Information Sciences 181 (1), 1–22.
Hodgson, R. T., 2008. An examination of judge reliability at a major U.S. wine competition. Journal of Wine Economics 3 (2), 105–113.
Kahneman, D., Tversky, A., 1996. On the reality of cognitive illusions.
Kingstrom, P. O., Mainstone, L. E., 1985. An investigation of the rater-ratee acquaintance and rater bias. Academy of Management Journal 28 (3), 641–653.
Krantz, D. H., Luce, R. D., Suppes, P., Tversky, A., 1971. Foundations of Measurement (Additive and Polynomial Representations), vol. 1. Academic Press, New York.
Laplace, P., 1820. Œuvres complètes de Laplace, tome 7 (3rd ed.), v and clii–cliii.
Lee, J., 2008. Outlier aversion in subjective evaluation. Journal of Sports Economics 9 (2), 141–159.
Meehl, P. E., 1954. Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. University of Minnesota Press.
Miller, N., Resnick, P., Zeckhauser, R., 2005. Eliciting informative feedback: The peer-prediction method. Management Science 51 (9), 1359–1373.
Prelec, D., 2004. A Bayesian truth serum for subjective data. Science 306 (5695), 462–466.
Satterthwaite, M. A., 1975. Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions. Journal of Economic Theory 10 (2), 187–217.
Schlag, K. H., Tremewan, J., Van der Weele, J. J., 2015. A penny for your thoughts: A survey of methods for eliciting beliefs.
Experimental Economics 18 (3), 457–490.
Smith, J. H., 1973. Aggregation of preferences with variable electorate. Econometrica 41 (6), 1027–1041.
Tversky, A., Kahneman, D., 1974. Judgment under uncertainty: Heuristics and biases. Science 185 (4157), 1124–1131.
Zitzewitz, E., 2006. Nationalism in winter sports judging and its lessons for organizational decision making. Journal of Economics & Management Strategy 15 (1), 67–99.