The paper empirically evaluates the potential bias in estimates of the average union-non-union wage differential for Canada based on a regression of aggregate wages on unionization. The analysis indicates that these estimates can be imprecise and biased compared with the estimates derived from actual union and non-union wage data, when differentials are constrained to be constant across industries.

On measuring the union's impact on relative wages: a methodological note.

Recent estimates of union-non-union wage differentials by Glenn MacDonald (1981, 1983) and Maki and Christenson (1980) indicate that union workers in Canada receive, on average, wages 16 to 51 per cent higher than those of non-union workers. The purpose of this note is to point out that these estimates may be significantly biased. The bias arises from the use of average wage data, rather than actual union and non-union wages, to estimate the differential.
Following Mulvey and Abowd (1980), we first discuss two methods of estimating the union-non-union differential: (1) the so-called incomplete or conventional method, which uses readily available data on average wages from a sample of industries or occupations and derives estimates of the union wage impact by regressing wages on the proportion unionized and other worker and job characteristics; and (2) the complete method, in which the union-non-union wage differential is calculated directly from actual observations of union and non-union wages. Next, we estimate the differential by these two methods for a sample of industries under alternative functional specifications, using both OLS and 2SLS estimation techniques. The various estimates are then compared and evaluated.

The authors wish to thank Gordon Fisher, James MacKinnon, the associate editor, and two anonymous referees for their helpful comments and suggestions.

Canadian Journal of Economics / Revue canadienne d'Economique, XVIII, No. 1, February / février 1985. Printed in Canada / Imprimé au Canada. 0008-4085 / 85 / 182-89 $1.50 © Canadian Economics Association

COMPLETE AND INCOMPLETE METHOD

Let $W_{ui}$ be the average union wage in industry $i$, $W_{ni}$ the average wage of non-union workers, $U_i$ the proportion of workers who are unionized, and $W_i$ the average wage in industry $i$. If both $W_{ui}$ and $W_{ni}$ are observable, the relative wage impact of unions can be estimated directly, since the average union-non-union differential, $d_i$, is by definition the difference between the conditional expectations $E(\ln W_{ui} \mid U_i)$ and $E(\ln W_{ni} \mid U_i)$, where $E(y \mid x)$ is the conditional expectation of a random variable $y$ given $x$ and $\ln$ is the natural logarithm. This is the complete data method.
When only $W_i$ and $U_i$ are observed, estimation of the differential requires the use of an incomplete method, with $W_i$ taken as a geometric weighted mean of $W_{ui}$ and $W_{ni}$, using $U_i$ as weights. With wages expressed in logarithmic form, the regression function of $\ln W_i$ can be written as

$$E(\ln W_i \mid U_i) = U_i\,[E(\ln W_{ui} \mid U_i)] + (1 - U_i)\,[E(\ln W_{ni} \mid U_i)]. \tag{1}$$

Using the fact that $d_i$ is the conditional differential between $\ln W_{ui}$ and $\ln W_{ni}$, equation (1) can be rewritten as

$$E(\ln W_i \mid U_i) = E(\ln W_{ni} \mid U_i) + d_i U_i. \tag{2}$$

Equation (2) provides the framework for estimating the union-non-union differential with the so-called incomplete method. The equation, however, cannot be estimated empirically without making assumptions about the functional form of $E(\ln W_{ni} \mid U_i)$, which is unobservable, and about the parameter $d_i$, which as it stands varies from industry to industry. Usually $E(\ln W_{ni} \mid U_i)$ is assumed to depend upon a set of exogenous characteristics $X_i$, such as the age, education, and sex of workers. Thus $f(X_i)$ replaces $E(\ln W_{ni} \mid U_i)$ in equation (2). Biases resulting from this type of assumption, although important, are not the subject of this paper. We are instead interested in the possible biases arising from assumptions about $d_i$.

The simplest approach to eliminating $d_i$ is to assume that $d_i = d$. Alternatively, $d_i$ can be assumed to be linearly dependent on $U_i$, the extent of unionism by industry. A further modification may allow for higher-order dependence on $U_i$ by including a non-linear term in $U_i$. As a more general specification, $d_i$ can be assumed to vary with a vector of industry characteristics such as the degree of unionization, the index of skill mix, or the establishment size.

In the simple functional specification where $d_i = d$, the regression equation (2) using the incomplete method is written as

$$E(\ln W_i \mid U_i) = E(\ln W_{ni} \mid U_i) + d U_i + (d_i - d) U_i, \tag{3}$$

where $d$ is the average union-non-union differential to be estimated.
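As a quick numerical check of the step from equation (1) to equation (2), the following minimal sketch evaluates both forms for a single industry. The wage figures and the unionization rate are made-up illustrative values, and the function names are hypothetical; none of them come from the paper's data.

```python
import math


def log_average_wage(w_union, w_nonunion, u):
    """Equation (1): log of the geometric weighted mean of the two wages,
    with the union share u as the weight on the union wage."""
    return u * math.log(w_union) + (1.0 - u) * math.log(w_nonunion)


def log_average_wage_via_differential(w_union, w_nonunion, u):
    """Equation (2): log non-union wage plus the differential d_i times u."""
    d_i = math.log(w_union) - math.log(w_nonunion)
    return math.log(w_nonunion) + d_i * u


# Hypothetical values for one industry (assumptions, not data from the paper).
w_u, w_n, u = 12.0, 10.0, 0.4

lhs = log_average_wage(w_u, w_n, u)
rhs = log_average_wage_via_differential(w_u, w_n, u)
d_i = math.log(w_u) - math.log(w_n)  # log differential for this industry

print(f"equation (1): {lhs:.6f}")
print(f"equation (2): {rhs:.6f}")
print(f"d_i (log points): {d_i:.6f}")
```

The two forms agree because equation (2) only regroups the terms of equation (1). Here the differential is a log difference, so a union wage 20 per cent above the non-union wage corresponds to $d_i = \ln 1.2$.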
With the complete method, when both union and non-union wage rates are observable, the regression function is written simply as

$$E[(\ln W_{ui} \mid U_i) - (\ln W_{ni} \mid U_i)] = d. \tag{4}$$

Pradeep Kumar and Thanasis Stengos

In the more general case, where $d_i$ is a linear function of a vector of industry characteristics $Z_i$ (in deviation-from-the-mean form) such that $d_i = d + Z_i\theta$, with $Z_i$ and $\theta$ being conforming vectors, the functional specification of regression equation (3) becomes

$$E(\ln W_i \mid U_i) = E(\ln W_{ni} \mid U_i) + d U_i + Z_i\theta U_i + (d_i - d - Z_i\theta) U_i. \tag{3a}$$

The regression equation (4) with the complete data method will in this case be of the form

$$E[(\ln W_{ui} \mid U_i) - (\ln W_{ni} \mid U_i)] = d + Z_i\theta. \tag{4a}$$

The purpose of our analysis is to evaluate sources of bias in the estimate of the average differential, $d$, from regression equations such as (3) or (3a), compared with estimates derived from equations (4) and (4a). The potential biases from using the incomplete method to estimate equations (3) and (3a) arise from (a) the omission of the last term, $(d_i - d)U_i$ or $(d_i - d - Z_i\theta)U_i$; (b) the incorrect specification of $E(\ln W_{ni} \mid U_i)$ when observations on $\ln W_{ni}$ are not available and proxy variables are used; and (c) measurement error if $\ln W_{ni}$ is used for $E(\ln W_{ni} \mid U_i)$.

In the following section we report the empirical estimates of equations (3) and (3a), and (4) and (4a), for Canada with both OLS and 2SLS techniques. We then test the hypothesis that the estimated differentials from the complete and incomplete methods are statistically 'close' enough.
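Source (a) of the bias can be illustrated with a small deterministic sketch. It assumes a hypothetical data-generating process in which the industry differential rises linearly with unionization, the non-union log wage is the same in every industry, and there is no noise. All numbers and names below are illustrative assumptions, not the paper's data or its estimation procedure (which also uses 2SLS and worker characteristics).

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical industries with union shares 0.1, 0.2, ..., 0.9.
union_share = [0.1 * k for k in range(1, 10)]
u_bar = sum(union_share) / len(union_share)

d_true = 0.20      # assumed average differential d (log points)
slope_in_u = 0.40  # assumed dependence of d_i on U_i
log_wn = 2.0       # assumed common non-union log wage

# Industry differentials d_i = d + slope * (U_i - mean U), averaging to d.
d_i = [d_true + slope_in_u * (u - u_bar) for u in union_share]
log_wu = [log_wn + d for d in d_i]

# Average log wage per industry, built as in equation (1).
log_w = [u * wu + (1.0 - u) * log_wn for u, wu in zip(union_share, log_wu)]

# Complete method, equation (4): mean of observed union/non-union log gaps.
complete_estimate = sum(wu - log_wn for wu in log_wu) / len(log_wu)

# Incomplete method, equation (3) with the (d_i - d) U_i term omitted:
# regress the average log wage on the union share.
incomplete_estimate = ols_slope(union_share, log_w)

print(f"complete method:   {complete_estimate:.4f}")
print(f"incomplete method: {incomplete_estimate:.4f}")
```

Under these assumed numbers the complete method recovers the average differential of 0.20, while the regression slope is 0.40, because the omitted term $(d_i - d)U_i$ is correlated with $U_i$. The size and sign of the gap depend entirely on the assumed relationship between $d_i$ and $U_i$.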