The Effect of Sampling Design and Response Mechanism on Multivariate Regression-Based Predictors (journal article)
Abstract: A general regression procedure for the prediction of a vector of population means in situations of nonresponse is proposed. The multivariate treatment of the prediction problem is not computationally complicated and allows the borrowing of information from one target variable to the other even when both variables are subject to nonresponse. The predictors are optimal under a general model that specifies the first and second moments of the joint distribution of the survey variables. The effects of the sampling design and response mechanism on the properties of the predictors are investigated, and appropriate modifications and bias corrections that use the sample inclusion probabilities are proposed. The performance of the predictors is illustrated empirically and compared with that of other predictors using simulated and real data. Data collected in governmental and other large-scale surveys is almost always incomplete. Typical causes for the incompleteness of the data are delays in attaining parts of the information, refusals to answer certain questions, and exclusion of erroneous data. Another feature characterizing many surveys is that the sampling design and, in particular, the sample inclusion probabilities are determined by the values of design variables that are correlated with the survey target variables. The survey of industrial establishments in Israel is an example of the use of such a complex sampling scheme. Typical response patterns observed for this survey are presented in Table 1. Data collected in the survey were used for the empirical study described in Section 5. In this article I present a multivariate regression procedure that can be used to predict simultaneously the finite population means of p related survey variables.
A notable feature of this procedure is that the imputations of the missing data for any given unit use all of the information known for that unit, including observations on variables that themselves are missing for other units. The implementation of the procedure requires the estimation of the variance-covariance matrix of the survey variables, and this can be done efficiently and in a relatively simple way by use of the EM algorithm described by Beale and Little (1975). The multivariate procedure can be modified to deal with situations where the sample inclusion probabilities and the probabilities of nonresponse depend on the measured values of design and covariate variables. The result of this dependence is that the distribution of the sample observations of the survey variables is different from the distribution in the population, which causes a bias in the unmodified predictors. The modification consists of weighting the observations in the EM algorithm by the inverse of the units' inclusion probabilities and subtracting an estimate of the prediction bias, obtained by traditional sampling theory, from the original predictors.
Key Words: Imputation; Informative samples; Missing at random; Noninformative samples; Normal EM algorithm; Prediction mean squared error; Pξ distribution; Weighted EM algorithm
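To make the weighting idea in the abstract concrete, the following is a minimal sketch of an EM iteration for the mean vector and variance-covariance matrix of multivariate normal data with missing entries, where each unit's contribution is weighted by the inverse of its inclusion probability. It is not the article's algorithm: the function name `weighted_em_mvn`, the starting values, the stopping rule, and the exact way the weights enter the M-step are all assumptions made for illustration, and the bias-correction step mentioned in the abstract is not included.

```python
import numpy as np

def weighted_em_mvn(Y, w, n_iter=200, tol=1e-8):
    """Hypothetical helper: weighted EM for a multivariate normal with missing data.

    Y : (n, p) array, missing entries coded as np.nan
    w : (n,) positive weights, e.g. 1 / inclusion probability (assumption)
    Returns the estimated mean, covariance, and the regression imputations
    from the last E-step.
    """
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    n, p = Y.shape
    miss = np.isnan(Y)

    # Starting values: weighted available-case means, diagonal covariance.
    mu = np.empty(p)
    var = np.empty(p)
    for j in range(p):
        seen = ~miss[:, j]
        mu[j] = np.average(Y[seen, j], weights=w[seen])
        var[j] = np.average((Y[seen, j] - mu[j]) ** 2, weights=w[seen])
    sigma = np.diag(var)

    for _ in range(n_iter):
        # E-step: regress each unit's missing part on its observed part.
        Yhat = np.where(miss, 0.0, Y)
        C = np.zeros((p, p))  # weighted sum of conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if o.any():
                # B holds the regression coefficients Sigma_oo^{-1} Sigma_om.
                B = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)])
                Yhat[i, m] = mu[m] + (Y[i, o] - mu[o]) @ B
                cond = sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ B
            else:
                Yhat[i, m] = mu[m]
                cond = sigma[np.ix_(m, m)]
            C[np.ix_(m, m)] += w[i] * cond

        # M-step: weighted mean and covariance of the completed data.
        W = w.sum()
        mu_new = (w[:, None] * Yhat).sum(axis=0) / W
        R = Yhat - mu_new
        sigma_new = ((w[:, None] * R).T @ R + C) / W

        change = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if change < tol:
            break
    return mu, sigma, Yhat
```

With all weights equal to one and no missing entries, the sketch reduces to the ordinary sample mean and the maximum likelihood covariance matrix. With missing entries, each imputed value is a regression prediction based on whatever that unit did report, which mirrors the borrowing of information across variables described in the abstract.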
Year of publication: 1988
Authors: Danny Pfeffermann
Source: Journal of the American Statistical Association
Keywords: Statistical Methods and Bayesian Inference, Survey Sampling and Estimation Techniques, Water Quality and Resources Studies
Open access: closed
Volume: 83
Issue: 403
Pages: 824–833