The Effect of Sampling Design and Response Mechanism on Multivariate Regression-Based Predictors

Danny Pfeffermann

Abstract: A general regression procedure is proposed for predicting a vector of population means in the presence of nonresponse. The multivariate treatment of the prediction problem is not computationally complicated, and it allows information to be borrowed from one target variable for another even when both variables are subject to nonresponse. The predictors are optimal under a general model that specifies the first and second moments of the joint distribution of the survey variables. The effects of the sampling design and the response mechanism on the properties of the predictors are investigated, and appropriate modifications and bias corrections that use the sample inclusion probabilities are proposed. The performance of the predictors is illustrated empirically and compared with that of other predictors, using both simulated and real data.

Data collected in governmental and other large-scale surveys are almost always incomplete. Typical causes of incompleteness are delays in obtaining parts of the information, refusals to answer certain questions, and the exclusion of erroneous data. Another feature of many surveys is that the sampling design, and in particular the sample inclusion probabilities, are determined by the values of design variables that are correlated with the survey target variables. The survey of industrial establishments in Israel is an example of such a complex sampling scheme. Typical response patterns observed for this survey are presented in Table 1. Data collected in the survey were used for the empirical study described in Section 5. In this article I present a multivariate regression procedure that can be used to predict simultaneously the finite population means of p related survey variables.
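The following sketch illustrates how a unit's imputations can borrow information from every variable observed for that unit. It is a minimal sketch that assumes a working model with a known mean vector μ and covariance matrix Σ (in practice both would be estimated). It replaces each missing value by its conditional expectation given the unit's observed values. The helper name `conditional_impute`, the use of NumPy, and the final step of averaging the completed sample are illustrative simplifications, not the article's exact predictor of the population means.

```python
import numpy as np


def conditional_impute(y, mu, sigma):
    """Fill each NaN in the n x p array `y` with its conditional expectation
    E[y_miss | y_obs] under a working model with mean `mu` and covariance
    `sigma` (both assumed known here; in practice they are estimated)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    out = y.copy()
    for i, row in enumerate(y):
        miss = np.isnan(row)
        if not miss.any():
            continue  # complete unit: nothing to impute
        obs = ~miss
        if not obs.any():
            out[i, miss] = mu[miss]  # nothing observed: use the model mean
            continue
        # Regression of this unit's missing variables on its observed ones:
        # beta = Sigma_mo @ inv(Sigma_oo), computed via a linear solve.
        s_oo = sigma[np.ix_(obs, obs)]
        s_mo = sigma[np.ix_(miss, obs)]
        beta = np.linalg.solve(s_oo, s_mo.T).T
        out[i, miss] = mu[miss] + beta @ (row[obs] - mu[obs])
    return out


# Illustrative use: two correlated variables, each missing for one unit.
y = [[1.0, np.nan], [np.nan, 2.0], [0.0, 0.0]]
mu = [0.0, 0.0]
sigma = [[1.0, 0.5], [0.5, 1.0]]
completed = conditional_impute(y, mu, sigma)
# Rows become [1.0, 0.5], [1.0, 2.0], [0.0, 0.0]; the column means are
# sample-level predicted means of the two variables.
predicted_means = completed.mean(axis=0)
```

The second variable's missing value for the first unit is predicted from that unit's first variable, and vice versa. This mirrors the cross-variable borrowing described in the abstract.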
A notable feature of this procedure is that the imputation of missing data for any given unit uses all of the information known for that unit, including observations on variables that are themselves missing for other units. Implementing the procedure requires estimating the variance-covariance matrix of the survey variables, which can be done efficiently and relatively simply with the EM algorithm described by Beale and Little (1975). The multivariate procedure can be modified for situations where the sample inclusion probabilities and the probabilities of nonresponse depend on the measured values of design and covariate variables. Because of this dependence, the distribution of the sample observations of the survey variables differs from their distribution in the population, which biases the unmodified predictors. The modification consists of weighting the observations in the EM algorithm by the inverse of the units' inclusion probabilities and subtracting from the original predictors an estimate of the prediction bias obtained by traditional sampling theory.

Key Words: Imputation; Informative samples; Missing at random; Noninformative samples; Normal EM algorithm; Prediction mean squared error; Pξ distribution; Weighted EM algorithm
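The weighted EM estimation described above can be sketched as follows. This is a minimal sketch that assumes a multivariate normal working model, data missing at random, and a known inclusion probability π_i for every sampled unit, with weight w_i = 1/π_i. The function name `weighted_em`, the starting values, and the convergence rule are illustrative choices, not the article's specification. The prediction-bias correction is not implemented.

```python
import numpy as np


def weighted_em(y, weights, n_iter=200, tol=1e-8):
    """Normal-theory EM for the mean vector and covariance matrix of the
    n x p array `y` (NaN = missing), with unit i weighted by weights[i]
    (hypothetically 1 / pi_i). Assumes every column has some observed value."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    n, p = y.shape
    mask = ~np.isnan(y)
    # Starting values: weighted available-case means and variances.
    mu = np.array([np.average(y[mask[:, j], j], weights=w[mask[:, j]])
                   for j in range(p)])
    sigma = np.diag([np.average((y[mask[:, j], j] - mu[j]) ** 2,
                                weights=w[mask[:, j]]) for j in range(p)])
    for _ in range(n_iter):
        # E-step: conditional means for missing entries, plus the weighted
        # sum of conditional covariances of the missing parts.
        x_hat = np.where(mask, y, 0.0)
        extra = np.zeros((p, p))
        for i in range(n):
            obs = mask[i]
            miss = ~obs
            if not miss.any():
                continue
            if obs.any():
                s_oo = sigma[np.ix_(obs, obs)]
                s_mo = sigma[np.ix_(miss, obs)]
                beta = np.linalg.solve(s_oo, s_mo.T).T
                x_hat[i, miss] = mu[miss] + beta @ (y[i, obs] - mu[obs])
                cond_cov = sigma[np.ix_(miss, miss)] - beta @ s_mo.T
            else:
                x_hat[i, miss] = mu[miss]
                cond_cov = sigma[np.ix_(miss, miss)]
            extra[np.ix_(miss, miss)] += w[i] * cond_cov
        # M-step: weighted mean and weighted (divisor sum(w)) covariance.
        new_mu = np.average(x_hat, axis=0, weights=w)
        centered = x_hat - new_mu
        new_sigma = ((centered.T * w) @ centered + extra) / w.sum()
        converged = (np.max(np.abs(new_mu - mu)) < tol
                     and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma
```

With complete data the loop reduces to the weighted sample mean and covariance. With equal weights it is the ordinary (unweighted) normal EM. The estimates could then be passed to a conditional-expectation imputation step.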

Year of publication: 1988

Authors: Danny Pfeffermann

Source: Journal of the American Statistical Association

Keywords: Statistical Methods and Bayesian Inference, Survey Sampling and Estimation Techniques, Water Quality and Resources Studies

Document type: journal article