Probit Regression with Correlated Label Noise: An EM-EP Approach


Probit regression and logistic regression are well-known models for classification. In contrast to logistic regression, probit regression has a canonical generalization that allows us to model correlations between the labels. This makes it possible to include metadata that induce correlations in the noisy observation process. We show that this approach leads to the mathematical problem of integrating a high-dimensional Gaussian density over the positive orthant. We derive a novel parameter estimation algorithm for this correlated probit regression model. Interpreting the noise as a latent variable leads to a natural formulation of the algorithm as an expectation-maximization (EM) scheme. Each partial M-step is a gradient step, and the gradient can be expressed in terms of moments of a truncated multivariate Gaussian. Calculating these moments (the E-step) is expensive with traditional methods. Instead, we use a recent application of expectation propagation (EP) to Gaussian densities. The resulting EM-EP scheme is much faster than the traditional methods and thus allows us to treat large data sets.
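To make the positive-orthant integral concrete, here is a minimal sketch under an assumed form of the model: labels y in {-1, +1} are generated as y = sign(Xw + eps) with eps ~ N(0, Sigma). The abstract does not spell out this parameterization, so the formula, the function name `correlated_probit_likelihood`, and all variable names are illustrative assumptions. Under that assumption, P(y | X, w) is the probability that a Gaussian with mean diag(y) X w and covariance diag(y) Sigma diag(y) falls in the positive orthant. The sketch evaluates it with SciPy's general-purpose multivariate normal CDF, which is not the EP approximation described above and is practical only for a small number of labels.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def correlated_probit_likelihood(X, y, w, Sigma):
    """P(y | X, w) under the ASSUMED model y = sign(X w + eps), eps ~ N(0, Sigma).

    X: (n, d) features, y: (n,) labels in {-1, +1}, w: (d,) weights,
    Sigma: (n, n) noise covariance (hypothetical names, not from the paper).
    """
    Y = np.diag(y)
    mean = Y @ X @ w          # mean of u = diag(y) (X w + eps)
    cov = Y @ Sigma @ Y       # covariance of u
    # P(u >= 0) equals P(-u <= 0), and -u ~ N(-mean, cov),
    # so the orthant probability is a CDF evaluated at the origin.
    return multivariate_normal(mean=-mean, cov=cov).cdf(np.zeros(len(y)))


# Toy data (made up for illustration only).
X = np.array([[1.0, 0.5], [-0.3, 2.0], [0.7, -1.2]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([0.4, -0.2])

# With independent noise the integral factorizes into ordinary probit terms.
p_independent = correlated_probit_likelihood(X, y, w, np.eye(3))
p_factorized = np.prod(norm.cdf(y * (X @ w)))

# With correlated noise there is no such factorization.
Sigma = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.3], [0.2, 0.3, 1.0]])
p_correlated = correlated_probit_likelihood(X, y, w, Sigma)
```

With an identity covariance the result should agree with the product of univariate probit terms up to the integrator's numerical tolerance, which gives a quick sanity check on the construction. For large n this direct numerical integration becomes expensive, which is the cost the abstract attributes to traditional methods.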

NIPS 2014 Workshop on Advances in Variational Inference
