Methods Inf Med 2012; 51(02): 168-177
DOI: 10.3414/ME11-02-0021
Focus Theme – Original Articles
Schattauer GmbH

Regularization for Generalized Additive Mixed Models by Likelihood-based Boosting[*]

A. Groll, G. Tutz
Department of Statistics, University of Munich, Munich, Germany

Publication History

Received: 04 July 2011

Accepted: 20 March 2011

Publication Date: 19 January 2018 (online)

Summary

Objective: With the emergence of semi- and nonparametric regression, the generalized linear mixed model has been extended to account for additive predictors. However, available fitting methods fail in high-dimensional settings where many explanatory variables are present. We extend the concept of boosting to generalized additive mixed models and present an algorithm that uses two different approaches for fitting the variance components of the random effects.

Methods: The main tool developed is likelihood-based componentwise boosting, which enforces variable selection in generalized additive mixed models. In contrast to common procedures, it can be used in high-dimensional settings where many covariates are available and the form of their influence is unknown. The complexity of the resulting estimators is determined by information criteria. The performance of the methods is investigated in simulation studies for binary and Poisson responses, with comparisons to alternative approaches, and the methods are applied to real-world clinical data.

Results: Simulations show that the proposed methods are considerably more stable and more accurate in estimating the regression function than the conventional approach, especially when a large number of predictors is available. The methods also produce reasonable results when applied to real data sets, as illustrated with data from the Multicenter AIDS Cohort Study.

Conclusions: The boosting algorithm makes it possible to extract relevant predictors in generalized additive mixed models. It works in high-dimensional settings and is very stable.
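The componentwise selection idea described under Methods can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it uses squared-error loss on a plain linear model, has no random effects or smooth terms, and stops after a fixed number of steps rather than by an information criterion. The function name `componentwise_l2_boost`, the step size `nu`, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise boosting with squared-error loss (illustrative only).

    At each step, every covariate is fitted separately to the current
    residuals; only the best-fitting one is updated, by a shrunken step.
    Covariates that are never chosen keep a coefficient of exactly zero,
    which is how variable selection arises.
    """
    n, p = X.shape
    intercept = y.mean()
    coef = np.zeros(p)
    resid = y - intercept
    col_ss = (X ** 2).sum(axis=0)  # sum of squares of each column
    for _ in range(n_steps):
        # Least-squares coefficient of each single covariate on the residuals
        b = X.T @ resid / col_ss
        # Reduction in residual sum of squares for each candidate covariate
        gain = b ** 2 * col_ss
        j = int(np.argmax(gain))
        # Weak update: move only the selected coefficient, shrunken by nu
        coef[j] += nu * b[j]
        resid -= nu * b[j] * X[:, j]
    return intercept, coef

# Simulated example (assumed data): 50 covariates, only two are relevant.
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)  # center columns so the intercept is the mean of y
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(n)

intercept, coef = componentwise_l2_boost(X, y)
selected = np.flatnonzero(np.abs(coef) > 0.5)  # covariates with sizeable effects
print(selected, coef[selected])
```

In the setting of the paper, the squared-error fit would be replaced by a likelihood-based update, and the number of boosting steps would be chosen by an information criterion instead of being fixed in advance.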

* Supplementary material published on our website www.methods-online.com.


 