Methods Inf Med 2014; 53(06): 428-435
DOI: 10.3414/ME13-01-0123
Original Articles
Schattauer GmbH

Extending Statistical Boosting

An Overview of Recent Methodological Developments
A. Mayr
1 Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
H. Binder
2 Institut für Medizinische Biometrie, Epidemiologie und Informatik, Johannes Gutenberg-Universität Mainz, Germany
O. Gefeller
1 Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
M. Schmid
1 Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
3 Institut für Medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany

Publication History

received: 11 November 2013

accepted: 02 May 2014

Publication Date:
20 January 2018 (online)

Summary

Background: Boosting algorithms that simultaneously estimate and select predictor effects in statistical models have gained substantial interest over the last decade.

Objectives: This review highlights recent methodological developments in boosting algorithms for statistical modelling, with a particular focus on topics relevant to biomedical research.

Methods: We propose a unified framework for gradient boosting and likelihood-based boosting (statistical boosting), two approaches that have so far been addressed separately in the literature.
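In both variants, each iteration updates an additive predictor by a small step along one selected base-learner. The following display is a sketch in the standard notation of the statistical boosting literature (step length \(\nu\), stopping iteration \(m_{\text{stop}}\)), not a formula quoted from the article:

\[
\hat{f}^{[m]}(\cdot) \;=\; \hat{f}^{[m-1]}(\cdot) \;+\; \nu \cdot \hat{h}^{[m]}_{j^{*}}(\cdot), \qquad 0 < \nu \leq 1,
\]

where gradient boosting selects and fits \(\hat{h}^{[m]}_{j^{*}}\) to the negative gradient of the loss function evaluated at the current fit \(\hat{f}^{[m-1]}\), while likelihood-based boosting obtains it from one step of penalized maximum likelihood estimation; in both cases the stopping iteration \(m_{\text{stop}}\) acts as the main tuning parameter.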

Results: The methodological developments in statistical boosting over the last ten years can be grouped into three lines of research: i) efforts to ensure variable selection leading to sparser models, ii) developments regarding different types of predictor effects and how to choose between them, iii) approaches to extend the statistical boosting framework to new regression settings.

Conclusions: Statistical boosting algorithms have been adapted to carry out unbiased variable selection and automated model choice during the fitting process, and can nowadays be applied in almost any regression setting in combination with a large number of different types of predictor effects.
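To make the variable selection property concrete, here is a minimal, self-contained sketch of component-wise L2 gradient boosting, the simplest member of the framework reviewed here. It is illustrative only, not the authors' implementation: the function name, the toy data, and the defaults nu=0.1 and mstop=100 are assumptions made for this example, and established implementations exist, e.g. the R package mboost.

```python
# Minimal sketch of component-wise L2 gradient boosting.
# Illustrative only: names, defaults and toy data are assumptions,
# not the implementation described in the article.
import numpy as np

def componentwise_l2_boost(X, y, mstop=100, nu=0.1):
    """Fit an additive linear model by component-wise L2 boosting.

    Each iteration fits one simple least-squares base-learner per
    covariate to the current residuals and updates only the
    best-fitting one, so variable selection is carried out during
    the fitting process itself.
    """
    n, p = X.shape
    intercept = y.mean()           # offset
    coef = np.zeros(p)             # selected effects accumulate here
    f = np.full(n, intercept)      # current additive predictor
    for _ in range(mstop):
        u = y - f                  # negative gradient of the L2 loss = residuals
        # least-squares slope of each univariate base-learner on u
        betas = X.T @ u / (X ** 2).sum(axis=0)
        rss = ((u[:, None] - X * betas) ** 2).sum(axis=0)
        j = rss.argmin()           # component that fits the residuals best
        coef[j] += nu * betas[j]   # weak (shrunken) update of that component only
        f += nu * betas[j] * X[:, j]
    return intercept, coef

# Toy example: only covariates 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = 2 * X[:, 0] - X[:, 3] + rng.standard_normal(200)
b0, b = componentwise_l2_boost(X, y)
print(np.round(b, 2))  # informative covariates dominate the fitted effects
```

Stopping the algorithm early (choosing mstop, e.g. by cross-validation) is what yields sparse models: covariates whose base-learners are never selected keep a coefficient of exactly zero.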

 
  • 86 Faschingbauer F, Beckmann M, Goecke T, Yazdi B, Siemer J, Schmid M. et al. A New Formula for Optimized Weight Estimation in Extreme Fetal Macrosomia (≥ 4500 g). European Journal of Ultrasound 2012; 33 (05) 480-488.