Extending Statistical Boosting

A. Mayr; H. Binder; O. Gefeller; M. Schmid

doi:10.3414/ME13-01-0123

Methods Inf Med 2014; 53(06): 428-435
DOI: 10.3414/ME13-01-0123

Original Articles

Schattauer GmbH

Extending Statistical Boosting

An Overview of Recent Methodological Developments

Authors

A. Mayr
¹Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

H. Binder
²Institut für Medizinische Biometrie, Epidemiologie und Informatik, Johannes Gutenberg-Universität Mainz, Germany

O. Gefeller
¹Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

M. Schmid
¹Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
³Institut für Medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany

Further Information

Publication History

received: 11 November 2013

accepted: 02 May 2014

Publication Date:
20 January 2018 (online)

Summary

Background: Boosting algorithms to simultaneously estimate and select predictor effects in statistical models have gained substantial interest during the last decade.

Objectives: This review highlights recent methodological developments regarding boosting algorithms for statistical modelling, with a particular focus on topics relevant to biomedical research.

Methods: We suggest a unified framework for gradient boosting and likelihood-based boosting (together referred to as statistical boosting), which have so far been addressed separately in the literature.

Results: The methodological developments on statistical boosting during the last ten years can be grouped into three lines of research: i) efforts to ensure variable selection leading to sparser models, ii) developments regarding different types of predictor effects and how to choose them, and iii) approaches to extend the statistical boosting framework to new regression settings.

Conclusions: Statistical boosting algorithms have been adapted to carry out unbiased variable selection and automated model choice during the fitting process, and can now be applied in almost any regression setting in combination with a large number of different types of predictor effects.

Keywords

Statistical computing - statistical models - algorithms - classification - machine learning

