Subscribe to RSS
DOI: 10.1055/s-0038-1634461
Effects of Case Removal in Prognostic Models
Publication History
Publication Date:
08 February 2018 (online)


Abstract
Constructing and updating prognostic models that learn from training cases is a time-consuming task. The more compact, and yet informative, the training sets are, the faster one can build and properly evaluate such models. We have compared different regression diagnostic methods for selection and removal of training cases in prognostic models. Univariate determinations were performed using classical regression diagnostic statistics. Multivariate determinations were performed using (1) a sequential “backward” selection of cases, and (2) a non-sequential genetic algorithm. The genetic algorithm produced final models that kept few cases and retained predictive capability. A genetic algorithm approach to case selection may be better suited for guiding removal of cases in training sets than a univariate or a sequential multivariate approach, possibly because of its ability to detect sets of cases that are influential en bloc but may not be sufficiently influential when considered in isolation.