Methods Inf Med 2001; 40(01): 32-38
DOI: 10.1055/s-0038-1634461
Original Article
Schattauer GmbH

Effects of Case Removal in Prognostic Models

L. Ohno-Machado (1), S. Vinterbo (2)
1   Brigham and Women’s Hospital and Health Sciences and Technology Division, Harvard Medical School and Massachusetts Institute of Technology, Boston, USA
2   Knowledge Systems Group, Dept. Computer and Information Sciences, Norwegian University of Science and Technology, Trondheim, Norway

Publication History

Publication Date: 08 February 2018 (online)

Abstract

Constructing and updating prognostic models that learn from training cases is a time-consuming task. The more compact, and yet informative, the training sets are, the faster one can build and properly evaluate such models. We have compared different regression diagnostic methods for selection and removal of training cases in prognostic models. Univariate determinations were performed using classical regression diagnostic statistics. Multivariate determinations were performed using (1) a sequential “backward” selection of cases, and (2) a non-sequential genetic algorithm. The genetic algorithm produced final models that kept few cases and retained predictive capability. A genetic algorithm approach to case selection may be better suited for guiding removal of cases in training sets than a univariate or a sequential multivariate approach, possibly because of its ability to detect sets of cases that are influential en bloc but may not be sufficiently influential when considered in isolation.
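
The sequential "backward" selection of cases mentioned above can be illustrated with a minimal sketch. The abstract does not specify the model, the removal criterion, or the data, so everything here is an assumption made for illustration: an ordinary least-squares model, validation mean squared error as the criterion, synthetic data, and the hypothetical helper names `fit_ols`, `mse`, and `backward_case_removal`. It is not the authors' implementation.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients; X must already contain an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mse(coef, X, y):
    """Mean squared prediction error of a fitted coefficient vector."""
    return float(np.mean((X @ coef - y) ** 2))


def backward_case_removal(X_train, y_train, X_val, y_val, n_keep):
    """Greedy backward selection of cases (assumed criterion: validation MSE).

    At each step, drop the single training case whose removal yields the
    lowest validation error, and stop when n_keep cases remain.
    """
    keep = list(range(len(y_train)))
    while len(keep) > n_keep:
        best_err, best_idx = None, None
        for i in keep:
            trial = [j for j in keep if j != i]
            err = mse(fit_ols(X_train[trial], y_train[trial]), X_val, y_val)
            if best_err is None or err < best_err:
                best_err, best_idx = err, i
        keep.remove(best_idx)
    return keep


# Synthetic data (assumption): two predictors plus intercept, small Gaussian noise.
rng = np.random.default_rng(0)


def make_data(n):
    x = rng.normal(size=(n, 2))
    X = np.column_stack([np.ones(n), x])
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=n)
    return X, y


X_train, y_train = make_data(30)
X_val, y_val = make_data(50)

kept = backward_case_removal(X_train, y_train, X_val, y_val, n_keep=10)
full_err = mse(fit_ols(X_train, y_train), X_val, y_val)
reduced_err = mse(fit_ols(X_train[kept], y_train[kept]), X_val, y_val)
print(f"kept {len(kept)} of {len(y_train)} cases")
print(f"validation MSE, full set: {full_err:.4f}; reduced set: {reduced_err:.4f}")
```

Because this procedure evaluates one case at a time, it can overlook groups of cases that matter only jointly, which is the limitation the abstract suggests a non-sequential genetic algorithm may address.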
