Effects of Case Removal in Prognostic Models

L. Ohno-Machado; S. Vinterbo

doi:10.1055/s-0038-1634461

Methods Inf Med 2001; 40(01): 32-38

Original Article

Schattauer GmbH

L. Ohno-Machado

¹Brigham and Women’s Hospital and Health Sciences and Technology Division, Harvard Medical School and Massachusetts Institute of Technology, Boston, USA

S. Vinterbo

²Knowledge Systems Group, Dept. Computer and Information Sciences, Norwegian University of Science and Technology, Trondheim, Norway

Publication History

Publication Date:
08 February 2018 (online)

Abstract

Constructing and updating prognostic models that learn from training cases is time-consuming. The more compact, yet still informative, the training sets are, the faster such models can be built and properly evaluated. We compared different regression diagnostic methods for selecting and removing training cases in prognostic models. Univariate determinations, which assess each case in isolation, were made using classical regression diagnostic statistics. Multivariate determinations were made using (1) a sequential “backward” selection of cases and (2) a non-sequential genetic algorithm. The genetic algorithm produced final models that kept few training cases yet retained predictive capability. A genetic algorithm approach to case selection may be better suited to guiding the removal of training cases than a univariate or a sequential multivariate approach. This is possibly because it can detect sets of cases that are influential en bloc but not sufficiently influential when considered in isolation.
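The abstract does not specify the genetic algorithm's encoding, fitness function, or operators. The following is therefore only a minimal sketch of genetic-algorithm case selection for logistic regression, and every design choice in it is an assumption:

* Each individual is a bit mask over the training cases.
* Fitness is the validation ROC area of a logistic regression fit on the kept cases, minus a hypothetical `penalty` on the fraction of cases kept.
* Parents are chosen by tournament selection, then combined with uniform crossover and bit-flip mutation.

All function names (`fit_logistic`, `fitness`, `ga_select`, `make_data`) and parameter values are hypothetical, and the synthetic data stand in for a real clinical data set. Fitness is computed for a whole mask, so the search can drop a group of cases together. A univariate diagnostic, by contrast, scores each case alone.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(w, x):
    """Intercept w[0] plus the dot product of the remaining weights with x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(xs, ys, iters=100, lr=0.5):
    """Fit logistic regression by full-batch gradient descent on the log-loss."""
    w = [0.0] * (len(xs[0]) + 1)
    n = len(xs)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(linear_predictor(w, x)) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def auc(scores, ys):
    """ROC area: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fitness(mask, train, valid, penalty=0.1):
    """Validation AUC of a model fit on the kept cases, minus a penalty on the fraction kept."""
    kept = [case for case, keep in zip(train, mask) if keep]
    if len({y for _, y in kept}) < 2:  # both outcomes are needed to fit a model
        return float("-inf")
    w = fit_logistic([x for x, _ in kept], [y for _, y in kept])
    scores = [linear_predictor(w, x) for x, _ in valid]
    return auc(scores, [y for _, y in valid]) - penalty * sum(mask) / len(mask)


def ga_select(train, valid, pop_size=16, generations=20, p_mut=0.02, seed=0):
    """Evolve bit masks over training cases; return the best mask seen and its fitness."""
    rng = random.Random(seed)
    n = len(train)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [True] * n  # seed the population with the full training set
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        scored = sorted(((fitness(m, train, valid), m) for m in pop),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]

        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            return max(rng.sample(scored, 2), key=lambda t: t[0])[1]

        children = [scored[0][1]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then independent bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            children.append([not g if rng.random() < p_mut else g for g in child])
        pop = children
    return best, best_fit


def make_data(n, rng):
    """Synthetic cases: feature 0 shifts with the outcome, feature 1 is pure noise."""
    data = []
    for i in range(n):
        y = i % 2
        data.append(([rng.gauss(1.0 if y else -1.0, 1.0), rng.gauss(0.0, 1.0)], y))
    return data


if __name__ == "__main__":
    rng = random.Random(42)
    train, valid = make_data(40, rng), make_data(40, rng)
    mask, fit = ga_select(train, valid)
    print(f"kept {sum(mask)} of {len(mask)} training cases; fitness {fit:.3f}")
```

The mask is chosen to maximize validation AUC, so that AUC is optimistically biased. The predictive capability of the reduced model should be estimated on an independent test set. The `penalty` weight trades compactness against discrimination and would need tuning.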

Keywords

Variable Selection - Regression Diagnostics - Genetic Algorithm
