Effects of Case Removal in Prognostic Models

L. Ohno-Machado; S. Vinterbo

doi:10.1055/s-0038-1634461


Methods Inf Med 2001; 40(01): 32-38

Original Article

Schattauer GmbH


L. Ohno-Machado¹, S. Vinterbo²

¹ Brigham and Women’s Hospital and Health Sciences and Technology Division, Harvard Medical School and Massachusetts Institute of Technology, Boston, USA

² Knowledge Systems Group, Dept. Computer and Information Sciences, Norwegian University of Science and Technology, Trondheim, Norway

Publication History

Publication date:
08 February 2018 (online)


Abstract

Constructing and updating prognostic models that learn from training cases is a time-consuming task. The more compact, and yet informative, the training sets are, the faster one can build and properly evaluate such models. We have compared different regression diagnostic methods for selection and removal of training cases in prognostic models. Univariate determinations were performed using classical regression diagnostic statistics. Multivariate determinations were performed using (1) a sequential “backward” selection of cases, and (2) a non-sequential genetic algorithm. The genetic algorithm produced final models that kept few cases and retained predictive capability. A genetic algorithm approach to case selection may be better suited for guiding removal of cases in training sets than a univariate or a sequential multivariate approach, possibly because of its ability to detect sets of cases that are influential en bloc but may not be sufficiently influential when considered in isolation.

Keywords

Variable Selection - Regression Diagnostics - Genetic Algorithm
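To make the sequential "backward" selection of cases mentioned in the abstract more concrete, the following is a minimal sketch under stated assumptions, not the authors' procedure. It assumes a least-squares line as a stand-in for the prognostic model, a hold-out set for evaluation, and a stopping rule based on a relative error tolerance. The function names (`fit_line`, `holdout_error`, `backward_case_removal`), the `tolerance` and `min_cases` parameters, and the toy data are all hypothetical and do not come from the paper.

```python
from statistics import mean


def fit_line(cases):
    """Least-squares fit of y = a + b*x to a list of (x, y) cases."""
    xs = [x for x, _ in cases]
    ys = [y for _, y in cases]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return y_bar, 0.0  # all x equal: fall back to predicting the mean
    b = sum((x - x_bar) * (y - y_bar) for x, y in cases) / sxx
    return y_bar - b * x_bar, b


def holdout_error(model, cases):
    """Mean squared error of the fitted line on the given cases."""
    a, b = model
    return mean([(y - (a + b * x)) ** 2 for x, y in cases])


def backward_case_removal(train, holdout, tolerance=0.05, min_cases=2):
    """Greedy backward elimination of training cases (assumed stopping rule).

    At each step, drop the single case whose removal yields the lowest
    hold-out error. Stop when even the best removal would push the error
    above (1 + tolerance) times the error of the full training set.
    """
    kept = list(train)
    limit = holdout_error(fit_line(kept), holdout) * (1 + tolerance)
    while len(kept) > min_cases:
        trials = [
            (holdout_error(fit_line(kept[:i] + kept[i + 1:]), holdout), i)
            for i in range(len(kept))
        ]
        best_error, best_index = min(trials)
        if best_error > limit:
            break
        del kept[best_index]
    return kept, limit


# Toy data (hypothetical): a noisy line y = 2x + 1 with one outlying case.
train = [(float(x), 2.0 * x + 1.0 + 0.1 * (-1) ** x) for x in range(10)]
outlier = (4.0, 20.0)
train[4] = outlier
holdout = [(x + 0.5, 2.0 * (x + 0.5) + 1.0) for x in range(0, 10, 2)]

kept, limit = backward_case_removal(train, holdout)
print(len(train), "->", len(kept), "training cases kept")
print("outlier retained:", outlier in kept)
```

Because this procedure evaluates one removal at a time, it illustrates the limitation the abstract points to: a group of cases that matters only en bloc may never be singled out by a step-wise search, which is the motivation given for the non-sequential genetic algorithm.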

