Methods Inf Med 2012; 51(04): 341-347
DOI: 10.3414/ME11-02-0045
Focus Theme – Original Articles
Schattauer GmbH

Supporting Regenerative Medicine by Integrative Dimensionality Reduction

F. Mulas (1), L. Zagar (2), B. Zupan (2, 1, 3), R. Bellazzi (4, 1)

1   Centre for Tissue Engineering, University of Pavia, Pavia, Italy
2   Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
3   Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
4   Dipartimento di Ingegneria Industriale e dell’Informazione, Università di Pavia, Pavia, Italy

Publication History

received: 08 November 2011

accepted: 04 May 2012

Publication Date: 20 January 2018 (online)

Summary

Objective: Assessing the developmental potential of stem cells is a crucial step towards their clinical application in regenerative medicine. Dimensionality reduction methods applied to genome-wide expression profiles have been shown to predict the cellular differentiation stage. Here we show that these techniques can be further strengthened to support decision making with i) a novel strategy for gene selection and ii) methods for combining the evidence from multiple data sets.

Methods: We propose to exploit dimensionality reduction methods for the selection of genes specifically activated in different stages of differentiation. To obtain an integrated predictive model, the expression values of the selected genes from multiple data sets are combined. We investigated distinct approaches that either aggregate data sets or use learning ensembles.
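
The two integration strategies named above can be illustrated with a minimal sketch. It rests on assumptions that the summary does not state: principal component analysis stands in for the dimensionality reduction method, genes are ranked by their absolute loading on the leading component, and the stage is predicted by a linear fit on the first component score. All function names and the synthetic data are hypothetical and are not the authors' implementation.

```python
import numpy as np

def make_dataset(seed, n_rep=4, n_genes=30, n_informative=3):
    # Synthetic stand-in for an expression data set (samples x genes);
    # only the first n_informative genes change with the stage.
    rng = np.random.default_rng(seed)
    stages = np.repeat(np.arange(5.0), n_rep)
    X = rng.normal(0.0, 0.3, size=(stages.size, n_genes))
    X[:, :n_informative] += 2.0 * stages[:, None]
    return X, stages

def select_genes(X, n_components=1, n_genes=3):
    # Assumed selection rule: keep the genes with the largest absolute
    # loading on the leading principal components.
    _, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    score = np.abs(vt[:n_components]).max(axis=0)
    return np.sort(np.argsort(score)[::-1][:n_genes])

def fit_stage_model(X, stages):
    # Project samples onto the first principal axis, then fit a line
    # from the component score to the stage.
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    axis = vt[0]
    slope, intercept = np.polyfit((X - mean) @ axis, stages, 1)
    return mean, axis, slope, intercept

def predict_stage(model, X):
    mean, axis, slope, intercept = model
    return slope * ((X - mean) @ axis) + intercept

def predict_aggregated(datasets, X_new):
    # Strategy 1: pool all training data sets and fit a single model.
    X = np.vstack([X for X, _ in datasets])
    y = np.concatenate([y for _, y in datasets])
    return predict_stage(fit_stage_model(X, y), X_new)

def predict_ensemble(datasets, X_new):
    # Strategy 2: fit one model per data set and average the predictions.
    preds = [predict_stage(fit_stage_model(X, y), X_new) for X, y in datasets]
    return np.mean(preds, axis=0)

if __name__ == "__main__":
    train = [make_dataset(0), make_dataset(1, n_rep=3)]
    X_new, y_new = make_dataset(2)
    genes = select_genes(np.vstack([X for X, _ in train]))
    reduced = [(X[:, genes], y) for X, y in train]
    print("selected genes:", genes)
    print("aggregated:", np.round(predict_aggregated(reduced, X_new[:, genes]), 2))
    print("ensemble:  ", np.round(predict_ensemble(reduced, X_new[:, genes]), 2))
```

The sketch ignores batch effects between data sets; with real microarray data, some form of cross-study normalization would probably be needed before pooling.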

Results: We analyzed the performance of the proposed methods on six publicly available data sets. The selection procedure identified a reduced subset of genes whose expression values yielded accurate stage predictions. Most of the data integration methods presented achieved high predictive accuracy.

Conclusion: The experimental results highlighted the main strengths of the proposed approaches. These include the ability to predict the true staging by combining multiple training data sets when it could not be inferred from a single data source, and the ability to focus the analysis on a reduced list of genes with similar predictive performance.

 