DOI: 10.3414/ME11-02-0043
Application of Microarray Analysis on Computer Cluster and Cloud Platforms[*]
Publication History
Received: 07 November 2011
Accepted: 05 March 2012
Publication Date: 20 January 2018 (online)
Summary
Background: Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services.
Objectives: In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources.
Methods: To compare the efficiency of computer cluster and cloud implementations and their respective parallelizations, we run microarray analysis procedures and compare their runtimes on the different platforms.
Results: Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations.
Conclusion: Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
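As an illustration of the computational independence mentioned in the Background, the following minimal sketch splits a two-sample permutation test into chunks that share no state and can therefore run anywhere. It is not the implementation used in the article: the function names (count_extreme, permutation_pvalue), the test statistic (absolute difference in group means), and the chunking scheme are assumptions made for this example. A thread pool is used only to keep the sketch portable; on a computer cluster or on cloud instances each chunk would instead be sent to a separate process or machine.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def count_extreme(task):
    """Run one independent chunk of permutations and return how many permuted
    statistics are at least as extreme as the observed one."""
    values, n_first, observed, n_perm, seed = task
    rng = random.Random(seed)  # per-chunk generator, so chunks share no state
    pooled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_first]) - mean(pooled[n_first:]))
        if stat >= observed:
            hits += 1
    return hits


def permutation_pvalue(group_a, group_b, n_perm=2000, n_workers=4, seed=0):
    """Two-sided permutation p-value for a difference in group means
    (hypothetical helper, written for this illustration only)."""
    observed = abs(mean(group_a) - mean(group_b))
    values = list(group_a) + list(group_b)
    # Split the permutations into chunks of nearly equal size.
    base, extra = divmod(n_perm, n_workers)
    sizes = [base + (1 if i < extra else 0) for i in range(n_workers)]
    tasks = [(values, len(group_a), observed, size, seed + i)
             for i, size in enumerate(sizes)]
    # Each task depends only on its own arguments, so the map step could be
    # replaced by any cluster or cloud scheduler without changing the logic.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = sum(pool.map(count_extreme, tasks))
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because every chunk receives its own seed and returns a single count, the only communication between workers is the final sum, which is why such procedures tend to place modest demands on network capacity.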
Keywords
Biostatistics - cloud computing - computing methodologies - microarray analysis - statistical computing
* Supplementary material published on our website www.methods-online.com