Methods Inf Med 2013; 52(01): 65-71
DOI: 10.3414/ME11-02-0043
Focus Theme – Original Articles
Schattauer GmbH

Application of Microarray Analysis on Computer Cluster and Cloud Platforms[*]

C. Bernau (1), A.-L. Boulesteix (1), J. Knaus (2)
1   Department for Medical Informatics, Biometry and Epidemiology (IBE), Ludwig-Maximilians-University Munich, Munich, Germany
2   Institute of Medical Biometry and Medical Informatics, Albert-Ludwigs-University Freiburg, Freiburg, Germany

Publication History

Received: 07 November 2011
Accepted: 05 March 2012
Publication Date: 20 January 2018 (online)

Summary

Background: Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services.
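To illustrate why such procedures parallelize so readily, the following is a minimal sketch of a two-sample permutation test in which every permutation is an independent task. It is not taken from the article: the function names `perm_stat` and `permutation_p_value`, the mean-difference statistic, and the toy data are assumptions chosen for illustration. A thread pool is used only to keep the sketch portable; for CPU-bound work in Python one would more likely pass a process pool or a cluster scheduler that offers the same `map` interface.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def perm_stat(task):
    """One permutation iteration: shuffle the pooled values and
    recompute the difference in group means (hypothetical statistic)."""
    values, n_first, seed = task
    rng = random.Random(seed)      # separate, reproducible stream per iteration
    shuffled = list(values)
    rng.shuffle(shuffled)
    return mean(shuffled[:n_first]) - mean(shuffled[n_first:])


def permutation_p_value(group_a, group_b, n_perm, executor):
    """Two-sided permutation p-value; iterations are farmed out to `executor`."""
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    # Each task carries everything it needs, so no iteration depends on another.
    tasks = [(pooled, len(group_a), seed) for seed in range(n_perm)]
    stats = list(executor.map(perm_stat, tasks))
    extreme = sum(abs(s) >= abs(observed) for s in stats)
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Toy data (made up for illustration, not from the article).
expr_a = [5.1, 4.9, 5.6, 5.8]
expr_b = [4.2, 4.0, 4.4, 4.1]

with ThreadPoolExecutor(max_workers=4) as pool:
    p_value = permutation_p_value(expr_a, expr_b, 1000, pool)
print(p_value)
```

Because the tasks share no state, the same function could in principle be handed any executor-like object, whether it dispatches to local cores, cluster nodes, or rented cloud instances; only the communication overhead differs.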

Objectives: In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources.

Methods: To compare the efficiency of computer cluster and cloud implementations and their respective parallelizations, we run microarray analysis procedures on the different platforms and compare their runtimes.

Results: Amazon Web Services provides various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations.

Conclusion: Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

* Supplementary material published on our website www.methods-online.com