Semin Thromb Hemost 2019; 45(07): 674-684
DOI: 10.1055/s-0039-1692978
Review Article
Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

A Bioinformatics Toolkit: In Silico Tools and Online Resources for Investigating Genetic Variation

Simon J. Webster¹, Maryam A. Aldossary¹, Daniel J. Hampshire¹ ²

¹ Department of Infection, Immunity and Cardiovascular Disease, University of Sheffield, Sheffield, United Kingdom
² Department of Biomedical Sciences, University of Hull, Hull, United Kingdom

Publication History

Publication Date: 05 August 2019 (online)

Abstract

With the advent of large-scale next-generation sequencing initiatives, it has become increasingly important to interpret the potential phenotypic influence of identified genetic variation and its significance in the human genome. Bioinformatic analyses can provide useful information to assist with variant interpretation. This review provides an overview of currently available tools and resources, and of how they can help predict the impact of genetic variation at the deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein level.
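The simplest protein-level prediction, namely whether a coding single-nucleotide substitution is synonymous, missense, or nonsense, follows directly from the genetic code. The sketch below is illustrative only: it assumes the coding sequence (CDS) string begins at the A of the ATG start codon (c.1), contains no introns, and is read with the standard genetic code. The function names `codon_position` and `annotate_coding_snv` are hypothetical and do not reproduce the interface of any tool discussed in this review. It also shows how a coding position maps to a codon: c.N lies in codon ⌊(N − 1)/3⌋ + 1, so c.4146 falls at the third position of codon 1382.

```python
from __future__ import annotations

# Illustrative sketch (hypothetical helpers, not a real tool's API).
# Assumptions: the CDS starts at the A of ATG (c.1), has no introns,
# and is translated with the standard genetic code (NCBI table 1).

BASES = "TCAG"
# Amino acids for codons in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def codon_position(c_pos: int) -> tuple[int, int]:
    """Map a 1-based coding position c.N to (codon number, position 1-3 in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def annotate_coding_snv(cds: str, c_pos: int, ref: str, alt: str) -> dict:
    """Classify a coding substitution c.{c_pos}{ref}>{alt} by its amino acid effect."""
    cds, ref, alt = cds.upper(), ref.upper(), alt.upper()
    if not 1 <= c_pos <= len(cds):
        raise ValueError(f"c.{c_pos} lies outside the CDS")
    if cds[c_pos - 1] != ref:
        raise ValueError(f"reference mismatch at c.{c_pos}: CDS has {cds[c_pos - 1]}")
    if len(alt) != 1 or alt not in BASES or alt == ref:
        raise ValueError("alt must be a single base different from ref")

    codon_no, pos_in_codon = codon_position(c_pos)
    start = (codon_no - 1) * 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("variant falls in an incomplete final codon")
    # Swap the single base inside the affected codon.
    alt_codon = ref_codon[:pos_in_codon - 1] + alt + ref_codon[pos_in_codon:]

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    ref3, alt3 = THREE_LETTER[ref_aa], THREE_LETTER[alt_aa]
    if codon_no == 1 and ref_aa == "M" and alt_aa != "M":
        consequence, protein = "start_lost", "p.Met1?"
    elif ref_aa == alt_aa:
        consequence, protein = "synonymous", f"p.{ref3}{codon_no}="
    elif alt_aa == "*":
        consequence, protein = "nonsense", f"p.{ref3}{codon_no}Ter"
    elif ref_aa == "*":
        # HGVS extension notation is not attempted in this sketch.
        consequence, protein = "stop_lost", None
    else:
        consequence, protein = "missense", f"p.{ref3}{codon_no}{alt3}"

    return {
        "coding": f"c.{c_pos}{ref}>{alt}",
        "codon": codon_no,
        "ref_codon": ref_codon,
        "alt_codon": alt_codon,
        "consequence": consequence,
        "protein": protein,
    }


toy_cds = "ATGCTGAAATAA"  # toy CDS: Met-Leu-Lys-Stop
print(annotate_coding_snv(toy_cds, 6, "G", "T")["protein"])  # → p.Leu2=
print(annotate_coding_snv(toy_cds, 8, "A", "G")["protein"])  # → p.Lys3Arg
print(codon_position(4146))  # → (1382, 3)
```

A "synonymous" label here only means that the encoded amino acid is unchanged; synonymous and other exonic substitutions can still disrupt RNA-level signals such as exonic splice enhancers, which is why splicing-prediction tools are consulted alongside protein-level ones. Dedicated annotators such as the Ensembl Variant Effect Predictor additionally handle transcript selection, intronic positions, and insertions/deletions, all of which this sketch ignores.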
