1
Monnakgotla NR, Mahungu AC, Heckmann JM, Botha G, Mulder NJ, Wu G, Rampersaud E, Myers J, Van Blitterswijk M, Rademakers R, Taylor JP, Wuu J, Benatar M, Nel M. Analysis of Structural Variants Previously Associated With ALS in Europeans Highlights Genomic Architectural Differences in Africans. Neurol Genet 2023; 9:e200077. [PMID: 37346932 PMCID: PMC10281237 DOI: 10.1212/nxg.0000000000200077] [Received: 02/14/2023] [Accepted: 04/03/2023] [Indexed: 06/23/2023]
Abstract
Background and Objectives: Amyotrophic lateral sclerosis (ALS) is a degenerative condition of the brain and spinal cord in which protein-coding variants in known ALS disease genes explain only a minority of sporadic cases. There is growing interest in the role of noncoding structural variants (SVs) as ALS risk variants or as genetic modifiers of the ALS phenotype. In small European samples, specific short SV alleles in noncoding regulatory regions of SCAF4, SQSTM1, and STMN2 have been reported to be associated with ALS, and several groups have investigated the possible role of SMN1/SMN2 gene copy numbers in ALS susceptibility and clinical severity.
Methods: Using short-read whole genome sequencing (WGS) data, we investigated the putative ALS-susceptibility alleles of SCAF4 (3'UTR poly-T repeat), SQSTM1 (intron 5 AAAC insertion), and STMN2 (intron 3 CA repeat) in patients with ALS of African ancestry, and we described the architecture of the SMN1/SMN2 gene region. South African cases with ALS (n = 114) were compared with ancestry-matched controls (n = 150), 1000 Genomes Project samples (n = 2,336), and H3Africa Genotyping Chip Project samples (n = 347).
Results: ALS risk in South Africans was not associated with the previously reported SCAF4 poly-T repeat, SQSTM1 AAAC insertion, or long STMN2 CA alleles (p > 0.2). Similarly, SMN1 and SMN2 gene copy numbers did not differ between South Africans with ALS and matched population controls (p > 0.9). Notably, 20% of the African samples in this study had no SMN2 gene copies, a higher frequency than that reported in Europeans (approximately 7%).
Discussion: We did not replicate the reported association of SCAF4, SQSTM1, and STMN2 short SVs with ALS in a small South African sample. In addition, we found no link between SMN1 and SMN2 copy numbers and ALS susceptibility in this South African sample, consistent with the conclusion of a recent meta-analysis of European studies. However, the SMN gene region findings in Africans replicate previous results from East and West Africa and highlight the importance of including diverse population groups in disease gene discovery efforts. The clinically relevant differences in SMN gene architecture between African and non-African populations may affect the effectiveness of targeted SMN2 gene therapy for related diseases such as spinal muscular atrophy.
Affiliation(s)
- Nomakhosazana R Monnakgotla, Amokelani C Mahungu, Jeannine M Heckmann, Gerrit Botha, Nicola J Mulder, Gang Wu, Evadnie Rampersaud, Jason Myers, Marka Van Blitterswijk, Rosa Rademakers, J Paul Taylor, Joanne Wuu, Michael Benatar, Melissa Nel
  - From the Neurology Research Group (N.R.M., A.C.M., J.M.H., M.N.), Division of Neurology, Department of Medicine; Neuroscience Institute (N.R.M., A.C.M., J.M.H., M.N.); Computational Biology Division (G.B., N.J.M.), Institute of Infectious Disease and Molecular Medicine, University of Cape Town, South Africa; Center for Applied Bioinformatics (G.W., E.R., J.M.), St. Jude Children's Research Hospital, Memphis, TN; Department of Neuroscience (M.V.B.), Mayo Clinic, Jacksonville, FL; Center for Molecular Neurology (R.R.), University of Antwerp, Belgium; Department of Cell and Molecular Biology (J.P.T.), St. Jude Children's Research Hospital, Memphis, TN; and Department of Neurology (J.W., M.B.), University of Miami, FL
2
Potgieter MG, Nel AJM, Fortuin S, Garnett S, Wendoh JM, Tabb DL, Mulder NJ, Blackburn JM. MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets. PLoS Comput Biol 2023; 19:e1011163. [PMID: 37327214 DOI: 10.1371/journal.pcbi.1011163] [Received: 11/01/2022] [Accepted: 05/08/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture, and climate change. The poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically identify proteins by searching focused sequence databases built from prior knowledge, which may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing targets only the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools for scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase. The result is a tailored sequence database for target-decoy searches, built directly at the proteome level, which enables metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and is compatible with standard downstream analysis pipelines.
RESULTS: We compared MetaNovo with published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples. MetaNovo gave comparable numbers of peptide and protein identifications, many shared peptide sequences, and a bacterial taxonomic distribution similar to that obtained with a matched metagenome sequence database, but it simultaneously identified many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows. It yielded many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms and identifying an experimental sample contaminant without prior expectation.
CONCLUSIONS: By estimating taxonomic and peptide-level information directly from tandem mass spectrometry data of microbiome samples, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than the current gold-standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation, and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.
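The abstract states that MetaNovo's tailored databases are intended for standard target-decoy searches. As a generic illustration only (not MetaNovo's own code or API), the decoy-counting false discovery rate (FDR) estimate that such searches rely on can be sketched as below; the hypothetical `PSM` record, its fields, and the 1% default cutoff are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match: a search-engine score and a decoy flag."""
    score: float
    is_decoy: bool


def fdr_at_threshold(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR of all matches scoring >= threshold as decoys / targets."""
    targets = sum(1 for p in psms if p.score >= threshold and not p.is_decoy)
    decoys = sum(1 for p in psms if p.score >= threshold and p.is_decoy)
    # max(..., 1) avoids division by zero when only decoys pass the threshold.
    return decoys / max(targets, 1)


def score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> float:
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    Returns infinity (accept nothing) if no threshold meets the target.
    """
    cutoff = float("inf")
    for t in sorted({p.score for p in psms}, reverse=True):
        if fdr_at_threshold(psms, t) <= max_fdr:
            cutoff = t
    return cutoff


if __name__ == "__main__":
    example = [
        PSM(10.0, False), PSM(9.0, False), PSM(8.0, True),
        PSM(7.0, False), PSM(6.0, True),
    ]
    print(score_cutoff(example, max_fdr=0.01))  # 9.0
    print(score_cutoff(example, max_fdr=0.5))   # 7.0
```

Accepting every match at or above the returned cutoff keeps the estimated decoy-to-target ratio within `max_fdr`. Production pipelines usually refine this into per-match q-values, which this sketch does not attempt.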
Affiliation(s)
- Matthys G Potgieter
  - Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Andrew J M Nel
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Suereta Fortuin
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Shaun Garnett
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Jerome M Wendoh
  - Division of Immunology, Department of Pathology, University of Cape Town, Cape Town, South Africa
- David L Tabb
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences; African Microbiome Institute; South African Tuberculosis Bioinformatics Initiative; Stellenbosch University, Cape Town, South Africa
- Nicola J Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Institute of Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jonathan M Blackburn
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Institute of Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
3
Sinkala M, Elsheikh SSM, Mbiyavanga M, Cullinan J, Mulder NJ. A genome-wide association study identifies distinct variants associated with pulmonary function among European and African ancestries from the UK Biobank. Commun Biol 2023; 6:49. [PMID: 36641522 PMCID: PMC9840173 DOI: 10.1038/s42003-023-04443-8] [Received: 02/20/2022] [Accepted: 01/09/2023] [Indexed: 01/16/2023]
Abstract
Pulmonary function is an indicator of well-being, and pulmonary pathologies are the third leading cause of death worldwide. We analysed the UK Biobank genome-wide association summary statistics of pulmonary function for Europeans and for individuals of recent African descent to identify variants associated with the trait in the two ancestries. Here, we identify 627 variants in Europeans and 3 in Africans that are associated with three pulmonary function parameters. In addition to the 110 variants in Europeans previously reported to be associated with phenotypes related to pulmonary function, we identify 279 novel loci, including an ISX intergenic variant, rs369476290, on chromosome 22 in Africans. Remarkably, we find no variants shared between Africans and Europeans. Furthermore, enrichment analyses of variants, performed separately for each ancestry background, reveal significant enrichment for terms related to pulmonary phenotypes in Europeans but not in Africans. Further analysis of studies of pulmonary phenotypes reveals that individuals of European background are disproportionately overrepresented in datasets compared with Africans, and that the gap has widened over the past five years. Our findings extend our understanding of the different variants that modify pulmonary function in Africans and Europeans, which is promising for future GWASs and medical studies.
Affiliation(s)
- Musalula Sinkala
  - Computational Biology Division, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Rd, Observatory, 7925, Cape Town, South Africa
- Samar S M Elsheikh
  - Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Mamana Mbiyavanga
  - Computational Biology Division, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Rd, Observatory, 7925, Cape Town, South Africa
- Joshua Cullinan
  - Computational Biology Division, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Rd, Observatory, 7925, Cape Town, South Africa
- Nicola J Mulder
  - Computational Biology Division, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Rd, Observatory, 7925, Cape Town, South Africa
4
Nel M, Mahungu AC, Monnakgotla N, Botha GR, Mulder NJ, Wu G, Rampersaud E, van Blitterswijk M, Wuu J, Cooley A, Myers J, Rademakers R, Taylor JP, Benatar M, Heckmann JM. Revealing the Mutational Spectrum in Southern Africans With Amyotrophic Lateral Sclerosis. Neurol Genet 2022; 8:e654. [PMID: 35047667 PMCID: PMC8756565 DOI: 10.1212/nxg.0000000000000654] [Received: 10/01/2021] [Accepted: 12/08/2021] [Indexed: 11/15/2022]
Abstract
Background and Objectives: To perform the first screen of 44 amyotrophic lateral sclerosis (ALS) genes in a cohort of individuals of African genetic ancestry with ALS, using whole-genome sequencing (WGS) data.
Methods: One hundred three consecutive cases with probable or definite ALS (according to the revised El Escorial criteria), self-categorized as being of African genetic ancestry, underwent WGS on various Illumina platforms. As population controls, 238 samples from various African WGS data sets were included. The analysis was restricted to 44 ALS genes, in which rare sequence variants were curated and classified according to the American College of Medical Genetics guidelines as likely benign, of uncertain significance, likely pathogenic, or pathogenic.
Results: Thirteen percent of the 103 ALS cases harbored pathogenic or likely pathogenic variants: 5 different SOD1 variants (N87S, G94D, I114T, L145S, and L145F) in 5 individuals (5%, 1 familial case), pathogenic C9orf72 repeat expansions in 7 individuals (7%, 1 familial case), and a likely pathogenic ANXA11 (G38R) variant in 1 individual. Thirty individuals (29%) harbored ≥1 variant of uncertain significance; 10 of these variants had limited evidence of pathogenicity, but this was insufficient to permit confident classification as pathogenic.
Discussion: Our findings show that screening known ALS genes can be expected to identify a genetic cause of disease in >11% of sporadic ALS cases of African genetic ancestry. As in European cohorts, the 2 genes most frequently harboring pathogenic variants in this population group are C9orf72 and SOD1.
Affiliation(s)
- Melissa Nel
- Neurology Research Group (M.N., A.C.M., N.M., J.M.H.), Neuroscience Institute, University of Cape Town; Computational Biology Division (M.N., A.C.M., N.M., G.R.B., N.J.M.), Institute of Infectious Disease and Molecular Medicine, Cape Town, South Africa; Center for Applied Bioinformatics (G.W., E.R., J.M.), St. Jude Children's Research Hospital, Memphis, TN; Department of Neuroscience (M.v.B.), Mayo Clinic, Jacksonville, FL; Department of Neurology (J.W., A.C., M.B.), University of Miami, FL; Center for Molecular Neurology (R.R.), University of Antwerp, Belguim; Department of Cell and Molecular Biology (J.P.T.), St. Jude Children's Research Hospital, Memphis, TN; and Neurology (J.M.H.), Department of Medicine, University of Cape Town, South Africa
| | - Amokelani C Mahungu
- Neurology Research Group (M.N., A.C.M., N.M., J.M.H.), Neuroscience Institute, University of Cape Town; Computational Biology Division (M.N., A.C.M., N.M., G.R.B., N.J.M.), Institute of Infectious Disease and Molecular Medicine, Cape Town, South Africa; Center for Applied Bioinformatics (G.W., E.R., J.M.), St. Jude Children's Research Hospital, Memphis, TN; Department of Neuroscience (M.v.B.), Mayo Clinic, Jacksonville, FL; Department of Neurology (J.W., A.C., M.B.), University of Miami, FL; Center for Molecular Neurology (R.R.), University of Antwerp, Belguim; Department of Cell and Molecular Biology (J.P.T.), St. Jude Children's Research Hospital, Memphis, TN; and Neurology (J.M.H.), Department of Medicine, University of Cape Town, South Africa
| | - Nomakhosazana Monnakgotla
- Neurology Research Group (M.N., A.C.M., N.M., J.M.H.), Neuroscience Institute, University of Cape Town; Computational Biology Division (M.N., A.C.M., N.M., G.R.B., N.J.M.), Institute of Infectious Disease and Molecular Medicine, Cape Town, South Africa; Center for Applied Bioinformatics (G.W., E.R., J.M.), St. Jude Children's Research Hospital, Memphis, TN; Department of Neuroscience (M.v.B.), Mayo Clinic, Jacksonville, FL; Department of Neurology (J.W., A.C., M.B.), University of Miami, FL; Center for Molecular Neurology (R.R.), University of Antwerp, Belguim; Department of Cell and Molecular Biology (J.P.T.), St. Jude Children's Research Hospital, Memphis, TN; and Neurology (J.M.H.), Department of Medicine, University of Cape Town, South Africa
| | - Gerrit R Botha
- Neurology Research Group (M.N., A.C.M., N.M., J.M.H.), Neuroscience Institute, University of Cape Town; Computational Biology Division (M.N., A.C.M., N.M., G.R.B., N.J.M.), Institute of Infectious Disease and Molecular Medicine, Cape Town, South Africa; Center for Applied Bioinformatics (G.W., E.R., J.M.), St. Jude Children's Research Hospital, Memphis, TN; Department of Neuroscience (M.v.B.), Mayo Clinic, Jacksonville, FL; Department of Neurology (J.W., A.C., M.B.), University of Miami, FL; Center for Molecular Neurology (R.R.), University of Antwerp, Belguim; Department of Cell and Molecular Biology (J.P.T.), St. Jude Children's Research Hospital, Memphis, TN; and Neurology (J.M.H.), Department of Medicine, University of Cape Town, South Africa
| | - Nicola J Mulder
- Neurology Research Group (M.N., A.C.M., N.M., J.M.H.), Neuroscience Institute, University of Cape Town; Computational Biology Division (M.N., A.C.M., N.M., G.R.B., N.J.M.), Institute of Infectious Disease and Molecular Medicine, Cape Town, South Africa; Center for Applied Bioinformatics (G.W., E.R., J.M.), St. Jude Children's Research Hospital, Memphis, TN; Department of Neuroscience (M.v.B.), Mayo Clinic, Jacksonville, FL; Department of Neurology (J.W., A.C., M.B.), University of Miami, FL; Center for Molecular Neurology (R.R.), University of Antwerp, Belguim; Department of Cell and Molecular Biology (J.P.T.), St. Jude Children's Research Hospital, Memphis, TN; and Neurology (J.M.H.), Department of Medicine, University of Cape Town, South Africa
| | - Gang Wu
- Neurology Research Group (M.N., A.C.M., N.M., J.M.H.), Neuroscience Institute, University of Cape Town; Computational Biology Division (M.N., A.C.M., N.M., G.R.B., N.J.M.), Institute of Infectious Disease and Molecular Medicine, Cape Town, South Africa; Center for Applied Bioinformatics (G.W., E.R., J.M.), St. Jude Children's Research Hospital, Memphis, TN; Department of Neuroscience (M.v.B.), Mayo Clinic, Jacksonville, FL; Department of Neurology (J.W., A.C., M.B.), University of Miami, FL; Center for Molecular Neurology (R.R.), University of Antwerp, Belguim; Department of Cell and Molecular Biology (J.P.T.), St. Jude Children's Research Hospital, Memphis, TN; and Neurology (J.M.H.), Department of Medicine, University of Cape Town, South Africa
| | - Evadnie Rampersaud
- Neurology Research Group (M.N., A.C.M., N.M., J.M.H.), Neuroscience Institute, University of Cape Town; Computational Biology Division (M.N., A.C.M., N.M., G.R.B., N.J.M.), Institute of Infectious Disease and Molecular Medicine, Cape Town, South Africa; Center for Applied Bioinformatics (G.W., E.R., J.M.), St. Jude Children's Research Hospital, Memphis, TN; Department of Neuroscience (M.v.B.), Mayo Clinic, Jacksonville, FL; Department of Neurology (J.W., A.C., M.B.), University of Miami, FL; Center for Molecular Neurology (R.R.), University of Antwerp, Belguim; Department of Cell and Molecular Biology (J.P.T.), St. Jude Children's Research Hospital, Memphis, TN; and Neurology (J.M.H.), Department of Medicine, University of Cape Town, South Africa
| | - Marka van Blitterswijk
- Neurology Research Group (M.N., A.C.M., N.M., J.M.H.), Neuroscience Institute, University of Cape Town; Computational Biology Division (M.N., A.C.M., N.M., G.R.B., N.J.M.), Institute of Infectious Disease and Molecular Medicine, Cape Town, South Africa; Center for Applied Bioinformatics (G.W., E.R., J.M.), St. Jude Children's Research Hospital, Memphis, TN; Department of Neuroscience (M.v.B.), Mayo Clinic, Jacksonville, FL; Department of Neurology (J.W., A.C., M.B.), University of Miami, FL; Center for Molecular Neurology (R.R.), University of Antwerp, Belguim; Department of Cell and Molecular Biology (J.P.T.), St. Jude Children's Research Hospital, Memphis, TN; and Neurology (J.M.H.), Department of Medicine, University of Cape Town, South Africa
- Joanne Wuu
- Anne Cooley
- Jason Myers
- Rosa Rademakers
- J Paul Taylor
- Michael Benatar
- Jeannine M Heckmann
5
Elsheikh SSM, Chimusa ER, Mulder NJ, Crimi A. Relating Global and Local Connectome Changes to Dementia and Targeted Gene Expression in Alzheimer's Disease. Front Hum Neurosci 2022; 15:761424. [PMID: 35002653 PMCID: PMC8734427 DOI: 10.3389/fnhum.2021.761424] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/19/2021] [Accepted: 11/25/2021] [Indexed: 01/01/2023]
Abstract
Networks are present in many aspects of our lives, and networks in neuroscience have recently gained much attention, leading to novel representations of brain connectivity. The integration of neuroimaging characteristics and genetic data allows a better understanding of the effects of gene expression on brain structural and functional connections. The current work uses whole-brain tractography in a longitudinal setting and, by measuring changes in brain structural connectivity, studies the neurodegeneration of Alzheimer's disease. This is accomplished by examining the effect of targeted genetic risk factors on the most common local and global brain connectivity measures. Furthermore, we examined the extent to which the Clinical Dementia Rating relates to brain connections longitudinally, as well as to gene expression. For instance, we show that the expression of the PLAU gene increases the change over time in betweenness centrality related to the fusiform gyrus. We also show that the betweenness centrality metric impacts dementia-related changes in distinct brain regions. Our findings provide insights into the complex longitudinal interplay between genetics and brain characteristics and highlight the role of Alzheimer's genetic risk factors in the estimation of regional brain connectivity alterations.
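The abstract above relates the change over time in betweenness centrality to gene expression. As an illustration only, and not the authors' pipeline, the sketch below computes unnormalised betweenness centrality on an unweighted, undirected graph using Brandes' algorithm. The adjacency-dict input format and the function name `betweenness_centrality` are assumptions made for this example; tractography connectomes are often weighted, which this sketch does not handle.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj: dict mapping every node to an iterable of its neighbours
    (assumed symmetric, with every neighbour also present as a key).
    Returns raw betweenness, halved because each undirected pair of
    endpoints is counted once from each side.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}      # -1 marks "not yet reached"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of s on each node
        while stack:                     # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}
```

For a longitudinal comparison, one could compute this separately for hypothetical baseline and follow-up graphs and take `bc_followup[r] - bc_baseline[r]` for each region `r`.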
Affiliation(s)
- Samar S M Elsheikh
- Pharmacogenetic Research Clinic, Centre for Addiction and Mental Health, Campbell Family Mental Health Research Institute, Toronto, ON, Canada.,Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Alessandro Crimi
- Computer Vision Group, Sano Centre for Computational Medicine, Kraków, Poland.,Institute for Neuropathology, University Hospital of Zurich, Zurich, Switzerland.,Department of Mathematics, African Institute for Mathematical Sciences, Cape Coast, Ghana
6
Alosaimi S, van Biljon N, Awany D, Thami PK, Defo J, Mugo JW, Bope CD, Mazandu GK, Mulder NJ, Chimusa ER. Simulation of African and non-African low and high coverage whole genome sequence data to assess variant calling approaches. Brief Bioinform 2020; 22:6042242. [PMID: 33341897 DOI: 10.1093/bib/bbaa366] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/25/2020] [Revised: 11/14/2020] [Accepted: 01/08/2020] [Indexed: 12/15/2022]
Abstract
Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as those of Africa. When applied to these genetically diverse populations, VC tools may produce false positive and false negative results, which may lead to misleading conclusions in the prioritization of mutations and in assessing the clinical relevance and actionability of genes. The most prominent question is which tool or pipeline has high sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of these data. Here, a total of 100 synthetic whole genome sequencing (WGS) samples, mimicking the genetic profile of African and European subjects at specific high and low coverage levels, were generated to assess the performance of nine different VC tools on these contrasting datasets. The performance of these tools was assessed in terms of false positive and false negative call rates by comparing the simulated gold-standard variants to the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at both high and low coverage. Overall, current VC tools produce higher false positive and false negative rates when analysing African data than when analysing European data. This highlights the need for the development of VC approaches with high sensitivity and precision tailored to populations characterized by high genetic variation and low linkage disequilibrium.
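The PPV and MCC values quoted above are standard confusion-matrix metrics. A minimal sketch of how they can be computed by comparing a simulated truth set with a tool's call set is shown below; it is not the authors' evaluation code. It assumes variants are keyed as hypothetical `(chrom, pos, ref, alt)` tuples and that the caller supplies `n_sites`, the total number of evaluated positions, which is needed to derive true negatives for MCC (each variant is treated as occupying its own position).

```python
import math


def call_metrics(truth, calls, n_sites):
    """Compare a truth set with a call set and return accuracy metrics.

    truth, calls: iterables of hashable variant keys, e.g. (chrom, pos, ref, alt).
    n_sites: total number of evaluated positions (assumption: used only to
    derive true negatives; each variant is treated as one position).
    """
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # present in the truth set but missed
    tn = n_sites - tp - fp - fn
    if tn < 0:
        raise ValueError("n_sites is smaller than the number of observed variants")
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "ppv": ppv, "sensitivity": sensitivity, "mcc": mcc}
```

Because MCC uses all four cells of the confusion matrix, it penalises a caller that reaches a high PPV simply by making few calls, which is one reason to report it alongside PPV.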
Affiliation(s)
- Shatha Alosaimi
- Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
- Noëlle van Biljon
- Department of Statistical Sciences, University of Cape Town, Cape Town, South Africa
- Denis Awany
- Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
- Prisca K Thami
- Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
- Joel Defo
- Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
- Jacquiline W Mugo
- Faculty of Health Sciences, Division of Computational Biology, Department of Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Christian D Bope
- Faculty of Sciences, Department of Mathematics and Computer Science, University of Kinshasa, Kinshasa, DRC
- Gaston K Mazandu
- Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa.,Faculty of Health Sciences, Division of Computational Biology, Department of Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Nicola J Mulder
- Faculty of Health Sciences, Division of Computational Biology, Department of Biomedical Sciences, University of Cape Town, Cape Town, South Africa.,Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa
- Emile R Chimusa
- Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa.,Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa
7
Alosaimi S, Bandiang A, van Biljon N, Awany D, Thami PK, Tchamga MSS, Kiran A, Messaoud O, Hassan RIM, Mugo J, Ahmed A, Bope CD, Allali I, Mazandu GK, Mulder NJ, Chimusa ER. A broad survey of DNA sequence data simulation tools. Brief Funct Genomics 2020; 19:49-59. [PMID: 31867604 DOI: 10.1093/bfgp/elz033] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/07/2019] [Revised: 10/27/2019] [Accepted: 11/04/2019] [Indexed: 11/12/2022]
Abstract
In silico DNA sequence generation is a powerful technology for evaluating and validating bioinformatics tools, and accordingly more than 35 DNA sequence simulation tools have been developed. With such a diverse array of tools to choose from, an important question is: which tool should be used for a desired outcome? This question is largely unanswered, as documentation for many of these DNA simulation tools is sparse. To address this, we performed a review of DNA sequence simulation tools developed to date and evaluated 20 state-of-the-art DNA sequence simulation tools on their ability to produce accurate reads based on their implemented sequence error models. We provide a succinct description of each tool and suggest which tool is most appropriate for different scenarios. Given the multitude of similar yet non-identical tools, researchers can use this review as a guide to inform their choice of DNA sequence simulation tool. This paves the way towards assessing existing tools in a unified framework, as well as enabling the analysis of different simulation scenarios within the same framework.
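To make the notion of an implemented sequence error model concrete, here is a deliberately simplified sketch, not one of the 20 reviewed tools, that samples fixed-length reads uniformly from a reference and applies independent substitution errors at a constant per-base rate. The function name `simulate_reads` and the uniform-substitution model are assumptions for illustration; real simulators typically also model indels, position-dependent error profiles and base quality scores.

```python
import random

BASES = "ACGT"


def simulate_reads(reference, n_reads, read_len, error_rate, seed=None):
    """Sample reads from `reference` with uniform substitution errors.

    Returns a list of (start, read) tuples, where `start` is the 0-based
    position in the reference from which the read was drawn.
    """
    if read_len > len(reference):
        raise ValueError("read_len exceeds reference length")
    rng = random.Random(seed)  # seeded generator for reproducible simulations
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        read = list(reference[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Replace with one of the other bases, chosen uniformly.
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(read)))
    return reads
```

Keeping the true start position alongside each read is what lets a benchmark check whether an aligner or variant caller recovers the known answer.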
Affiliation(s)
- Shatha Alosaimi
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Armand Bandiang
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Noelle van Biljon
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Denis Awany
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Prisca K Thami
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa.,Botswana Harvard AIDS Institute Partnership, Gaborone, Botswana
- Milaine S S Tchamga
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Anmol Kiran
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi.,Edinburgh University, Edinburgh, UK
- Olfa Messaoud
- Université de Tunis El Manar, Institut Pasteur de Tunis, LR16IPT05 Génomique Biomédicale et Oncogénétique, Tunis, 1002, Tunisia
- Radia Ismaeel Mohammed Hassan
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jacquiline Mugo
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Azza Ahmed
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Sudan
- Christian D Bope
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Imane Allali
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Gaston K Mazandu
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa.,Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa.,African Institute for Mathematical Sciences (AIMS), Cape Town, South Africa
- Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
8
Mazandu GK, Hooper C, Opap K, Makinde F, Nembaware V, Thomford NE, Chimusa ER, Wonkam A, Mulder NJ. IHP-PING-generating integrated human protein-protein interaction networks on-the-fly. Brief Bioinform 2020; 22:5943797. [PMID: 33129201 DOI: 10.1093/bib/bbaa277] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/22/2020] [Revised: 09/12/2020] [Accepted: 09/21/2020] [Indexed: 01/04/2023]
Abstract
Advances in high-throughput sequencing technologies have resulted in an exponential growth of publicly accessible biological datasets. In the 'big data'-driven 'post-genomic' context, much work is being done to explore human protein-protein interactions (PPIs) through systems-level analyses that uncover useful signals, provide further insight, advance current knowledge and answer specific biological and health questions. These PPIs are experimentally determined or computationally predicted and stored in different online databases, some of which are updated regularly. As with many biological datasets, such regular updates continuously render older PPI datasets potentially outdated. Moreover, while many of these interactions are shared between these online resources, each resource includes its own identified PPIs, and none of these databases exhaustively contains all existing human PPI maps. In this context, it is essential to enable the integration of interaction datasets from different resources to generate a PPI map with increased coverage and confidence. To allow researchers to produce integrated human PPI datasets in real time, we introduce the integrated human protein-protein interaction network generator (IHP-PING) tool. IHP-PING is a flexible Python package that generates a human PPI network from freely available online resources. This tool extracts and integrates heterogeneous PPI datasets to generate a unified PPI network, which is stored locally for further applications.
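The core integration step described above, merging interaction lists from several resources into one non-redundant network, can be sketched as follows. This is a hypothetical illustration, not IHP-PING's API: the function `integrate_ppi`, the `min_sources` threshold used as a simple confidence filter, and the assumption that protein identifiers have already been mapped to a common namespace (for example UniProt accessions) are all choices made for this example.

```python
from collections import defaultdict


def integrate_ppi(sources, min_sources=1):
    """Merge undirected PPI edge lists from several named resources.

    sources: mapping of resource name -> iterable of (protein_a, protein_b).
    Returns a dict mapping each normalised edge (a, b), with a <= b, to the
    set of resources reporting it, keeping only edges reported by at least
    `min_sources` resources.
    """
    support = defaultdict(set)
    for resource, edges in sources.items():
        for a, b in edges:
            # Order the pair so that A-B and B-A count as the same interaction.
            edge = (a, b) if a <= b else (b, a)
            support[edge].add(resource)
    return {edge: names for edge, names in support.items()
            if len(names) >= min_sources}
```

Recording which resources support each edge keeps the union (maximum coverage) and the multi-source subset (higher confidence) available from a single pass.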
Affiliation(s)
- Gaston K Mazandu
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus. Anzio Rd, Observatory, 7925, South Africa.,African Institute for Mathematical Sciences, 5-7 Melrose Road, Muizenberg, 7945, Cape Town, South Africa.,Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
- Christopher Hooper
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus. Anzio Rd, Observatory, 7925, South Africa
- Kenneth Opap
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus. Anzio Rd, Observatory, 7925, South Africa
- Funmilayo Makinde
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus. Anzio Rd, Observatory, 7925, South Africa.,African Institute for Mathematical Sciences, 5-7 Melrose Road, Muizenberg, 7945, Cape Town, South Africa
- Victoria Nembaware
- Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
- Nicholas E Thomford
- Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa.,School of Medical Sciences, University of Cape Coast, PMB, Cape Coast, Ghana
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
- Ambroise Wonkam
- Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
- Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus. Anzio Rd, Observatory, 7925, South Africa
9
Geza E, Mugo J, Mulder NJ, Wonkam A, Chimusa ER, Mazandu GK. A comprehensive survey of models for dissecting local ancestry deconvolution in human genome. Brief Bioinform 2020; 20:1709-1724. [PMID: 30010715 DOI: 10.1093/bib/bby044] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/11/2018] [Revised: 04/16/2018] [Indexed: 11/14/2022]
Abstract
Over the past decade, studies of admixed populations have increasingly gained interest in both medical and population genetics. These studies have so far shed light on the patterns of genetic variation throughout modern human evolution and have improved our understanding of the demographics and adaptive processes of human populations. To date, there exist about 20 methods or tools to deconvolve local ancestry. These methods have merits and drawbacks in estimating local ancestry in multiway admixed populations. In this article, we survey existing ancestry deconvolution methods, with special emphasis on multiway admixture, and compare these methods based on simulation results reported by different studies, computational approaches used, including mathematical and statistical models, and biological challenges related to each method. This should orient users on the choice of an appropriate method or tool for given population admixture characteristics and update researchers on current advances, challenges and opportunities behind existing ancestry deconvolution methods.
Affiliation(s)
- Ephifania Geza
- African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa.,Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, IDM, University of Cape Town, Cape Town 7925, South Africa
- Jacquiline Mugo
- African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa
- Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, IDM, University of Cape Town, Cape Town 7925, South Africa
- Ambroise Wonkam
- Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town 7925, South Africa
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town 7925, South Africa
- Gaston K Mazandu
- African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa.,Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, IDM, University of Cape Town, Cape Town 7925, South Africa.,Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town 7925, South Africa
10
Elsheikh SSM, Chimusa ER, Mulder NJ, Crimi A. Genome-Wide Association Study of Brain Connectivity Changes for Alzheimer's Disease. Sci Rep 2020; 10:1433. [PMID: 31996736 PMCID: PMC6989662 DOI: 10.1038/s41598-020-58291-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/04/2019] [Accepted: 12/30/2019] [Indexed: 01/09/2023]
Abstract
Variations in the human genome have been found to be an essential factor affecting susceptibility to Alzheimer's disease. Genome-wide association studies (GWAS) have identified genetic loci that significantly contribute to the risk of Alzheimer's disease. The availability of genetic data, coupled with brain imaging technologies, has opened the door to further discoveries through data integration methodologies and new study designs. Although methods have been proposed for integrating image characteristics and genetic information to study Alzheimer's disease, the disease is often measured at a single time point, which does not allow disease progression to be taken into consideration. In a longitudinal setting, we analyzed neuroimaging and single nucleotide polymorphism datasets obtained from the Alzheimer's Disease Neuroimaging Initiative for three clinical stages of the disease, including healthy control, early mild cognitive impairment and Alzheimer's disease subjects. We conducted a GWAS regressing the absolute change in global connectivity metrics on the genetic variants, and used the GWAS summary statistics to compute gene and pathway scores. We observed significant associations between the change in structural brain connectivity, defined by tractography, and genes that have previously been reported to biologically influence the risk and progression of certain neurodegenerative disorders, including Alzheimer's disease.
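The single-variant test behind a GWAS of this kind can be sketched as an ordinary least-squares regression of the phenotype, here the absolute change in a global connectivity metric between two visits, on genotype dosage (0, 1 or 2 copies of the alternate allele). This is a simplified illustration, not the authors' analysis: it fits one SNP at a time with no covariates (GWAS usually adjust for factors such as age, sex and ancestry principal components), and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. The function names are hypothetical.

```python
import math


def absolute_change(baseline, followup):
    """Phenotype: |follow-up - baseline| for each subject."""
    return [abs(f - b) for b, f in zip(baseline, followup)]


def snp_association(dosages, phenotype):
    """OLS of phenotype on genotype dosage for a single SNP, no covariates.

    Returns (beta, standard_error, t_statistic, approximate_two_sided_p),
    or None if the SNP is monomorphic in the sample.
    """
    n = len(dosages)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matching inputs with at least 3 subjects")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        return None  # no genotype variation, so the slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, 0.0, math.inf, 0.0  # perfect fit
    t = beta / se
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal approximation
    return beta, se, t, p_approx
```

Running `snp_association` over every variant yields the per-SNP summary statistics that gene- and pathway-level scoring methods then aggregate.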
Affiliation(s)
- Samar S M Elsheikh
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Alessandro Crimi
- University Hospital of Zürich, Zürich, 8091, Switzerland
- African Institute for Mathematical Sciences, Biriwa, Ghana
11
Kumuthini J, Zass L, Panji S, Salifu SP, Kayondo JK, Nembaware V, Mbiyavanga M, Olabode A, Kishk A, Wells G, Mulder NJ. The H3ABioNet helpdesk: an online bioinformatics resource, enhancing Africa's capacity for genomics research. BMC Bioinformatics 2019; 20:741. [PMID: 31888443 PMCID: PMC6937968 DOI: 10.1186/s12859-019-3322-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/30/2019] [Accepted: 12/16/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Currently, formal mechanisms for bioinformatics support are limited. The H3Africa Bioinformatics Network has implemented a public and freely available Helpdesk (HD), which provides generic bioinformatics support to researchers through an online ticketing platform. This article reports on the development of the H3ABioNet HD (H3A-HD), outlining its design, management, usage and evaluation framework, as well as the lessons learned through implementation. RESULTS The H3A-HD was evaluated using automatically generated usage logs, user feedback and qualitative ticket evaluation. Evaluation revealed that communication methods, ticketing strategies and the technical platforms used are among the primary factors that may influence the effectiveness of the HD. CONCLUSION To continuously improve the H3A-HD services, the resource should be regularly monitored and evaluated. The H3A-HD design, implementation and evaluation framework could easily be adapted for use by interested stakeholders within the bioinformatics community and beyond.
Affiliation(s)
- Judit Kumuthini
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Lyndon Zass
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Sumir Panji
- Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, Western Cape, South Africa
- Samson P Salifu
- Kumasi Centre for Collaborative Research in Tropical Medicine, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
- Victoria Nembaware
- Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, Western Cape, South Africa
- Mamana Mbiyavanga
- Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, Western Cape, South Africa
- Ajayi Olabode
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Ali Kishk
- Center for Informatics Science, Nile University, 6th October City, Egypt
- Gordon Wells
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Nicola J Mulder
- Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, Western Cape, South Africa
12
Geza E, Mulder NJ, Chimusa ER, Mazandu GK. FRANC: a unified framework for multi-way local ancestry deconvolution with high density SNP data. Brief Bioinform 2019. [DOI: 10.1093/bib/bbz117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Several thousand genomes have been completed, with millions of variants identified in human deoxyribonucleic acid sequences. These genomic variations, especially those introduced by admixture, contribute significantly to remarkable phenotypic variability with medical and/or evolutionary implications. Elucidating local ancestry estimates is necessary for a better understanding of genomic variation patterns throughout modern human evolution and adaptive processes, and of their consequences for human heredity and health. However, existing local ancestry deconvolution tools are accessible as individual scripts, each requiring input and producing output in its own complex format. This limits the user's ability to retrieve local ancestry estimates. We introduce FRANC, a unified framework for multi-way local ancestry inference that integrates eight existing state-of-the-art local ancestry deconvolution tools. FRANC is an adaptable, expandable and portable tool that manipulates tool-specific inputs, deconvolutes ancestry and standardizes tool-specific results. To facilitate both medical and population genetics studies, FRANC uses convenient, easy-to-manipulate input files and allows users to choose output formats that ease their use in further local ancestry deconvolution applications.
Affiliation(s)
- Ephifania Geza
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine University of Cape Town Health Sciences Campus Anzio Rd, Observatory, 7925, South Africa
| | - Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine University of Cape Town Health Sciences Campus Anzio Rd, Observatory, 7925, South Africa
- Emile R Chimusa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine University of Cape Town Health Sciences Campus Anzio Rd, Observatory, 7925, South Africa
- Gaston K Mazandu
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine University of Cape Town Health Sciences Campus Anzio Rd, Observatory, 7925, South Africa
13
Mazandu GK, Chimusa ER, Rutherford K, Zekeng EG, Gebremariam ZZ, Onifade MY, Mulder NJ. Large-scale data-driven integrative framework for extracting essential targets and processes from disease-associated gene data sets. Brief Bioinform 2019; 19:1141-1152. [PMID: 28520909 DOI: 10.1093/bib/bbx052] [Received: 01/19/2017]
Abstract
Populations worldwide currently face several public health challenges, including the growing prevalence of infections and the emergence of new pathogenic organisms. The cost and risk associated with drug development make the development of new drugs for several diseases, especially orphan or rare diseases, unappealing to the pharmaceutical industry. Proof of drug safety and efficacy is required before market approval, and rigorous testing makes the drug development process slow and expensive, and it frequently results in failure. This failure is often due to the use of irrelevant targets identified in the early steps of the drug discovery process, suggesting that target identification and validation are cornerstones of successful drug discovery and development. Here, we present a large-scale data-driven integrative computational framework to extract essential targets and processes from an existing disease-associated data set and enhance target selection by leveraging drug-target-disease associations at the systems level. We applied this framework to tuberculosis and Ebola virus disease, combining heterogeneous data from multiple sources, including protein-protein functional interaction, functional annotation and pharmaceutical data sets. The results demonstrate the effectiveness of the pipeline, leading to the extraction of essential drug targets and to the rational use of existing approved drugs. This provides an opportunity to move toward optimal target-based strategies for screening available drugs and for drug discovery. There is potential for this model to bridge the gap in the production of orphan disease therapies, offering a systematic approach to predict new uses for existing drugs, thereby harnessing their full therapeutic potential.
Affiliation(s)
- Gaston K Mazandu
- Institute of Infectious Disease and Molecular Medicine at UCT and a Researcher at AIMS
- Zoe Z Gebremariam
- Institute of Infection and Global Health, University of Liverpool, UK
- Maryam Y Onifade
- African Institute for Mathematical Sciences jointly with University of Cape Coast, Ghana
- Nicola J Mulder
- Department of Integrative Biomedical Sciences and the Head of the Computational Biology Division, UCT
14
Abstract
Drug repositioning is the process of finding new therapeutic uses for existing, approved drugs, a process that has value given the exorbitant costs of novel drug development. Several computational strategies exist to predict these alternative applications. In this study, we used datasets on (1) human biological drug targets and (2) disease-associated genes and, based on direct functional interactions between them, searched for potential opportunities for drug repositioning. From the set of 1125 unique drug targets and their 88 490 interactions with disease-associated genes, 30 drug targets were analyzed and discussed in detail for the purpose of this article. The current indications of the drugs that target them were validated through the interactions, and new opportunities for repositioning were predicted. Among the drugs identified for potential repositioning were benzodiazepines for the treatment of autism spectrum disorders; nortriptyline for the treatment of melanoma, glioma and other cancers; and vitamin B6 for the prevention of spontaneous abortions and cleft palate birth defects. Special emphasis was also placed on new potential indications for orphan diseases, which are diseases whose rarity means that development of novel treatments is not financially viable. This computational drug repositioning approach uses existing information on drugs and drug targets, together with insights into the genetic basis of disease, to systematically generate the most probable new uses for available drugs, and in this way harness their true therapeutic power.
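The matching step described above, linking drug targets to disease-associated genes through direct functional interactions, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `candidate_indications` and all drug, target, gene and disease names and interactions are hypothetical toy data.

```python
from collections import defaultdict

def candidate_indications(drug_targets, interactions, disease_genes):
    """Link each drug to diseases with a gene that directly interacts with one of its targets.

    drug_targets:  dict mapping drug -> set of target proteins
    interactions:  iterable of (protein_a, protein_b) functional interactions, undirected
    disease_genes: dict mapping disease -> set of associated genes
    """
    # Undirected adjacency list built from the interaction pairs.
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    candidates = defaultdict(set)
    for drug, targets in drug_targets.items():
        for target in targets:
            partners = neighbors.get(target, set())  # direct interaction partners only
            for disease, genes in disease_genes.items():
                if partners & genes:
                    candidates[drug].add(disease)
    return dict(candidates)

# Hypothetical toy data.
drug_targets = {"drugA": {"T1"}, "drugB": {"T2"}}
interactions = [("T1", "G1"), ("T2", "G3")]
disease_genes = {"disease_X": {"G1", "G2"}, "disease_Y": {"G3"}}
print(candidate_indications(drug_targets, interactions, disease_genes))
# {'drugA': {'disease_X'}, 'drugB': {'disease_Y'}}
```

Only first-neighbor (direct) interactions are considered, mirroring the "direct functional interaction" criterion in the abstract; any scoring, ranking or validation of candidate drug-disease pairs is omitted.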
15
Elsheikh SSM, Bakas S, Mulder NJ, Chimusa ER, Davatzikos C, Crimi A. Multi-stage Association Analysis of Glioblastoma Gene Expressions with Texture and Spatial Patterns. Brainlesion 2019; 11383:239-250. [PMID: 31482151 PMCID: PMC6719702 DOI: 10.1007/978-3-030-11723-8_24]
Abstract
Glioblastoma is the most aggressive malignant primary brain tumor and has a poor prognosis. The heterogeneous neuroimaging, pathologic and molecular features of glioblastoma provide opportunities for subclassification, prognostication and the development of targeted therapies. Magnetic resonance imaging can quantify specific phenotypic imaging features of these tumors, and additional insight into disease mechanisms can be gained by exploring their genetic basis. Here, we use gene expression to evaluate associations with various quantitative imaging phenomic features extracted from magnetic resonance imaging. We highlight novel correlations by carrying out multi-stage genome-wide association tests at the gene level through a non-parametric correlation framework, which tests multiple hypotheses about the integrated imaging phenotype-genotype relationship more efficiently and at lower computational cost. Our results revealed associations between our radiographic tumor features and several genes previously linked to glioblastoma and other types of cancer, such as LRRC46 (chromosome 17), EPGN (chromosome 4) and TUBA1C (chromosome 12).
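The abstract does not name the non-parametric statistic used. Spearman's rank correlation is one common choice for relating a gene's expression to an imaging feature, and the sketch below assumes it purely for illustration; the expression and texture values are hypothetical, and the multi-stage testing and any multiple-testing correction are not shown.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical values for one gene and one imaging feature across five tumors.
expression = [2.1, 3.4, 1.0, 5.6, 4.2]
texture = [0.3, 0.5, 0.1, 0.9, 0.7]
print(round(spearman(expression, texture), 6))
```

Working on ranks rather than raw values makes the statistic insensitive to monotone transformations of expression or imaging measurements, which is one reason rank-based correlation is often preferred for such data.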
Affiliation(s)
- Samar S M Elsheikh
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Spyridon Bakas
- Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Nicola J Mulder
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Emile R Chimusa
- Division of Human Genetics, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Christos Davatzikos
- Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Alessandro Crimi
- University Hospital of Zürich, Zürich, Switzerland
- African Institute for Mathematical Sciences, Biriwa, Ghana
16
Mugo JW, Geza E, Defo J, Elsheikh SSM, Mazandu GK, Mulder NJ, Chimusa ER. A multi-scenario genome-wide medical population genetics simulation framework. Bioinformatics 2018; 33:2995-3002. [PMID: 28957497 DOI: 10.1093/bioinformatics/btx369] [Received: 09/19/2016] [Accepted: 06/21/2017]
Abstract
Motivation Recent technological advances in high-throughput sequencing and genotyping have improved our understanding of genomic structure and disease-associated genetic factors. In this context, simulation models can play a critical role in revealing the effects of evolutionary and demographic processes on genomic variation, enabling researchers to assess existing analytical approaches and design novel ones. Although various simulation frameworks have been proposed, they do not account for natural selection in admixture processes. Most are tailored to a single chromosome or genomic region, very few capture large-scale genomic data, and most are not accessible to the genomics community. Results Here we develop a multi-scenario genome-wide medical population genetics simulation framework called 'FractalSIM'. FractalSIM can accurately mimic and generate genome-wide data under various genetic models of genetic diversity, disease-affecting genomic variation and DNA sequence patterns in admixed and/or homogeneous populations. Moreover, the framework accounts for natural selection in both homogeneous and admixture processes. The outputs of FractalSIM have been assessed using popular tools, and the results demonstrate its ability to mimic real scenarios accurately. These outputs can be used to evaluate the performance of a range of genomic tools, from ancestry inference to genome-wide association studies. Availability and implementation The FractalSIM package is available at http://www.cbio.uct.ac.za/FractalSIM. Contact emile.chimusa@uct.ac.za. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jacquiline W Mugo
- Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa
- Ephifania Geza
- Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa; African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa
- Joel Defo
- Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa
- Samar S M Elsheikh
- Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa
- Gaston K Mazandu
- Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa; African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa
- Nicola J Mulder
- Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa
- Emile R Chimusa
- Department of Pathology, Division of Human Genetics, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Observatory, Cape Town 7925, South Africa
17
Mazandu GK, Chimusa ER, Mulder NJ. Gene Ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery. Brief Bioinform 2017; 18:886-901. [PMID: 27473066 DOI: 10.1093/bib/bbw067] [Received: 01/27/2016]
Abstract
Gene Ontology (GO) semantic similarity tools enable retrieval of semantic similarity scores, which incorporate biological knowledge embedded in the GO structure, for comparing or classifying proteins or lists of proteins based on their GO annotations. This facilitates a better understanding of the biological phenomena underlying the corresponding experiment and enables the identification of processes pertinent to different biological conditions. Currently, about 14 tools are available, which may play an important role in improving protein analyses at the functional level using different GO semantic similarity measures. Here we survey these tools to provide a comprehensive view of the challenges and advances in this area, to avoid redundant effort in developing features that already exist or implementing ideas already shown to be obsolete in the context of GO. This helps researchers, tool developers and end users understand the underlying semantic similarity measures through knowledge of the pertinent features of, and issues related to, a particular tool. This should empower users to make appropriate choices for their biological applications and ensure effective knowledge discovery based on GO annotations.
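Many of the measures these tools implement are based on information content (IC). As one classic example, Resnik's measure scores two terms by the IC of their most informative common ancestor, with IC(t) = -log p(t) and p(t) the fraction of annotations made to t or any of its descendants. The sketch below assumes that definition; the toy ontology, the annotation counts and the helper names are hypothetical, and real tools differ in how they propagate annotations and combine term scores into protein-level scores.

```python
import math

# Hypothetical toy ontology: each term maps to its parent terms (is_a edges).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 4, "A1": 1, "A2": 3}

def ancestors(term):
    """All ancestors of a term, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def information_content():
    """IC(t) = log(total / count(t)), where count(t) includes all descendants."""
    cumulative = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors(term):  # propagate each annotation up the DAG
            cumulative[anc] += n
    total = cumulative["root"]
    return {t: math.log(total / c) for t, c in cumulative.items() if c > 0}

def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common if t in ic)

ic = information_content()
print(resnik("A1", "A2", ic))  # shared ancestor A is the most informative
print(resnik("A1", "B", ic))   # only the root is shared, so the score is 0.0
```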
18
Mulder NJ, Adebiyi E, Adebiyi M, Adeyemi S, Ahmed A, Ahmed R, Akanle B, Alibi M, Armstrong DL, Aron S, Ashano E, Baichoo S, Benkahla A, Brown DK, Chimusa ER, Fadlelmola FM, Falola D, Fatumo S, Ghedira K, Ghouila A, Hazelhurst S, Isewon I, Jung S, Kassim SK, Kayondo JK, Mbiyavanga M, Meintjes A, Mohammed S, Mosaku A, Moussa A, Muhammd M, Mungloo-Dilmohamud Z, Nashiru O, Odia T, Okafor A, Oladipo O, Osamor V, Oyelade J, Sadki K, Salifu SP, Soyemi J, Panji S, Radouani F, Souiai O, Tastan Bishop Ö. Development of Bioinformatics Infrastructure for Genomics Research. Glob Heart 2017; 12:91-98. [PMID: 28302555 DOI: 10.1016/j.gheart.2017.01.005] [Received: 01/05/2017] [Accepted: 01/05/2017]
Abstract
BACKGROUND Although pockets of bioinformatics excellence have developed in Africa, large-scale genomic data analysis has generally been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet's role has evolved in response to changing needs from the consortium and the African bioinformatics community. OBJECTIVES H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. METHODS AND RESULTS Various resources have been developed to address the genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types is also being developed and dockerized to enable execution on multiple computing infrastructures. 
In addition, new tools have been developed for analysis of the uniquely divergent African data and for downstream interpretation of prioritized variants. To provide support for these and other bioinformatics queries, an online bioinformatics helpdesk backed by broad consortium expertise has been established. Further support is provided through various modes of bioinformatics training. CONCLUSIONS Over the past 4 years, the development of infrastructure support and human capacity through H3ABioNet has significantly contributed to the establishment of African scientific networks, data analysis facilities, and training programs. Here, we describe the infrastructure and how it has affected genomics and bioinformatics research in Africa.
Affiliation(s)
- Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa.
- Ezekiel Adebiyi
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Marion Adebiyi
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Seun Adeyemi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria; Center for System and Information Service, Covenant University, Ota, Nigeria
- Azza Ahmed
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Rehab Ahmed
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Bola Akanle
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria; Center for System and Information Service, Covenant University, Ota, Nigeria
- Mohamed Alibi
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (BIMS), Institut Pasteur de Tunis, Tunis, Tunisia
- Don L Armstrong
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Shaun Aron
- Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Efejiro Ashano
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria; H3Africa Bioinformatics Network (H3ABioNet) Node, National Biotechnology Development Agency (NABDA), Federal Ministry of Science and Technology (FMST), Abuja, Nigeria
- Alia Benkahla
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (BIMS), Institut Pasteur de Tunis, Tunis, Tunisia
- David K Brown
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa
- Emile R Chimusa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa; Division of Human Genetics, Department of Pathology, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Faisal M Fadlelmola
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan; Future University of Sudan, Khartoum, Sudan
- Dare Falola
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Segun Fatumo
- H3Africa Bioinformatics Network (H3ABioNet) Node, National Biotechnology Development Agency (NABDA), Federal Ministry of Science and Technology (FMST), Abuja, Nigeria
- Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (BIMS), Institut Pasteur de Tunis, Tunis, Tunisia
- Amel Ghouila
- Institut Pasteur de Tunis, LR11IPT02, Laboratory of Transmission, Control and Immunobiology of Infections (LTCII), Tunis-Belvédère, Tunisia
- Scott Hazelhurst
- Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Itunuoluwa Isewon
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Segun Jung
- Computation Institute, University of Chicago and Argonne National Laboratory, Chicago, IL, USA
- Samar Kamal Kassim
- Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Abbaseya, Cairo, Egypt
- Mamana Mbiyavanga
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Ayton Meintjes
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Somia Mohammed
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Abayomi Mosaku
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Ahmed Moussa
- LAbTIC Laboratory, ENSA, Abdelmalek Essaadi University, Tangier, Morocco
- Mustafa Muhammd
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum, Khartoum, Sudan
- Oyekanmi Nashiru
- H3Africa Bioinformatics Network (H3ABioNet) Node, National Biotechnology Development Agency (NABDA), Federal Ministry of Science and Technology (FMST), Abuja, Nigeria
- Trust Odia
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Adaobi Okafor
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Olaleye Oladipo
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria; Center for System and Information Service, Landmark University, Omu-Aran, Nigeria
- Victor Osamor
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Jellili Oyelade
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Nigeria
- Khalid Sadki
- School of Sciences, Mohammed V University of Rabat, Rabat, Morocco
- Samson Pandam Salifu
- Department of Biochemistry and Biotechnology, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana; Kumasi Centre for Collaborative Research, South End Asougya Road, KNUST Campus, Kumasi, Ghana
- Jumoke Soyemi
- Department of Computer Science, Ilaro Polytechnic, Ilaro, Nigeria
- Sumir Panji
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Fouzia Radouani
- Chlamydiae and Mycoplasma Laboratory, Institut Pasteur du Maroc, Casablanca, Morocco
- Oussama Souiai
- Laboratory of Bioinformatics, Biomathematics and Biostatistics (BIMS), Institut Pasteur de Tunis, Tunis, Tunisia
- Özlem Tastan Bishop
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa
19
Akinola RO, Mazandu GK, Mulder NJ. A Quantitative Approach to Analyzing Genome Reductive Evolution Using Protein-Protein Interaction Networks: A Case Study of Mycobacterium leprae. Front Genet 2016; 7:39. [PMID: 27066064 PMCID: PMC4809885 DOI: 10.3389/fgene.2016.00039] [Received: 11/07/2015] [Accepted: 03/08/2016]
Abstract
Advances in high-throughput sequencing technologies have yielded complete genome sequences of many organisms, including complete bacterial genomes. The growing number of available sequenced genomes has enabled analyses of their dynamics, as well as of the molecular and evolutionary processes to which these organisms are subject. Comparative genomics of different bacteria has related genome size and gene content to lifestyle and adaptation to various environments, and has enhanced our understanding of the mechanisms of their evolution. Protein–protein functional interactions mediate many essential processes that maintain the stability of biological systems under changing environmental conditions. These interactions therefore play crucial roles in the evolution of different organisms, especially obligate intracellular bacteria, which generally have reduced genome sizes compared with their nearest free-living relatives. In this study, we used an approach based on the Renormalization Group (RG) analysis technique and the Maximum-Excluded-Mass-Burning (MEMB) model to investigate the evolutionary process of genome reduction in relation to the organization of the functional networks of two organisms. Using a Mycobacterium leprae (MLP) network compared with a Mycobacterium tuberculosis (MTB) network as a case study, we show that reductive evolution in MLP resulted from the removal of important proteins from the neighborhoods of the corresponding orthologous MTB proteins. While each orthologous MTB protein had an increased number of interacting partners in most instances, the corresponding MLP protein had lost some of them. This work provides a quantitative model for mapping reductive evolution and protein–protein functional interaction network organization in terms of the roles played by different proteins in the network structure.
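The MEMB model referred to above is a box-covering procedure used in network renormalization. As it is commonly described, for a given box radius the algorithm repeatedly chooses as a new center the non-center node whose neighborhood contains the most still-uncovered nodes (its excluded mass), until every node is covered; the number of centers gives the number of boxes. The sketch below covers only this center-selection stage and is not the authors' implementation. It assumes an unweighted undirected graph, takes "within radius" to mean shortest-path distance at most the radius (conventions differ), breaks ties by node order, and uses hypothetical helper names and a hypothetical path graph.

```python
from collections import deque

def within_radius(graph, source, radius):
    """Nodes whose shortest-path distance from source is at most radius (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)

def memb_centers(graph, radius):
    """Greedy MEMB-style center selection: each new center maximizes excluded mass."""
    balls = {n: within_radius(graph, n, radius) for n in graph}
    uncovered = set(graph)
    centers = []
    while uncovered:
        candidates = [n for n in graph if n not in centers]
        # Excluded mass = number of still-uncovered nodes a candidate would cover.
        best = max(candidates, key=lambda n: len(balls[n] & uncovered))
        centers.append(best)
        uncovered -= balls[best]
    return centers

# Hypothetical path graph 0-1-2-3-4-5-6 as an adjacency list.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
centers = memb_centers(path, radius=1)
print(len(centers), centers)
```

The loop always terminates because any uncovered node is not yet a center and covers at least itself. The later MEMB step that assigns non-center nodes to boxes, and the fit of box counts against box size used in renormalization analysis, are omitted.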
Affiliation(s)
- Richard O Akinola
- Computational Biology Group, Department of Integrative Biomedical Sciences, Medical School, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa; Department of Mathematics, Faculty of Natural Sciences, University of Jos, Jos, Nigeria
- Gaston K Mazandu
- Computational Biology Group, Department of Integrative Biomedical Sciences, Medical School, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa; African Institute for Mathematical Sciences, Cape Town, South Africa; African Institute for Mathematical Sciences, Cape Coast, Ghana
- Nicola J Mulder
- Computational Biology Group, Department of Integrative Biomedical Sciences, Medical School, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
20
Mulder NJ, Christoffels A, de Oliveira T, Gamieldien J, Hazelhurst S, Joubert F, Kumuthini J, Pillay CS, Snoep JL, Tastan Bishop Ö, Tiffin N. The Development of Computational Biology in South Africa: Successes Achieved and Lessons Learnt. PLoS Comput Biol 2016; 12:e1004395. [PMID: 26845152 PMCID: PMC4742231 DOI: 10.1371/journal.pcbi.1004395]
Abstract
Bioinformatics is now a critical skill in many research and commercial environments as biological data are increasing in both size and complexity. South African researchers recognized this need in the mid-1990s and responded by working with the government as well as international bodies to develop initiatives to build bioinformatics capacity in the country. Significant injections of support from these bodies provided a springboard for the establishment of computational biology units at multiple universities throughout the country, which took on teaching, basic research and support roles. Several challenges were encountered, for example with unreliability of funding, lack of skills, and lack of infrastructure. However, the bioinformatics community worked together to overcome these, and South Africa is now arguably the leading country in bioinformatics on the African continent. Here we discuss how the discipline developed in the country, highlighting the challenges, successes, and lessons learnt.
Affiliation(s)
- Nicola J. Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Alan Christoffels
- South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Tulio de Oliveira
- Africa Centre for Health and Population Studies, School of Laboratory Medicine and Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Junaid Gamieldien
- South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Scott Hazelhurst
- School of Electrical & Information Engineering, and Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Fourie Joubert
- Centre for Bioinformatics and Computational Biology, University of Pretoria, Pretoria, South Africa
- Judit Kumuthini
- Centre for Proteomic and Genomic Research, Cape Town, South Africa
- Ché S. Pillay
- School of Life Sciences, University of KwaZulu-Natal, Pietermaritzburg, South Africa
- Jacky L. Snoep
- Department of Biochemistry, Stellenbosch University, Stellenbosch, South Africa
- Özlem Tastan Bishop
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa
- Nicki Tiffin
- South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
21
Mulder NJ, Adebiyi E, Alami R, Benkahla A, Brandful J, Doumbia S, Everett D, Fadlelmola FM, Gaboun F, Gaseitsiwe S, Ghazal H, Hazelhurst S, Hide W, Ibrahimi A, Jaufeerally Fakim Y, Jongeneel CV, Joubert F, Kassim S, Kayondo J, Kumuthini J, Lyantagaye S, Makani J, Mansour Alzohairy A, Masiga D, Moussa A, Nash O, Ouwe Missi Oukem-Boyer O, Owusu-Dabo E, Panji S, Patterton H, Radouani F, Sadki K, Seghrouchni F, Tastan Bishop Ö, Tiffin N, Ulenga N. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa. Genome Res 2015; 26:271-7. [PMID: 26627985 PMCID: PMC4728379 DOI: 10.1101/gr.196295.115] [Received: 06/26/2015] [Accepted: 11/25/2015]
Abstract
The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet.
Affiliation(s)
- Nicola J Mulder
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa 7925
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe) and Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria, P.M.B. 1023
- Raouf Alami
- Centre National de Transfusion Sanguine, Rabat, Morocco 10100
- James Brandful
- Noguchi Memorial Institute for Medical Research, University of Ghana, Ghana, LG
- Seydou Doumbia
- University of Sciences, Techniques and Technology of Bamako, Bamako, Mali BPE 3206
- Dean Everett
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi, 3/Institute of Infection and Global Health, University of Liverpool, Liverpool L69 3BX, United Kingdom
- Faisal M Fadlelmola
- Centre for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum/Future University of Sudan, Khartoum, Sudan 11115
- Fatima Gaboun
- Institut National de Recherche Agronomique, Rabat, Morocco 10000
- Scott Hazelhurst
- Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa 2193
- Winston Hide
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA/Sheffield Institute for Translational Neuroscience, Department of Neuroscience, University of Sheffield, Sheffield S10 2HQ, United Kingdom
- Azeddine Ibrahimi
- Faculté de Médecine et de Pharmacie de Rabat, Université Mohammed V Souissi, Rabat, Morocco 10100
- C Victor Jongeneel
- National Center for Supercomputing Applications and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Fourie Joubert
- Department of Biochemistry, University of Pretoria, Pretoria, South Africa 0083
- Samar Kassim
- Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt 11566
- Judit Kumuthini
- Centre for Proteomic and Genomic Research, Cape Town, South Africa 7925
- Julie Makani
- Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania 00255
- Daniel Masiga
- International Centre of Insect Physiology and Ecology, Nairobi, Kenya 00100
- Ahmed Moussa
- Abdelmalek Essaadi University, ENSA, Tangier, Morocco 90000
- Oyekanmi Nash
- National Biotechnology Development Agency, Abuja, Nigeria 10099
- Ellis Owusu-Dabo
- Kumasi Centre for Collaborative Research in Tropical Medicine/Kwame Nkrumah University of Science and Technology, Kumasi, Ghana, PMB
- Sumir Panji
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa 7925
- Hugh Patterton
- University of the Free State, Bloemfontein, South Africa 9300
- Khalid Sadki
- Faculty of Sciences of Rabat, University Mohammed V of Rabat, Rabat, Morocco 10000
- Özlem Tastan Bishop
- Research Unit in Bioinformatics, Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa 6140
- Nicki Tiffin
- South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Cape Town, South Africa 7530
- Nzovu Ulenga
- Management and Development for Health, Dar es Salaam, Tanzania, 61
22
Chimusa ER, Mbiyavanga M, Mazandu GK, Mulder NJ. ancGWAS: a post genome-wide association study method for interaction, pathway and ancestry analysis in homogeneous and admixed populations. Bioinformatics 2015; 32:549-56. [PMID: 26508762 DOI: 10.1093/bioinformatics/btv619] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 12/31/2014] [Accepted: 10/16/2015] [Indexed: 12/17/2022]
Abstract
MOTIVATION Despite numerous successful Genome-wide Association Studies (GWAS), detecting variants that have low disease risk still poses a challenge. GWAS may miss disease genes with weak genetic effects or strong epistatic effects due to the single-marker testing approach commonly used. GWAS may thus generate false negative or inconclusive results, suggesting the need for novel methods to combine effects of single nucleotide polymorphisms within a gene to increase the likelihood of fully characterizing the susceptibility gene. RESULTS We developed ancGWAS, an algebraic graph-based centrality measure that accounts for linkage disequilibrium in identifying significant disease sub-networks by integrating the association signal from GWAS data sets into the human protein-protein interaction (PPI) network. We validated ancGWAS using an association study result from a breast cancer data set and the simulation of interactive disease loci in the simulation of a complex admixed population, as well as pathway-based GWAS simulation. This new approach holds promise for deconvoluting the interactions between genes underlying the pathogenesis of complex diseases. Results obtained yield a novel central breast cancer sub-network of the human interactome implicated in the proteoglycan syndecan-mediated signaling events pathway which is known to play a major role in mesenchymal tumor cell proliferation, thus providing further insights into breast cancer pathogenesis. AVAILABILITY AND IMPLEMENTATION The ancGWAS package and documents are available at http://www.cbio.uct.ac.za/~emile/software.html.
Affiliation(s)
- Emile R Chimusa
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925, Observatory, South Africa and
- Mamana Mbiyavanga
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925, Observatory, South Africa and African Institute for Mathematical Sciences, 7945 Muizenberg, Cape Town, South Africa
- Gaston K Mazandu
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925, Observatory, South Africa and African Institute for Mathematical Sciences, 7945 Muizenberg, Cape Town, South Africa
- Nicola J Mulder
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925, Observatory, South Africa and
23
Mazandu GK, Chimusa ER, Mbiyavanga M, Mulder NJ. A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool. Bioinformatics 2015; 32:477-9. [PMID: 26476781 DOI: 10.1093/bioinformatics/btv590] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 06/11/2015] [Accepted: 10/08/2015] [Indexed: 01/01/2023]
Abstract
SUMMARY Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. AVAILABILITY AND IMPLEMENTATION A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Linux using Python under free software (GNU General Public Licence). CONTACT gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gaston K Mazandu
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa and African Institute for Mathematical Sciences (AIMS), Cape Town, South Africa and Cape Coast, Ghana
- Emile R Chimusa
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa and
- Mamana Mbiyavanga
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa and
- Nicola J Mulder
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa and
24
Budd A, Corpas M, Brazas MD, Fuller JC, Goecks J, Mulder NJ, Michaut M, Ouellette BFF, Pawlik A, Blomberg N. A quick guide for building a successful bioinformatics community. PLoS Comput Biol 2015; 11:e1003972. [PMID: 25654371 PMCID: PMC4318577 DOI: 10.1371/journal.pcbi.1003972] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022]
Abstract
"Scientific community" refers to a group of people collaborating together on scientific-research-related activities who also share common goals, interests, and values. Such communities play a key role in many bioinformatics activities. Communities may be linked to a specific location or institute, or involve people working at many different institutions and locations. Education and training is typically an important component of these communities, providing a valuable context in which to develop skills and expertise, while also strengthening links and relationships within the community. Scientific communities facilitate: (i) the exchange and development of ideas and expertise; (ii) career development; (iii) coordinated funding activities; (iv) interactions and engagement with professionals from other fields; and (v) other activities beneficial to individual participants, communities, and the scientific field as a whole. It is thus beneficial at many different levels to understand the general features of successful, high-impact bioinformatics communities; how individual participants can contribute to the success of these communities; and the role of education and training within these communities. We present here a quick guide to building and maintaining a successful, high-impact bioinformatics community, along with an overview of the general benefits of participating in such communities. This article grew out of contributions made by organizers, presenters, panelists, and other participants of the ISMB/ECCB 2013 workshop "The 'How To Guide' for Establishing a Successful Bioinformatics Network" at the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB).
Affiliation(s)
- Aidan Budd
- Structural and Computational Biology (SCB) Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Manuel Corpas
- The Genome Analysis Centre (TGAC), Norwich Research Park, Norwich, United Kingdom
- Michelle D. Brazas
- Ontario Institute for Cancer Research, MaRS Centre, West Tower, Toronto, Ontario, Canada
- Jonathan C. Fuller
- Heidelberg Institute for Theoretical Studies (HITS) gGmbH, Heidelberg, Germany
- Jeremy Goecks
- The Computational Biology Institute, George Washington University, Innovation Hall, Virginia, United States of America
- Nicola J. Mulder
- Computational Biology Group, Institute of Infectious Disease and Molecular Medicine (IDM), University of Cape Town Faculty of Health Sciences, Cape Town, South Africa
- Magali Michaut
- Computational Cancer Biology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- B. F. Francis Ouellette
- Ontario Institute for Cancer Research, MaRS Centre, West Tower, Toronto, Ontario, Canada
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Aleksandra Pawlik
- The Software Sustainability Institute, School of Computer Science, University of Manchester, Manchester, United Kingdom
- Niklas Blomberg
- ELIXIR Hub, Wellcome Trust Genome Campus, Cambridge, United Kingdom
25
Abstract
Infectious diseases are the leading cause of death, particularly in developing countries. Although many drugs are available for treating the most common infectious diseases, in many cases the mechanism of action of these drugs or even their targets in the pathogen remain unknown. In addition, the key factors or processes in pathogens that facilitate infection and disease progression are often not well understood. Since proteins do not work in isolation, understanding biological systems requires a better understanding of the interconnectivity between proteins in different pathways and processes, which includes both physical and other functional interactions. Such biological networks can be generated within organisms or between organisms sharing a common environment using experimental data and computational predictions. Though different data sources provide different levels of accuracy, confidence in interactions can be measured using interaction scores. Interacting proteins in biological networks can be represented as graphs, with proteins as nodes and their interactions as edges, and thus studied using existing algorithms and tools from graph theory. There are many different applications of biological networks, and here we discuss three such applications, specifically applied to the infectious disease tuberculosis, with its causative agent Mycobacterium tuberculosis and host, Homo sapiens. The applications include the use of the networks for function prediction, comparison of networks for evolutionary studies, and the generation and use of host–pathogen interaction networks.
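The graph view described in this abstract can be sketched in a few lines of Python. The example assumes hypothetical proteins (P1 to P5), made-up confidence scores, hypothetical annotations, and an arbitrary score cutoff of 0.7. It keeps only interactions whose score meets the cutoff, then predicts a function for a protein by a score-weighted vote over its neighbors' annotations. This "guilt-by-association" rule is a generic illustration of network-based function prediction, not the method used by the authors.

```python
from collections import Counter, defaultdict

# Hypothetical scored interactions: (protein A, protein B, confidence in [0, 1]).
INTERACTIONS = [
    ("P1", "P2", 0.92),
    ("P1", "P3", 0.40),
    ("P2", "P3", 0.75),
    ("P3", "P4", 0.81),
    ("P4", "P5", 0.30),
]

# Hypothetical known functions for some of the proteins.
FUNCTIONS = {
    "P2": ["transport"],
    "P4": ["transport", "cell wall"],
}


def build_network(edges, min_score):
    """Undirected adjacency map {protein: {partner: score}} keeping edges with score >= min_score."""
    graph = defaultdict(dict)
    for a, b, score in edges:
        if score >= min_score:
            graph[a][b] = score
            graph[b][a] = score
    return dict(graph)


def degree(graph):
    """Number of retained interaction partners per protein."""
    return {node: len(partners) for node, partners in graph.items()}


def predict_function(graph, protein, functions):
    """Score-weighted vote over the annotated neighbors of `protein`; None if there are no votes."""
    votes = Counter()
    for partner, score in graph.get(protein, {}).items():
        for fn in functions.get(partner, []):
            votes[fn] += score
    return votes.most_common(1)[0][0] if votes else None


net = build_network(INTERACTIONS, min_score=0.7)
print(degree(net))
print(predict_function(net, "P3", FUNCTIONS))
```

With the 0.7 cutoff, the P1-P3 (0.40) and P4-P5 (0.30) edges are dropped, so P5 has no retained partners and receives no prediction, while P3 is assigned "transport" from the combined votes of P2 and P4. Raising or lowering the cutoff trades coverage against confidence, which is the role interaction scores play in the abstract.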
Affiliation(s)
- Nicola J Mulder
- Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Richard O Akinola
- Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Gaston K Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Holifidy Rapanoel
- Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
26
Mazandu GK, Mulder NJ. The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines. Front Genet 2014; 5:264. [PMID: 25147557 PMCID: PMC4123725 DOI: 10.3389/fgene.2014.00264] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/30/2014] [Accepted: 07/18/2014] [Indexed: 11/14/2022]
Abstract
With the advancement of new high throughput sequencing technologies, there has been an increase in the number of genome sequencing projects worldwide, which has yielded complete genome sequences of human, animals and plants. Subsequently, several labs have focused on genome annotation, consisting of assigning functions to gene products, mostly using Gene Ontology (GO) terms. As a consequence, there is an increased heterogeneity in annotations across genomes due to different approaches used by different pipelines to infer these annotations and also due to the nature of the GO structure itself. This makes a curator's task difficult, even if they adhere to the established guidelines for assessing these protein annotations. Here we develop a genome-scale approach for integrating GO annotations from different pipelines using semantic similarity measures. We used this approach to identify inconsistencies and similarities in functional annotations between orthologs of human and Drosophila melanogaster, to assess the quality of GO annotations derived from InterPro2GO mappings compared to manually annotated GO annotations for the Drosophila melanogaster proteome from a FlyBase dataset and human, and to filter GO annotation data for these proteomes. Results obtained indicate that an efficient integration of GO annotations eliminates redundancy up to 27.08 and 22.32% in the Drosophila melanogaster and human GO annotation datasets, respectively. Furthermore, we identified lack of and missing annotations for some orthologs, and annotation mismatches between InterPro2GO and manual pipelines in these two proteomes, thus requiring further curation. This simplifies and facilitates tasks of curators in assessing protein annotations, reduces redundancy and eliminates inconsistencies in large annotation datasets for ease of comparative functional genomics.
Affiliation(s)
- Gaston K Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Nicola J Mulder
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
27
Tastan Bishop Ö, Adebiyi EF, Alzohairy AM, Everett D, Ghedira K, Ghouila A, Kumuthini J, Mulder NJ, Panji S, Patterton HG. Bioinformatics education--perspectives and challenges out of Africa. Brief Bioinform 2014; 16:355-64. [PMID: 24990350 PMCID: PMC4364068 DOI: 10.1093/bib/bbu022] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 11/14/2022]
Abstract
The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of virtually every field in the life sciences. This has placed a scientific premium on the availability of skilled bioinformaticians, a qualification that is extremely scarce on the African continent. The reasons for this are numerous, although the absence of a skilled bioinformatician at academic institutions to initiate a training process and build sustained capacity seems to be a common African shortcoming. This dearth of bioinformatics expertise has had a knock-on effect on the establishment of many modern high-throughput projects at African institutes, including the comprehensive and systematic analysis of genomes from African populations, which are among the most genetically diverse anywhere on the planet. Recent funding initiatives from the National Institutes of Health and the Wellcome Trust are aimed at ameliorating this shortcoming. In this paper, we discuss the problems that have limited the establishment of the bioinformatics field in Africa, as well as propose specific actions that will help with the education and training of bioinformaticians on the continent. This is an absolute requirement in anticipation of a boom in high-throughput approaches to human health issues unique to data from African populations.
28
Salazar GA, Meintjes A, Mazandu GK, Rapanoël HA, Akinola RO, Mulder NJ. A web-based protein interaction network visualizer. BMC Bioinformatics 2014; 15:129. [PMID: 24885165 PMCID: PMC4029974 DOI: 10.1186/1471-2105-15-129] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/02/2013] [Accepted: 04/24/2014] [Indexed: 01/18/2023]
Abstract
Background Interaction between proteins is one of the most important mechanisms in the execution of cellular functions. The study of these interactions has provided insight into the functioning of an organism’s processes. As of October 2013, Homo sapiens had over 170,000 protein-protein interactions (PPIs) registered in the Interologous Interaction Database, which is only one of the many public resources where protein interactions can be accessed. These numbers exemplify the volume of data that research on the topic has generated. Visualization of large data sets is a well-known strategy to make sense of information, and protein interaction data is no exception. There are several tools that allow the exploration of this data, providing different methods to visualize protein network interactions. However, there is still no native web tool that allows this data to be explored interactively online. Results Given the advances that web technologies have made recently, it is time to bring these interactive views to the web to provide an easily accessible forum to visualize PPIs. We have created a Web-based Protein Interaction Network Visualizer: PINV, an open-source, native web application that facilitates the visualization of protein interactions (http://biosual.cbio.uct.ac.za/pinv.html). We developed PINV as a set of components that follow the protocol defined in BioJS and use the D3 library to create the graphic layouts. We demonstrate the use of PINV with multi-organism interaction networks for a predicted target from Mycobacterium tuberculosis, its interacting partners and its orthologs. Conclusions The resultant tool provides an attractive view of complex, fully interactive networks with components that allow the querying, filtering and manipulation of the visible subset. Moreover, as a web resource, PINV simplifies sharing and publishing, activities which are vital in today’s collaborative research environments.
The source code is freely available for download at https://github.com/4ndr01d3/biosual.
Affiliation(s)
- Gustavo A Salazar
- Computational Biology Group, IDM, Faculty of Health Sciences, University of Cape Town, Anzio Road, Cape Town, South Africa.
29
Mazandu GK, Mulder NJ. DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures. BMC Bioinformatics 2013; 14:284. [PMID: 24067102 PMCID: PMC3849277 DOI: 10.1186/1471-2105-14-284] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 05/30/2013] [Accepted: 09/17/2013] [Indexed: 11/30/2022]
Abstract
Background The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. Results We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. Conclusions The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches, to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
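The annotation-based information content (IC) idea behind such measures can be illustrated with a minimal Python sketch. It assumes a hypothetical five-term ontology and made-up annotation counts, computes IC(t) = -log p(t), where p(t) is the fraction of all annotations made to t or any of its descendants, and scores two terms with Resnik's measure, a classic IC-based similarity defined as the IC of their most informative common ancestor. This is a generic illustration, not DaGO-Fun's code or interface.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents (a DAG).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "protein_binding": ["binding"],
    "dna_binding": ["binding"],
}

# Hypothetical number of proteins annotated directly to each term.
ANNOTATIONS = {
    "root": 0,
    "binding": 2,
    "catalysis": 4,
    "protein_binding": 3,
    "dna_binding": 1,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t); each annotation also counts toward every ancestor of its term."""
    cumulative = {t: 0 for t in PARENTS}
    for term, count in ANNOTATIONS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    # Terms with no (direct or inherited) annotations have undefined IC and are skipped.
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common)


ic = information_content()
print(round(resnik("protein_binding", "dna_binding", ic), 3))
```

Here "binding" is the most specific ancestor shared by the two binding terms, so their similarity is IC(binding) = -ln 0.6, about 0.51, whereas a binding term and "catalysis" share only the root, whose IC is 0. Precomputing the IC table once, as the abstract describes, makes each subsequent similarity query a cheap lookup over common ancestors.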
Affiliation(s)
- Gaston K Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town, 7925, South Africa.
30
Chimusa ER, Zaitlen N, Daya M, Möller M, van Helden PD, Mulder NJ, Price AL, Hoal EG. Genome-wide association study of ancestry-specific TB risk in the South African Coloured population. Hum Mol Genet 2013; 23:796-809. [PMID: 24057671 DOI: 10.1093/hmg/ddt462] [Citation(s) in RCA: 118] [Impact Index Per Article: 10.7] [Indexed: 11/13/2022]
Abstract
The worldwide burden of tuberculosis (TB) remains an enormous problem, and is particularly severe in the admixed South African Coloured (SAC) population residing in the Western Cape. Despite evidence from twin studies suggesting a strong genetic component to TB resistance, only a few loci have been identified to date. In this work, we conduct a genome-wide association study (GWAS), meta-analysis and trans-ethnic fine mapping to attempt the replication of previously identified TB susceptibility loci. Our GWAS results confirm the WT1 chr11 susceptibility locus (rs2057178: odds ratio = 0.62, P = 2.71e(-06)) previously identified by Thye et al., but fail to replicate previously identified polymorphisms in the TLR8 gene and locus 18q11.2. Our study demonstrates that the genetic contribution to TB risk varies between continental populations, and illustrates the value of including admixed populations in studies of TB risk and other complex phenotypes. Our evaluation of local ancestry based on the real and simulated data demonstrates that case-only admixture mapping is currently impractical in multi-way admixed populations, such as the SAC, due to spurious deviations in average local ancestry generated by current local ancestry inference methods. This study provides insights into identifying disease genes and ancestry-specific disease risk in multi-way admixed populations.
Affiliation(s)
- Emile R Chimusa
- Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
31
Chimusa ER, Daya M, Möller M, Ramesar R, Henn BM, van Helden PD, Mulder NJ, Hoal EG. Determining ancestry proportions in complex admixture scenarios in South Africa using a novel proxy ancestry selection method. PLoS One 2013; 8:e73971. [PMID: 24066090 PMCID: PMC3774743 DOI: 10.1371/journal.pone.0073971] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 03/20/2013] [Accepted: 07/25/2013] [Indexed: 02/03/2023]
Abstract
Admixed populations can make an important contribution to the discovery of disease susceptibility genes if the parental populations exhibit substantial variation in susceptibility. Admixture mapping has been used successfully, but is not designed to cope with populations that have more than two or three ancestral populations. The inference of admixture proportions and local ancestry and the imputation of missing genotypes in admixed populations are crucial in both understanding variation in disease and identifying novel disease loci. These inferences make use of reference populations, and accuracy depends on the choice of ancestral populations. Using an insufficient or inaccurate ancestral panel can result in erroneously inferred ancestry and affect the detection power of GWAS and meta-analysis when using imputation. Current algorithms are inadequate for multi-way admixed populations. To address these challenges we developed PROXYANC, an approach to select the best proxy ancestral populations. From the simulation of a multi-way admixed population we demonstrate the capability and accuracy of PROXYANC and illustrate the importance of the choice of ancestry in both estimating admixture proportions and imputing missing genotypes. We applied this approach to a complex, uniquely admixed South African population. Using genome-wide SNP data from over 764 individuals, we accurately estimate the genetic contributions from the best ancestral populations: isiXhosa [Formula: see text], ‡Khomani SAN [Formula: see text], European [Formula: see text], Indian [Formula: see text], and Chinese [Formula: see text]. We also demonstrate that the ancestral allele frequency differences correlate with increased linkage disequilibrium in the South African population, which originates from admixture events rather than population bottlenecks. 
NOMENCLATURE The collective term for people of mixed ancestry in southern Africa is "Coloured," and this is officially recognized in South Africa as a census term, and for self-classification. Whilst we acknowledge that some cultures may use this term in a derogatory manner, these connotations are not present in South Africa, and are certainly not intended here.
Affiliation(s)
- Emile R. Chimusa
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Cape Town, South Africa
- Michelle Daya
- MRC Centre for Molecular and Cellular Biology, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Health Sciences, Stellenbosch University, Tygerberg, South Africa
- Marlo Möller
- MRC Centre for Molecular and Cellular Biology, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Health Sciences, Stellenbosch University, Tygerberg, South Africa
- Raj Ramesar
- MRC Human Genetics Research Unit, Division of Human Genetics, Department of Clinical Laboratory Sciences, Institute for Infectious Diseases and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Brenna M. Henn
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, New York, United States of America
- Paul D. van Helden
- MRC Centre for Molecular and Cellular Biology, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Health Sciences, Stellenbosch University, Tygerberg, South Africa
- Nicola J. Mulder
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Cape Town, South Africa
- Eileen G. Hoal
- MRC Centre for Molecular and Cellular Biology, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Health Sciences, Stellenbosch University, Tygerberg, South Africa

32
Deffur A, Mulder NJ, Wilkinson RJ. Co-infection with Mycobacterium tuberculosis and human immunodeficiency virus: an overview and motivation for systems approaches. Pathog Dis 2013; 69:101-13. [PMID: 23821533 DOI: 10.1111/2049-632x.12060] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/03/2013] [Revised: 06/17/2013] [Accepted: 06/20/2013] [Indexed: 12/13/2022]
Abstract
Tuberculosis is a devastating disease that accounts for a high proportion of infectious disease morbidity and mortality worldwide. HIV-1 co-infection exacerbates tuberculosis. Enhanced understanding of the host-pathogen relationship in HIV-1 and Mycobacterium tuberculosis co-infection is required. While reductionist approaches have yielded many valuable insights into disease pathogenesis, systems approaches are required that develop data-driven models able to predict emergent properties of this complex co-infection system in order to develop novel therapeutic approaches and to improve diagnostics. Here, we provide a pathogenesis-focused overview of HIV-TB co-infection followed by an introduction to systems approaches and concrete examples of how such approaches are useful.
Affiliation(s)
- Armin Deffur
- Clinical Infectious Diseases Research Initiative, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa; Department of Medicine, University of Cape Town, Cape Town, South Africa

33
Rapanoel HA, Mazandu GK, Mulder NJ. Predicting and analyzing interactions between Mycobacterium tuberculosis and its human host. PLoS One 2013; 8:e67472. [PMID: 23844013 PMCID: PMC3699628 DOI: 10.1371/journal.pone.0067472] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 12/21/2012] [Accepted: 05/17/2013] [Indexed: 12/20/2022]
Abstract
The outcome of infection by Mycobacterium tuberculosis (Mtb) depends greatly on how the host responds to the bacteria and how the bacteria manipulate the host, which is facilitated by protein-protein interactions. Thus, to understand this process, there is a need for elucidating protein interactions between human and Mtb, which may enable us to characterize specific molecular mechanisms allowing the bacteria to persist and survive under different environmental conditions. In this work, we used the interologs method based on experimentally verified intra-species and inter-species interactions to predict human-Mtb functional interactions. These interactions were further filtered using known human-Mtb interactions and genes that are differentially expressed during infection, producing 190 interactions. Further analysis of the subcellular location of proteins involved in these human-Mtb interactions confirms the feasibility of these interactions. We also conducted functional analysis of human and Mtb proteins involved in these interactions, checking whether these proteins play a role in infection and/or disease, and testing for enrichment of Mtb proteins in a previously predicted list of drug targets. We found that the biological processes of the human interacting proteins suggested their involvement in apoptosis and production of nitric oxide, whereas those of the Mtb interacting proteins were relevant to the intracellular environment of Mtb in the host. Mapping these proteins onto KEGG pathways highlighted proteins belonging to the tuberculosis pathway and also suggested that Mtb proteins might use the host to acquire nutrients, which is in agreement with the intracellular lifestyle of Mtb. This indicates that these interactions can shed light on the interplay between Mtb and its human host and thus contribute to the process of designing novel drugs with new biological mechanisms of action.
Affiliation(s)
- Holifidy A. Rapanoel
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Gaston K. Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Nicola J. Mulder
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa

34
Mazandu GK, Mulder NJ. Function prediction and analysis of mycobacterium tuberculosis hypothetical proteins. Int J Mol Sci 2012; 13:7283-7302. [PMID: 22837694 PMCID: PMC3397526 DOI: 10.3390/ijms13067283] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Received: 04/02/2012] [Revised: 05/28/2012] [Accepted: 06/07/2012] [Indexed: 11/16/2022]
Abstract
High-throughput biology technologies have yielded complete genome sequences and functional genomics data for several organisms, including crucial microbial pathogens of humans, animals and plants. However, up to 50% of genes within a genome are often labeled "unknown", "uncharacterized" or "hypothetical", limiting our understanding of virulence and pathogenicity of these organisms. Even though biological functions of proteins encoded by these genes are not known, many of them have been predicted to be involved in key processes in these organisms. In particular, for Mycobacterium tuberculosis, some of these "hypothetical" proteins, for example those belonging to the Pro-Glu or Pro-Pro-Glu (PE/PPE) family, have been suspected to play a crucial role in the intracellular lifestyle of this pathogen, and may contribute to its survival in different environments. We have generated a functional interaction network for Mycobacterium tuberculosis proteins and used this to predict functions for many of its hypothetical proteins. Here we performed functional enrichment analysis of these proteins based on their predicted biological functions to identify annotations that are statistically relevant, and analysed and compared network properties of hypothetical proteins to the known proteins. From the statistically significant annotations and network information, we have tried to derive biologically meaningful annotations related to infection and disease. This quantitative analysis provides an overview of the functional contributions of Mycobacterium tuberculosis "hypothetical" proteins to many basic cellular functions, including its adaptability in the host system and its ability to evade the host immune response.
Affiliation(s)
- Nicola J. Mulder
- Author to whom correspondence should be addressed; Tel.: +27-21-406-6058; Fax: +27-21-406-6068

35
Mazandu GK, Mulder NJ. Using the underlying biological organization of the Mycobacterium tuberculosis functional network for protein function prediction. Infect Genet Evol 2011; 12:922-32. [PMID: 22085822 DOI: 10.1016/j.meegid.2011.10.027] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 07/06/2011] [Revised: 10/25/2011] [Accepted: 10/28/2011] [Indexed: 10/15/2022]
Abstract
Despite ever-increasing amounts of sequence and functional genomics data, there is still a deficiency of functional annotation for many newly sequenced proteins. For Mycobacterium tuberculosis (MTB), more than half of its genome is still uncharacterized, which hampers the search for new drug targets within the bacterial pathogen and limits our understanding of its pathogenicity. As for many other genomes, the annotations of proteins in the MTB proteome were generally inferred from sequence homology, which is effective but has limited applicability. We have carried out large-scale biological data integration to produce an MTB protein functional interaction network. Protein functional relationships were extracted from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, and additional functional interactions from microarray, sequence and protein signature data. The confidence level of protein relationships in the additional functional interaction data was evaluated using a dynamic data-driven scoring system. This functional network has been used to predict functions of uncharacterized proteins using Gene Ontology (GO) terms, and the semantic similarity between these terms measured using a state-of-the-art GO similarity metric. To achieve a better trade-off between prediction quality, genomic coverage and scalability, this prediction is done by observing the key principles driving the biological organization of the functional network. This study yields a new functionally characterized MTB strain CDC1551 proteome, consisting of 3804 and 3698 proteins out of 4195 with annotations in terms of the biological process and molecular function ontologies, respectively. These data can contribute to research into the development of effective anti-tubercular drugs with novel biological mechanisms of action.
Affiliation(s)
- Gaston K Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925 Observatory, Cape Town, South Africa

36
Abstract
The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods to integrate and exploit these data effectively and to elucidate local and genome-wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic based approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with high confidence and coverage. We use the network for predicting functions of uncharacterized proteins. AVAILABILITY Protein pair-wise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data and python scripts to compute these scores are available at http://web.cbio.uct.ac.za/~gmazandu/scoringschemes.
Affiliation(s)
- Gaston K Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa

37
Mazandu GK, Opap K, Mulder NJ. Contribution of microarray data to the advancement of knowledge on the Mycobacterium tuberculosis interactome: use of the random partial least squares approach. Infect Genet Evol 2011; 11:725-33. [PMID: 21514402 DOI: 10.1016/j.meegid.2011.04.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023]
Abstract
Following the central dogma of molecular biology, where data flows from gene to protein through transcript, information on gene expression provides information on the functional state of an organism. Microarray technology arose to measure the expression level of thousands of genes simultaneously. These vast amounts of data generated at all levels of biological organization help to identify co-expressed genes, which may reveal proteins interacting in a complex or acting in the same pathway without direct physical contact. Discovering associations of regulatory patterns of characterized proteins with those of hypothetical proteins may identify functional relationships between them and facilitate the characterization of proteins of unknown function. Here we make use of the random partial least squares regression technique (r-PLS) to trace connections between co-expressed genes in Mycobacterium tuberculosis using data downloaded from public microarray databases. We generated the overall topology of a microbial co-expression network with the exact complexity of the model. This approach provides a general method for generating a co-expression network of an organism for the purpose of systems-level analyses.
Affiliation(s)
- Gaston K Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925 Observatory, Cape Town, South Africa

38
Mazandu GK, Opap K, Mulder NJ. Contribution of microarray data to the advancement of knowledge on the Mycobacterium tuberculosis interactome: use of the random partial least squares approach. Infect Genet Evol 2010; 11:181-9. [PMID: 20850566 DOI: 10.1016/j.meegid.2010.09.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 07/09/2010] [Revised: 09/03/2010] [Accepted: 09/03/2010] [Indexed: 11/27/2022]
Abstract
Following the central dogma of molecular biology, where data flows from gene to protein through transcript, information on gene expression provides information on the functional state of an organism. Microarray technology arose to measure the expression level of thousands of genes simultaneously. These vast amounts of data generated at all levels of biological organization help to identify co-expressed genes, which may reveal proteins interacting in a complex or acting in the same pathway without direct physical contact. Discovering associations of regulatory patterns of characterized proteins with those of hypothetical proteins may identify functional relationships between them and facilitate the characterization of proteins of unknown function. Here we make use of the random partial least squares regression technique (r-PLS) to trace connections between co-expressed genes in Mycobacterium tuberculosis using data downloaded from public microarray databases. We generated the overall topology of a microbial co-expression network with the exact complexity of the model. This approach provides a general method for generating a co-expression network of an organism for the purpose of systems-level analyses.
Affiliation(s)
- Gaston K Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925, Observatory, Cape Town, South Africa

39
Abstract
InterPro provides a one-stop shop for protein-sequence classification, freeing the user from having to visit multiple databases separately and rationalize the different results in varying formats. This unit describes how to submit a sequence to InterProScan via a Web server. It also provides instructions for installing and running InterProScan locally. In addition, details on browsing InterPro families and domains of interest using the InterPro Web and sequence retrieval system (SRS) are provided to show users how to get the most from the resource.
Affiliation(s)
- Nicola J Mulder
- The EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

40
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJA, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. New developments in the InterPro database. Nucleic Acids Res 2007; 35:D224-8. [PMID: 17202162 PMCID: PMC1899100 DOI: 10.1093/nar/gkl841] [Citation(s) in RCA: 349] [Impact Index Per Article: 20.5] [Received: 09/05/2006] [Revised: 10/06/2006] [Accepted: 10/06/2006] [Indexed: 11/14/2022]
Abstract
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.
Affiliation(s)
- Nicola J Mulder
- EMBL Outstation-European Bioinformatics Institute, Hinxton, Cambridge, UK.

41
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchell A, Nikolskaya AN, Orchard S, Pagni M, Ponting CP, Quevillon E, Selengut J, Sigrist CJA, Silventoinen V, Studholme DJ, Vaughan R, Wu CH. InterPro, progress and status in 2005. Nucleic Acids Res 2005; 33:D201-5. [PMID: 15608177 PMCID: PMC540060 DOI: 10.1093/nar/gki106] [Citation(s) in RCA: 419] [Impact Index Per Article: 22.1] [Indexed: 12/03/2022]
Abstract
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Affiliation(s)
- Nicola J Mulder
- EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

42
Affiliation(s)
- Nicola J. Mulder
- The EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge
- Rolf Apweiler
- The EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge

43
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJA, Vaughan R, Zdobnov EM. The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 2003; 31:315-8. [PMID: 12520011 PMCID: PMC165493 DOI: 10.1093/nar/gkg046] [Citation(s) in RCA: 565] [Impact Index Per Article: 26.9] [Indexed: 01/17/2023]
Abstract
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Affiliation(s)
- Nicola J. Mulder
- EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; School of Biological Sciences and Department of Computer Science, The University of Manchester, Manchester, UK; Swiss Institute for Bioinformatics, Geneva, Switzerland; ViaLactia Biosciences, Newmarket, Auckland, New Zealand; Biocomputing Unit, EMBL, Heidelberg, Germany; Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland; Wellcome Trust Centre for Human Genetics, Oxford, UK; CNRS/INRA, Toulouse, France; The Institute for Genomic Research, MD, USA; MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, UK; EMBL, Heidelberg, Germany
| | - Rolf Apweiler
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK School of Biological Sciences and Department of Computer Science, The University of Manchester, Manchester, UK Swiss Institute for Bioinformatics, Geneva, Switzerland ViaLactia Biosciences, Newmarket Auckland, New Zealand Biocomputing Unit EMBL, Heidelberg, Germany Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland Wellcome Trust Centre for Human Genetics, Oxford, UK CNRS/INRA, Toulouse, France The Institute for Genomic Research, MD, USA MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, UK EMBL, Heidelberg, Germany
| | - Teresa K. Attwood
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK School of Biological Sciences and Department of Computer Science, The University of Manchester, Manchester, UK Swiss Institute for Bioinformatics, Geneva, Switzerland ViaLactia Biosciences, Newmarket Auckland, New Zealand Biocomputing Unit EMBL, Heidelberg, Germany Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland Wellcome Trust Centre for Human Genetics, Oxford, UK CNRS/INRA, Toulouse, France The Institute for Genomic Research, MD, USA MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, UK EMBL, Heidelberg, Germany
| | - Amos Bairoch
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK School of Biological Sciences and Department of Computer Science, The University of Manchester, Manchester, UK Swiss Institute for Bioinformatics, Geneva, Switzerland ViaLactia Biosciences, Newmarket Auckland, New Zealand Biocomputing Unit EMBL, Heidelberg, Germany Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland Wellcome Trust Centre for Human Genetics, Oxford, UK CNRS/INRA, Toulouse, France The Institute for Genomic Research, MD, USA MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, UK EMBL, Heidelberg, Germany
| | - Daniel Barrell
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK School of Biological Sciences and Department of Computer Science, The University of Manchester, Manchester, UK Swiss Institute for Bioinformatics, Geneva, Switzerland ViaLactia Biosciences, Newmarket Auckland, New Zealand Biocomputing Unit EMBL, Heidelberg, Germany Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland Wellcome Trust Centre for Human Genetics, Oxford, UK CNRS/INRA, Toulouse, France The Institute for Genomic Research, MD, USA MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, UK EMBL, Heidelberg, Germany
- Alex Bateman
- David Binns
- Margaret Biswas
- Paul Bradley
- Peer Bork
- Phillip Bucher
- Richard R. Copley
- Emmanuel Courcelle
- Ujjwal Das
- Richard Durbin
- Laurent Falquet
- Wolfgang Fleischmann
- Sam Griffiths-Jones
- Daniel Haft
- Nicola Harte
- Nicolas Hulo
- Daniel Kahn
- Alexander Kanapin
- Maria Krestyaninova
- Rodrigo Lopez
- Ivica Letunic
- David Lonsdale
- Ville Silventoinen
- Sandra E. Orchard
- Marco Pagni
- David Peyruc
- Chris P. Ponting
- Jeremy D. Selengut
- Florence Servant
- Christian J. A. Sigrist
- Robert Vaughan
- Evgueni M. Zdobnov
44
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley R, Courcelle E, Durbin R, Falquet L, Fleischmann W, Gouzy J, Griffith-Jones S, Haft D, Hermjakob H, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Orchard S, Pagni M, Peyruc D, Ponting CP, Servant F, Sigrist CJA. InterPro: an integrated documentation resource for protein families, domains and functional sites. Brief Bioinform 2002; 3:225-35. [PMID: 12230031 DOI: 10.1093/bib/3.3.225] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.1] [Indexed: 11/15/2022]
Abstract
The exponential increase in the submission of nucleotide sequences to the nucleotide sequence database by genome sequencing centres has resulted in a need for rapid, automatic methods for classification of the resulting protein sequences. There are several signature and sequence cluster-based methods for protein classification, each resource having distinct areas of optimum application owing to the differences in the underlying analysis methods. In recognition of this, InterPro was developed as an integrated documentation resource for protein families, domains and functional sites, to rationalise the complementary efforts of the individual protein signature database projects. The member databases - PRINTS, PROSITE, Pfam, ProDom, SMART and TIGRFAMs - form the InterPro core. Related signatures from each member database are unified into single InterPro entries. Each InterPro entry includes a unique accession number, functional descriptions and literature references, and links are made back to the relevant member database(s). Release 4.0 of InterPro (November 2001) contains 4,691 entries, representing 3,532 families, 1,068 domains, 74 repeats and 15 sites of post-translational modification (PTMs) encoded by different regular expressions, profiles, fingerprints and hidden Markov models (HMMs). Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (2,141,621 InterPro hits from 586,124 SWISS-PROT and TrEMBL protein sequences). The database is freely accessible for text- and sequence-based searches.
Affiliation(s)
- Nicola J Mulder
- EMBL Outstation, European Bioinformatics Institute, Hinxton, Cambridge, UK.
45
Abstract
With the large influx of raw sequence data from genome sequencing projects, there is a need for reliable automatic methods for protein sequence analysis and classification. The most useful tools use various methods for identifying motifs or domains found in previously characterized protein families. This article reviews the tools and resources available on the web for identifying signatures within proteins and discusses how they may be used in the analysis of new or unknown protein sequences.
Affiliation(s)
- Nicola J Mulder
- The EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
46
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, Durbin R, Falquet L, Fleischmann W, Gouzy J, Hermjakob H, Hulo N, Jonassen I, Kahn D, Kanapin A, Karavidopoulou Y, Lopez R, Marx B, Mulder NJ, Oinn TM, Pagni M, Servant F, Sigrist CJ, Zdobnov EM. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res 2001; 29:37-40. [PMID: 11125043 PMCID: PMC29841 DOI: 10.1093/nar/29.1.37] [Citation(s) in RCA: 704] [Impact Index Per Article: 30.6] [Indexed: 11/14/2022]
Abstract
Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1,000,000 hits from 462,500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. Questions can be emailed to interhelp@ebi.ac.uk.
Affiliation(s)
- R Apweiler
- EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
47
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, Durbin R, Falquet L, Fleischmann W, Gouzy J, Hermjakob H, Hulo N, Jonassen I, Kahn D, Kanapin A, Karavidopoulou Y, Lopez R, Marx B, Mulder NJ, Oinn TM, Pagni M, Servant F, Sigrist CJ, Zdobnov EM. InterPro--an integrated documentation resource for protein families, domains and functional sites. Bioinformatics 2000; 16:1145-50. [PMID: 11159333 DOI: 10.1093/bioinformatics/16.12.1145] [Citation(s) in RCA: 228] [Impact Index Per Article: 9.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION InterPro is a new integrated documentation resource for protein families, domains and functional sites, developed initially as a means of rationalising the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. RESULTS Merged annotations from PRINTS, PROSITE and Pfam form the InterPro core. Each combined InterPro entry includes functional descriptions and literature references, and links are made back to the relevant parent database(s), allowing users to see at a glance whether a particular family or domain has associated patterns, profiles, fingerprints, etc. Merged and individual entries (i.e. those that have no counterpart in the companion resources) are assigned unique accession numbers. Release 1.2 of InterPro (June 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification (PTMs) encoded by 6581 different regular expressions, profiles, fingerprints and Hidden Markov Models (HMMs). Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1000000 hits from 264333 different proteins out of 384572 in SWISS-PROT and TrEMBL).
Affiliation(s)
- R Apweiler
- EMBL Outstation--European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
48
Mulder NJ, Zappe H, Steyn LM. Characterization of a Mycobacterium tuberculosis homologue of the Streptomyces coelicolor whiB gene. Tuber Lung Dis 2000; 79:299-308. [PMID: 10707258 DOI: 10.1054/tuld.1999.0217] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
SETTING Molecular Research Laboratory, Department of Medical Microbiology, University of Cape Town and Groote Schuur Hospital. OBJECTIVE To characterize the Mycobacterium tuberculosis homologue of the Streptomyces coelicolor sporulation-specific whiB regulatory gene. DESIGN The M. tuberculosis whiB3 gene was isolated by enriched cloning of a 2.8 kb BamHI fragment to which the S. coelicolor whiB gene hybridized. Expression of the gene was analysed by S1 nuclease analysis and promoter studies. RESULTS An open reading frame within the 2.8 kb BamHI fragment was identified as the M. tuberculosis whiB3 gene, one of four whiB homologues in the M. tuberculosis genome. The deduced amino acid sequence has 92% identity with an M. leprae protein and 32% identity with the S. coelicolor WhiB protein. S1 nuclease analysis showed that the M. tuberculosis whiB3 gene is constitutively expressed by the cells in liquid culture. Primer extension analysis revealed three transcriptional start sites. Expression from the three potential promoters is growth-phase-dependent. CONCLUSION The M. tuberculosis whiB3 gene is expressed throughout growth, but expression from the individual promoters is growth-phase-dependent.
Affiliation(s)
- N J Mulder
- Department of Medical Microbiology, Medical School, University of Cape Town, Observatory, South Africa
49
Mulder NJ, Powles RE, Zappe H, Steyn LM. The Mycobacterium tuberculosis mysB gene product is a functional equivalent of the Escherichia coli sigma factor, KatF. Gene 1999; 240:361-70. [PMID: 10580156 DOI: 10.1016/s0378-1119(99)00430-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022]
Abstract
Mycobacterium tuberculosis, the causative agent of tuberculosis, may remain dormant within its host for many years. The nature of this dormant or latent state is not known, but it may be a specialized form of the stationary growth phase. In Escherichia coli, KatF (or RpoS) is the major stationary phase sigma factor regulating an array of genes expressed in this phase of growth. A potential M. tuberculosis katF homologue was cloned using a fragment of the E. coli katF gene as a probe. DNA sequence analysis of a resultant clone showed 100% identity to a fragment of DNA encoding the M. tuberculosis mysA and mysB genes. Overexpression of mysB in M. bovis BCG resulted in an increase in katG mRNA and catalase and peroxidase activity, and an increase in sensitivity of the cells to isoniazid. An increase in katG promoter activity from a reporter vector was demonstrated when mysB was overexpressed from the same plasmid, indicating a direct relationship between MysB and katG expression.
Affiliation(s)
- N J Mulder
- Department of Medical Microbiology, University of Cape Town and Groote Schuur Hospital, Observatory, South Africa