1
Dong Y, Duan S, Xia Q, Liang Z, Dong X, Margaryan K, Musayev M, Goryslavets S, Zdunić G, Bert PF, Lacombe T, Maul E, Nick P, Bitskinashvili K, Bisztray GD, Drori E, De Lorenzis G, Cunha J, Popescu CF, Arroyo-Garcia R, Arnold C, Ergül A, Zhu Y, Ma C, Wang S, Liu S, Tang L, Wang C, Li D, Pan Y, Li J, Yang L, Li X, Xiang G, Yang Z, Chen B, Dai Z, Wang Y, Arakelyan A, Kuliyev V, Spotar G, Girollet N, Delrot S, Ollat N, This P, Marchal C, Sarah G, Laucou V, Bacilieri R, Röckel F, Guan P, Jung A, Riemann M, Ujmajuridze L, Zakalashvili T, Maghradze D, Höhn M, Jahnke G, Kiss E, Deák T, Rahimi O, Hübner S, Grassi F, Mercati F, Sunseri F, Eiras-Dias J, Dumitru AM, Carrasco D, Rodriguez-Izquierdo A, Muñoz G, Uysal T, Özer C, Kazan K, Xu M, Wang Y, Zhu S, Lu J, Zhao M, Wang L, Jiu S, Zhang Y, Sun L, Yang H, Weiss E, Wang S, Zhu Y, Li S, Sheng J, Chen W. Dual domestications and origin of traits in grapevine evolution. Science 2023; 379:892-901. [PMID: 36862793 DOI: 10.1126/science.add8655] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Indexed: 03/04/2023]
Abstract
We elucidate grapevine evolution and domestication histories with 3525 cultivated and wild accessions worldwide. In the Pleistocene, harsh climate drove the separation of wild grape ecotypes caused by continuous habitat fragmentation. Then, domestication occurred concurrently about 11,000 years ago in Western Asia and the Caucasus to yield table and wine grapevines. The Western Asia domesticates dispersed into Europe with early farmers, introgressed with ancient wild western ecotypes, and subsequently diversified along human migration trails into muscat and unique western wine grape ancestries by the late Neolithic. Analyses of domestication traits also reveal new insights into selection for berry palatability, hermaphroditism, muscat flavor, and berry skin color. These data demonstrate the role of the grapevines in the early inception of agriculture across Eurasia.
Affiliation(s)
- Yang Dong
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Shengchang Duan
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Qiuju Xia
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
- Zhenchang Liang
- Beijing Key Laboratory of Grape Science and Oenology and Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
- Xiao Dong
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Kristine Margaryan
- Institute of Molecular Biology, NAS RA, 0014 Yerevan, Armenia.,Yerevan State University, 0014 Yerevan, Armenia
- Mirza Musayev
- Genetic Resources Institute, Azerbaijan National Academy of Sciences, AZ1106 Baku, Azerbaijan
- Goran Zdunić
- Institute for Adriatic Crops and Karst Reclamation, 21000 Split, Croatia
- Pierre-François Bert
- Bordeaux University, Bordeaux Sciences Agro, INRAE, UMR EGFV, ISVV, 33882 Villenave d'Ornon, France
- Thierry Lacombe
- AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro Montpellier, 34398 Montpellier, France
- Erika Maul
- Julius Kühn Institute (JKI) - Federal Research Center for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, 76833 Siebeldingen, Germany
- Peter Nick
- Botanical Institute, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
- György Dénes Bisztray
- Hungarian University of Agriculture and Life Sciences (MATE), 1118 Budapest, Hungary
- Elyashiv Drori
- Department of Chemical Engineering, Ariel University, 40700 Ariel, Israel.,Eastern Regional R&D Center, 40700 Ariel, Israel
- Gabriella De Lorenzis
- Department of Agricultural and Environmental Sciences, University of Milano, 20133 Milano, Italy
- Jorge Cunha
- Instituto Nacional de Investigação Agrária e Veterinária, I.P./INIAV-Dois Portos, 2565-191 Torres Vedras, Portugal.,Green-it Unit, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, 2780-157 Oeiras, Portugal
- Carmen Florentina Popescu
- National Research and Development Institute for Biotechnology in Horticulture, Stefanesti, 117715 Arges, Romania
- Rosa Arroyo-Garcia
- Center for Plant Biotechnology and Genomics, UPM-INIA/CSIC, Pozuelo de Alarcon, 28223 Madrid, Spain
- Ali Ergül
- Biotechnology Institute, Ankara University, 06135 Ankara, Turkey
- Yifan Zhu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China
- Chao Ma
- Department of Plant Science, School of Agriculture and Biology, Shanghai JiaoTong University, Shanghai 200240, China
- Shufen Wang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Siqi Liu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Liu Tang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Chunping Wang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Dawei Li
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Yunbing Pan
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Jingxian Li
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Ling Yang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Xuzhen Li
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Guisheng Xiang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Zijiang Yang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Baozheng Chen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Zhanwu Dai
- Beijing Key Laboratory of Grape Science and Oenology and Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
- Yi Wang
- Beijing Key Laboratory of Grape Science and Oenology and Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
- Arsen Arakelyan
- Institute of Molecular Biology, NAS RA, 0014 Yerevan, Armenia.,Armenian Bioinformatics Institute, 0014 Yerevan, Armenia.,Biomedicine and Pharmacy, RAU, 0051 Yerevan, Armenia
- Varis Kuliyev
- Institute of Bioresources, Nakhchivan Branch of the Azerbaijan National Academy of Sciences, AZ7000 Nakhchivan, Azerbaijan
- Gennady Spotar
- National Institute of Viticulture and Winemaking Magarach, Yalta 298600, Crimea
- Nabil Girollet
- Bordeaux University, Bordeaux Sciences Agro, INRAE, UMR EGFV, ISVV, 33882 Villenave d'Ornon, France
- Serge Delrot
- Bordeaux University, Bordeaux Sciences Agro, INRAE, UMR EGFV, ISVV, 33882 Villenave d'Ornon, France
- Nathalie Ollat
- Bordeaux University, Bordeaux Sciences Agro, INRAE, UMR EGFV, ISVV, 33882 Villenave d'Ornon, France
- Patrice This
- AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro Montpellier, 34398 Montpellier, France
- Cécile Marchal
- Vassal-Montpellier Grapevine Biological Resources Center, INRAE, 34340 Marseillan-Plage, France
- Gautier Sarah
- AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro Montpellier, 34398 Montpellier, France
- Valérie Laucou
- AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro Montpellier, 34398 Montpellier, France
- Roberto Bacilieri
- AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro Montpellier, 34398 Montpellier, France
- Franco Röckel
- Julius Kühn Institute (JKI) - Federal Research Center for Cultivated Plants, Institute for Grapevine Breeding Geilweilerhof, 76833 Siebeldingen, Germany
- Pingyin Guan
- Botanical Institute, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
- Andreas Jung
- Historische Rebsorten-Sammlung, Rebschule (K39), 67599 Gundheim, Germany
- Michael Riemann
- Botanical Institute, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
- Levan Ujmajuridze
- LEPL Scientific Research Center of Agriculture, 0159 Tbilisi, Georgia
- David Maghradze
- LEPL Scientific Research Center of Agriculture, 0159 Tbilisi, Georgia
- Maria Höhn
- Hungarian University of Agriculture and Life Sciences (MATE), 1118 Budapest, Hungary
- Gizella Jahnke
- Hungarian University of Agriculture and Life Sciences (MATE), 1118 Budapest, Hungary
- Erzsébet Kiss
- Hungarian University of Agriculture and Life Sciences (MATE), 1118 Budapest, Hungary
- Tamás Deák
- Hungarian University of Agriculture and Life Sciences (MATE), 1118 Budapest, Hungary
- Oshrit Rahimi
- Department of Chemical Engineering, Ariel University, 40700 Ariel, Israel
- Sariel Hübner
- Galilee Research Institute (Migal), Tel-Hai Academic College, 12210 Upper Galilee, Israel
- Fabrizio Grassi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, 20126 Milano, Italy.,NBFC, National Biodiversity Future Center, 90133 Palermo, Italy
- Francesco Mercati
- Institute of Biosciences and Bioresources, National Research Council, 90129 Palermo, Italy
- Francesco Sunseri
- Department AGRARIA, University Mediterranea of Reggio Calabria, Reggio 89122 Calabria, Italy
- José Eiras-Dias
- Instituto Nacional de Investigação Agrária e Veterinária, I.P./INIAV-Dois Portos, 2565-191 Torres Vedras, Portugal.,Green-it Unit, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, 2780-157 Oeiras, Portugal
- Anamaria Mirabela Dumitru
- National Research and Development Institute for Biotechnology in Horticulture, Stefanesti, 117715 Arges, Romania
- David Carrasco
- Center for Plant Biotechnology and Genomics, UPM-INIA/CSIC, Pozuelo de Alarcon, 28223 Madrid, Spain
- Tamer Uysal
- Viticulture Research Institute, Ministry of Agriculture and Forestry, 59200 Tekirdağ, Turkey
- Cengiz Özer
- Viticulture Research Institute, Ministry of Agriculture and Forestry, 59200 Tekirdağ, Turkey
- Kemal Kazan
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, Queensland 4072, Australia
- Meilong Xu
- Institute of Horticulture, Ningxia Academy of Agricultural and Forestry Sciences, Yinchuan 750002, China
- Yunyue Wang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China
- Shusheng Zhu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China
- Jiang Lu
- Center for Viticulture and Oenology, School of Agriculture and Biology, Shanghai JiaoTong University, Shanghai 200240, China
- Maoxiang Zhao
- Department of Plant Science, School of Agriculture and Biology, Shanghai JiaoTong University, Shanghai 200240, China
- Lei Wang
- Department of Plant Science, School of Agriculture and Biology, Shanghai JiaoTong University, Shanghai 200240, China
- Songtao Jiu
- Department of Plant Science, School of Agriculture and Biology, Shanghai JiaoTong University, Shanghai 200240, China
- Ying Zhang
- Zhengzhou Fruit Research Institutes, CAAS, Zhengzhou 450009, China
- Lei Sun
- Zhengzhou Fruit Research Institutes, CAAS, Zhengzhou 450009, China
- Ehud Weiss
- The Martin (Szusz) Department of Land of Israel Studies and Archaeology, Bar-Ilan University, 5290002 Ramat-Gan, Israel
- Shiping Wang
- Department of Plant Science, School of Agriculture and Biology, Shanghai JiaoTong University, Shanghai 200240, China
- Youyong Zhu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China
- Shaohua Li
- Beijing Key Laboratory of Grape Science and Oenology and Key Laboratory of Plant Resources, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
- Jun Sheng
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
- Wei Chen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China.,Yunnan Research Institute for Local Plateau Agriculture and Industry, Kunming 650201, China
2
Mookkandi S, Roshni J, Velayudam J, Sivakumar M, Ahmed SF. Bioinformatics Resources, Tools, and Strategies in Designing Therapeutic Proteins. Therapeutic Proteins Against Human Diseases 2022:91-123. [DOI: 10.1007/978-981-16-7897-4_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
3
De-la-Cruz IM, Hallab A, Olivares-Pinto U, Tapia-López R, Velázquez-Márquez S, Piñero D, Oyama K, Usadel B, Núñez-Farfán J. Genomic signatures of the evolution of defence against its natural enemies in the poisonous and medicinal plant Datura stramonium (Solanaceae). Sci Rep 2021; 11:882. [PMID: 33441607 PMCID: PMC7806989 DOI: 10.1038/s41598-020-79194-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/28/2020] [Accepted: 12/03/2020] [Indexed: 01/22/2023]
Abstract
Tropane alkaloids and terpenoids are widely used in medicine and the pharmaceutical industry, and they evolved as chemical defenses against herbivores and pathogens in the annual herb Datura stramonium (Solanaceae). Here, we present the first draft genomes of two D. stramonium plants from contrasting environments. Using these de novo assemblies, along with previously published genomes from 11 Solanaceae species, we carried out comparative genomic analyses to provide insights into the genome evolution of D. stramonium within the Solanaceae family, and to elucidate adaptive genomic signatures to biotic and abiotic stresses in this plant. We also studied, in detail, the evolution of four D. stramonium genes involved in tropane alkaloid biosynthesis: Putrescine N-methyltransferase, Tropinone reductase I, Tropinone reductase II and Hyoscyamine-6S-dioxygenase. Our analyses revealed that the genomes of D. stramonium show signatures of expansion, physicochemical divergence and/or positive selection on proteins related to the production of tropane alkaloids, terpenoids, and glycoalkaloids, as well as on R defensive genes and other important proteins related to biotic and abiotic pressures such as defense against natural enemies and drought.
Affiliation(s)
- I M De-la-Cruz
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- A Hallab
- IBG-4 Bioinformatics, CEPLAS, Forschungszentrum Jülich, Julich, Germany
- U Olivares-Pinto
- Escuela Nacional de Estudios Superiores, Universidad Nacional Autónoma de México (UNAM), Campus Juriquilla, Querétaro, Mexico
- R Tapia-López
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- S Velázquez-Márquez
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- D Piñero
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- K Oyama
- Escuela Nacional de Estudios Superiores and Laboratorio Nacional de Análisis y Síntesis Ecológica (LANASE), Universidad Nacional Autónoma de México (UNAM), Campus Morelia, Morelia, Michoacán, Mexico
- B Usadel
- IBG-4 Bioinformatics, CEPLAS, Forschungszentrum Jülich, Julich, Germany
- Institute for Biology I, RWTH Aachen University, Aachen, Germany
- J Núñez-Farfán
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico.
4
Alvarado-Delgado A, Martínez-Barnetche J, Téllez-Sosa J, Rodríguez MH, Gutiérrez-Millán E, Zumaya-Estrada FA, Saldaña-Navor V, Rodríguez MC, Tello-López Á, Lanz-Mendoza H. Prediction of neuropeptide precursors and differential expression of adipokinetic hormone/corazonin-related peptide, hugin and corazonin in the brain of malaria vector Nyssorhynchus albimanus during a Plasmodium berghei infection. Current Research in Insect Science 2021; 1:100014. [PMID: 36003598 PMCID: PMC9387463 DOI: 10.1016/j.cris.2021.100014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2020] [Revised: 03/18/2021] [Accepted: 03/22/2021] [Indexed: 12/02/2022]
Abstract
We describe precursors that predicted at least sixty neuropeptides in Ny. albimanus. At least 16 precursors are encoded in the Ny. albimanus brain. Myosuppressin neuropeptide precursor was identified in Ny. albimanus. acp and hugin transcripts increased in Ny. albimanus brains infected with P. berghei.
Insect neuropeptides play a central role in the control of many physiological processes. Based on an analysis of the Nyssorhynchus albimanus brain transcriptome, a neuropeptide precursor database of the mosquito was described. Also, we observed that adipokinetic hormone/corazonin-related peptide (ACP), hugin and corazonin encoding genes were differentially expressed during Plasmodium infection. Transcriptomic data from Ny. albimanus brain identified 29 pre-propeptides deduced from the sequences that allowed the prediction of at least 60 neuropeptides. The predicted peptides include isoforms of allatostatin C, orcokinin, corazonin, adipokinetic hormone (AKH), SIFamide, capa, hugin, pigment-dispersing factor, adipokinetic hormone/corazonin-related peptide (ACP), tachykinin-related peptide, trissin, neuropeptide F, diuretic hormone 31, bursicon, crustacean cardioactive peptide (CCAP), allatotropin, allatostatin A, ecdysis triggering hormone (ETH), diuretic hormone 44 (Dh44), insulin-like peptides (ILPs) and eclosion hormone (EH). The analysis of the genome of An. albimanus and the generated transcriptome provided evidence for the identification of the myosuppressin neuropeptide precursor. A quantitative analysis documented increased expression of precursors encoding ACP peptide, hugin and corazonin in the mosquito brain after Plasmodium berghei infection. This work represents an initial effort to characterize the neuropeptide precursor repertoire of Ny. albimanus and provides information for understanding neuroregulation of the mosquito response during Plasmodium infection.
5
Reckoning the Dearth of Bioinformatics in the Arena of Diabetic Nephropathy (DN)—Need to Improvise. Processes (Basel) 2020. [DOI: 10.3390/pr8070808] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/21/2023]
Abstract
Diabetic nephropathy (DN) is a recently rising concern amongst diabetics and diabetologists. Characterized by abnormal renal function and ending in total loss of kidney function, it is becoming a lurking danger for the ever-increasing population of diabetics. This review touches upon the severity of this complication and briefly reviews the role of bioinformatics in the area of diabetes. The advances made in the area of DN using proteomic approaches are presented. Compared with the innumerable inputs observed through the use of bioinformatics resources in proteomics and even in diabetes, the currently sparse application of bioinformatics advances to DN is highlighted and the reasons behind it are discussed. As this review highlights, almost none of the well-established tools that have brought breakthroughs in proteomic research have been applied to DN. Bioinformatics-based mechanistic approaches promise alternatives to laborious, voluminous, costly and time-consuming methodologies, as well as advances in diagnostics and biomarker discovery, to improve DN research and achieve breakthroughs. This review is expected to sensitize researchers to fill this gap by exploiting the available inputs from bioinformatics resources.
6
Liu H, Shi J, Cai Z, Huang Y, Lv M, Du H, Gao Q, Zuo Y, Dong Z, Huang W, Qin R, Liang C, Lai J, Jin W. Evolution and Domestication Footprints Uncovered from the Genomes of Coix. Molecular Plant 2020; 13:295-308. [PMID: 31778842 DOI: 10.1016/j.molp.2019.11.009] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 06/13/2019] [Revised: 10/17/2019] [Accepted: 11/13/2019] [Indexed: 05/21/2023]
Abstract
Coix lacryma-jobi, a plant species closely related to Zea and Sorghum, is an important food and medicinal crop in Asia. However, no reference genome of this species has been reported, and its exact phylogeny within the Andropogoneae remains unresolved. Here, we generated a high-quality genome assembly of coix comprising ∼1.73 Gb with 44 485 predicted protein-coding genes. We found coix to be a typical diploid plant with an overall 1-to-1 syntenic relationship with the Sorghum genome, despite its drastic genome expansion (∼2.3-fold) due mainly to the activity of transposable elements. Phylogenetic analysis revealed that coix diverged from sorghum ∼10.41 million years ago, which was ∼1.49 million years later than the divergence between sorghum and maize. Resequencing of 27 additional coix accessions revealed that they could be unambiguously separated into wild relatives and cultivars, and suggested that coix experienced a strong genetic bottleneck, resulting in the loss of about half of the genetic diversity during domestication, even though many traits have remained undomesticated. Our data provide not only novel comparative genomic and evolutionary insights into the Andropogoneae lineage, but also an important resource that will greatly benefit molecular breeding of this important crop.
Affiliation(s)
- Hongbing Liu
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, P. R. China
- Junpeng Shi
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China; State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing 100193, P. R. China
- Zexi Cai
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, P. R. China
- Yumin Huang
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, P. R. China
- Menglu Lv
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China; State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing 100193, P. R. China
- Huilong Du
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, the Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, P. R. China
- Qiang Gao
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, the Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, P. R. China
- Yi Zuo
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China
- Zhaobin Dong
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, P. R. China
- Wei Huang
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, P. R. China
- Rui Qin
- Key Laboratory for Protection and Application of Special Plant Germplasm in Wuling Area of Hubei Province, South-Central University for Nationalities, Wuhan 430074, P. R. China
- Chengzhi Liang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, the Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 Beichen West Road No. 2, Beijing 100101, P. R. China
- Jinsheng Lai
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, P. R. China; State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing 100193, P. R. China.
- Weiwei Jin
- State Key Laboratory of Plant Physiology and Biochemistry, National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, the Ministry of Education, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, P. R. China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, P. R. China.
7
Paul P, Antonydhason V, Gopal J, Haga SW, Hasan N, Oh JW. Bioinformatics for Renal and Urinary Proteomics: Call for Aggrandization. Int J Mol Sci 2020; 21:E961. [PMID: 32024005 PMCID: PMC7038205 DOI: 10.3390/ijms21030961] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/04/2019] [Revised: 01/24/2020] [Accepted: 01/27/2020] [Indexed: 02/07/2023]
Abstract
The clinical sampling of urine is noninvasive and unrestricted, whereby huge volumes can be easily obtained. This makes urine a valuable resource for the diagnoses of diseases. Urinary and renal proteomics have resulted in considerable progress in kidney-based disease diagnosis through biomarker discovery and treatment. This review summarizes the bioinformatics tools available for this area of proteomics and the milestones reached using these tools in clinical research. The scant research publications and the even more limited bioinformatic tool options available for urinary and renal proteomics are highlighted in this review. The need for more attention and input from bioinformaticians is highlighted, so that progressive achievements and releases can be made. With just a handful of existing tools for renal and urinary proteomic research available, this review identifies a gap worth targeting by protein chemists and bioinformaticians. The probable causes for the lack of enthusiasm in this area are also speculated upon in this review. This is the first review that consolidates the bioinformatics applications specifically for renal and urinary proteomics.
Affiliation(s)
- Piby Paul
- St. Jude Childrens Cancer Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA;
- Vimala Antonydhason
- Department of Microbiology and Immunology, Institute for Biomedicine, Gothenburg University, 413 90 Gothenburg, Sweden;
- Judy Gopal
- Department of Environmental Health Sciences, Konkuk University, Seoul 143-701, Korea;
- Steve W. Haga
- Department of Computer Science and Engineering, National Sun Yat Sen University, Kaohsiung 804, Taiwan;
| | - Nazim Hasan
- Department of Chemistry, Faculty of Science, Jazan University, P.O. Box 114, Jazan 45142, Saudi Arabia;
| | - Jae-Wook Oh
- Department of Stem Cell and Regenerative Biotechnology, Konkuk University, Seoul 05029, Korea
| |
Collapse
|
8
|
Mukherjee S, Cai Z, Mukherjee A, Longkumer I, Mech M, Vupru K, Khate K, Rajkhowa C, Mitra A, Guldbrandtsen B, Lund MS, Sahana G. Whole genome sequence and de novo assembly revealed genomic architecture of Indian Mithun (Bos frontalis). BMC Genomics 2019; 20:617. [PMID: 31357931 PMCID: PMC6664528 DOI: 10.1186/s12864-019-5980-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/18/2018] [Accepted: 07/16/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND Mithun (Bos frontalis), also called gayal, is an endangered bovine species of the tribe Bovini with a 2n = 58 (XX) chromosome complement, reared in the tropical rainforest regions of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed and information on its genomic architecture has so far been scanty. We expect that the availability of whole-genome sequence data and an assembly will help resolve this question and yield further information, including the phylogenetic status of mithun. Recently, the first genome assembly of gayal, mithun of Chinese origin, was published. However, an improved reference genome assembly would still help in understanding genetic variation in mithun populations reared in diverse geographical locations and in building a superior consensus assembly. We therefore performed deep sequencing of the genome of an adult female mithun from India, assembled and annotated its genome, and performed extensive bioinformatic analyses to produce a superior de novo genome assembly of mithun. RESULTS We generated ≈300 Gigabyte (Gb) raw reads from whole-genome deep sequencing platforms and assembled the sequence data using a hybrid assembly strategy to create a high-quality de novo assembly of mithun with 96% completeness as assessed by BUSCO analysis. The final genome assembly has a total length of 3.0 Gb and contains 5,015 scaffolds with an N50 value of 1 Mb. Repeat sequences constitute around 43.66% of the assembly. The genomic alignments between mithun and cattle showed that their genomes, as expected, are highly conserved. Gene annotation identified 28,044 protein-coding genes present in the mithun genome. The gene orthologous groups of mithun showed a high degree of similarity in comparison with other species, while fewer mithun-specific coding sequences were found compared to those in cattle.
CONCLUSION Here we present the first de novo draft genome assembly of Indian mithun, which has better coverage, is less fragmented and better annotated, and constitutes a reasonably complete assembly compared to the previously published gayal genome. This comprehensive assembly reveals the genomic architecture of mithun to a great extent and will provide a reference genome assembly for the research community to elucidate the evolutionary history of mithun across its distinct geographical locations.
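The N50 statistic quoted in this abstract can be illustrated with a short sketch. It uses the common definition (the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size); the function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the paper, whose exact computation is not described here.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the cumulative sum first reaches half the total.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

A larger N50 for the same total length means the assembly is concentrated in fewer, longer scaffolds, which is why the statistic is reported alongside the scaffold count.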
Affiliation(s)
- Sabyasachi Mukherjee
  - Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Zexi Cai
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
- Anupama Mukherjee
  - Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
  - Present address: Dairy Cattle Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana 132001 India
- Imsusosang Longkumer
  - Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Moonmoon Mech
  - Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Kezhavituo Vupru
  - Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Kobu Khate
  - Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Chandan Rajkhowa
  - Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Abhijit Mitra
  - Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Bernt Guldbrandtsen
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
- Mogens Sandø Lund
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
- Goutam Sahana
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
9
Zyner KG, Mulhearn DS, Adhikari S, Martínez Cuesta S, Di Antonio M, Erard N, Hannon GJ, Tannahill D, Balasubramanian S. Genetic interactions of G-quadruplexes in humans. eLife 2019; 8:e46793. [PMID: 31287417 PMCID: PMC6615864 DOI: 10.7554/elife.46793] [Citation(s) in RCA: 87] [Impact Index Per Article: 14.5] [Received: 03/12/2019] [Accepted: 06/17/2019] [Indexed: 01/20/2023]
Abstract
G-quadruplexes (G4) are alternative nucleic acid structures involved in transcription, translation and replication. Aberrant G4 formation and stabilisation is linked to genome instability and cancer. G4 ligand treatment disrupts key biological processes leading to cell death. To discover genes and pathways involved with G4s and gain mechanistic insights into G4 biology, we present the first unbiased genome-wide study to systematically identify human genes that promote cell death when silenced by shRNA in the presence of G4-stabilising small molecules. Many novel genetic vulnerabilities were revealed opening up new therapeutic possibilities in cancer, which we exemplified by an orthogonal pharmacological inhibition approach that phenocopies gene silencing. We find that targeting the WEE1 cell cycle kinase or USP1 deubiquitinase in combination with G4 ligand treatment enhances cell killing. We also identify new genes and pathways regulating or interacting with G4s and demonstrate that the DDX42 DEAD-box helicase is a newly discovered G4-binding protein.
Affiliation(s)
- Katherine G Zyner
  - Cancer Research United Kingdom Cambridge Institute, Cambridge, United Kingdom
- Darcie S Mulhearn
  - Cancer Research United Kingdom Cambridge Institute, Cambridge, United Kingdom
- Santosh Adhikari
  - Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Marco Di Antonio
  - Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Nicolas Erard
  - Cancer Research United Kingdom Cambridge Institute, Cambridge, United Kingdom
- Gregory J Hannon
  - Cancer Research United Kingdom Cambridge Institute, Cambridge, United Kingdom
- David Tannahill
  - Cancer Research United Kingdom Cambridge Institute, Cambridge, United Kingdom
- Shankar Balasubramanian
  - Cancer Research United Kingdom Cambridge Institute, Cambridge, United Kingdom
  - Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
  - School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
10
Chromosome conformation capture resolved near complete genome assembly of broomcorn millet. Nat Commun 2019; 10:464. [PMID: 30683940 PMCID: PMC6347627 DOI: 10.1038/s41467-018-07876-6] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 06/18/2018] [Accepted: 12/04/2018] [Indexed: 01/27/2023]
Abstract
Broomcorn millet (Panicum miliaceum L.) has strong tolerance to abiotic stresses and is probably one of the oldest crops, with its earliest cultivation dating back ca. 10,000 years. We report here its genome assembly through a combination of PacBio sequencing, BioNano, and Hi-C (in vivo) mapping. The 18 super scaffolds cover ~95.6% of the estimated genome (~887.8 Mb). There are 63,671 protein-coding genes annotated in this tetraploid genome. About 86.2% of the syntenic genes in foxtail millet have two homologous copies in broomcorn millet, indicating rare gene loss after tetraploidization in broomcorn millet. Phylogenetic analysis reveals that broomcorn millet and foxtail millet diverged around 13.1 million years ago (Mya), while the lineage-specific tetraploidization of broomcorn millet may have happened within ~5.91 million years. The genome is not only beneficial for the genome-assisted breeding of broomcorn millet, but also an important resource for other Panicum species. Broomcorn millet is one of the oldest crops cultivated by humans and has strong abiotic stress tolerance. To facilitate genome-assisted breeding of this and related species, the authors report its genome assembly and conduct comparative genome structure and evolution analyses with foxtail millet.
11
Abstract
The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has led to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships. In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
12
Kim BM, Rhee JS, Choi IY, Lee YM. Transcriptional profiling of antioxidant defense system and heat shock protein (Hsp) families in the cadmium- and copper-exposed marine ciliate Euplotes crassus. Genes Genomics 2017; 40:85-98. [PMID: 29892903 DOI: 10.1007/s13258-017-0611-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/26/2017] [Accepted: 09/14/2017] [Indexed: 11/29/2022]
Abstract
To understand the transcriptional response of antioxidant defense system and heat shock protein (Hsp) families of the marine ciliate Euplotes crassus, we analyzed the transcriptome profile using RNA-seq technology after exposure to cadmium (Cd) and copper (Cu). De novo sequence assembly produced 61,240 unigenes with 21,330 BLAST hits and showed high sequence orthology with transcriptomes of other ciliates. Gene annotation and gene ontology (GO) comparison revealed that E. crassus expressed highly diversified but conserved stress-responsive gene families of the antioxidant defense system and Hsps. After waterborne exposure to 250 μg/L of Cd and 25 μg/L of Cu, transcriptional responses of the gene families were significantly modulated, suggesting that even the unicellular E. crassus has a conserved molecular defense mechanism, such as modulating mRNA expression, for homeostasis. These transcriptional responses make E. crassus a potential model for understanding the molecular response of single cell ciliates to heavy metal contamination.
Affiliation(s)
- Bo-Mi Kim
  - Unit of Polar Genomics, Korea Polar Research Institute, Incheon, 21990, Republic of Korea
- Jae-Sung Rhee
  - Department of Marine Science, College of Natural Sciences, Incheon National University, Incheon, 22012, Republic of Korea
- Ik-Young Choi
  - Department of Agriculture and Life Industry, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Young-Mi Lee
  - Department of Life Science, College of Natural Sciences, Sangmyung University, Seoul, 03016, Republic of Korea
13
Abstract
Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation, and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era.
Affiliation(s)
- Chuming Chen
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Hongzhan Huang
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Cathy H Wu
  - Center for Bioinformatics and Computational Biology, Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
  - Protein Information Resource, Department of Biochemistry and Molecular and Cellular Biology, Georgetown University Medical Center, Washington, DC, 20007, USA
14
Large-Scale Evolutionary Analysis of Genes and Supergene Clusters from Terpenoid Modular Pathways Provides Insights into Metabolic Diversification in Flowering Plants. PLoS One 2015; 10:e0128808. [PMID: 26046541 PMCID: PMC4457800 DOI: 10.1371/journal.pone.0128808] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/19/2015] [Accepted: 04/30/2015] [Indexed: 12/31/2022]
Abstract
An important component of plant evolution is the plethora of pathways producing more than 200,000 biochemically diverse specialized metabolites with pharmacological, nutritional and ecological significance. To unravel dynamics underlying metabolic diversification, it is critical to determine lineage-specific gene family expansion in a phylogenomics framework. However, robust functional annotation is often only available for core enzymes catalyzing committed reaction steps within a few model systems. In a genome informatics approach, we extracted information from early-draft gene-space assemblies and non-redundant transcriptomes to identify protein families involved in isoprenoid biosynthesis. Isoprenoids comprise terpenoids with various roles in plant-environment interaction, such as pollinator attraction or pathogen defense. Combining lines of evidence provided by synteny, sequence homology and Hidden-Markov-Modelling, we screened 17 genomes including 12 major crops and found evidence for 1,904 proteins associated with terpenoid biosynthesis. Our terpenoid gene set contains evidence for 840 core terpene-synthases and 338 triterpene-specific synthases. We further identified 190 prenyltransferases, 39 isopentenyl-diphosphate isomerases as well as 278 and 219 proteins involved in mevalonate and methylerythritol pathways, respectively. Assessing the impact of gene and genome duplication on lineage-specific terpenoid pathway expansion, we illustrated key events underlying terpenoid metabolic diversification within 250 million years of flowering plant radiation. By quantifying Angiosperm-wide versatility and phylogenetic relationships of pleiotropic gene families in terpenoid modular pathways, our analysis offers significant insight into evolutionary dynamics underlying diversification of plant secondary metabolism. Furthermore, our data provide a blueprint for future efforts to identify and more rapidly clone terpenoid biosynthetic genes from any plant species.
15
Luo Y, Riedlinger G, Szolovits P. Text mining in cancer gene and pathway prioritization. Cancer Inform 2014; 13:69-79. [PMID: 25392685 PMCID: PMC4216063 DOI: 10.4137/cin.s13874] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 04/01/2014] [Revised: 05/18/2014] [Accepted: 05/18/2014] [Indexed: 12/18/2022]
Abstract
Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.
Affiliation(s)
- Yuan Luo
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Gregory Riedlinger
  - Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
- Peter Szolovits
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
16
Le Texier L, Durand J, Lavault A, Hulin P, Collin O, Le Bras Y, Cuturi MC, Chiffoleau E. LIMLE, a new molecule over-expressed following activation, is involved in the stimulatory properties of dendritic cells. PLoS One 2014; 9:e93894. [PMID: 24705920 PMCID: PMC3976354 DOI: 10.1371/journal.pone.0093894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2013] [Accepted: 03/10/2014] [Indexed: 11/18/2022]
Abstract
Dendritic cells are sentinels of the immune system distributed throughout the body, that following danger signals will migrate to secondary lymphoid organs to induce effector T cell responses. We have identified, in a rodent model of graft rejection, a new molecule expressed by dendritic cells that we have named LIMLE (RGD1310371). To characterize this new molecule, we analyzed its regulation of expression and its function. We observed that LIMLE mRNAs were rapidly and strongly up regulated in dendritic cells following inflammatory stimulation. We demonstrated that LIMLE inhibition does not alter dendritic cell maturation or cytokine production following Toll-like-receptor stimulation. However, it reduces their ability to stimulate effector T cells in a mixed leukocyte reaction or T cell receptor transgenic system. Interestingly, we observed that LIMLE protein localized with actin at some areas under the plasma membrane. Moreover, LIMLE is highly expressed in testis, trachea, lung and ciliated cells and it has been shown that cilia formation bears similarities to formation of the immunological synapse which is required for the T cell activation by dendritic cells. Taken together, these data suggest a role for LIMLE in specialized structures of the cytoskeleton that are important for dynamic cellular events such as immune synapse formation. In the future, LIMLE may represent a new target to reduce the capacity of dendritic cells to stimulate T cells and to regulate an immune response.
Affiliation(s)
- Laëtitia Le Texier
  - INSERM, U1064, Nantes, France
  - CHU Nantes, Institut de Transplantation et de Recherche en Transplantation, ITUN, Nantes, France
  - Université de Nantes, Faculté de Médecine, Nantes, France
- Justine Durand
  - INSERM, U1064, Nantes, France
  - CHU Nantes, Institut de Transplantation et de Recherche en Transplantation, ITUN, Nantes, France
  - Université de Nantes, Faculté de Médecine, Nantes, France
- Amélie Lavault
  - INSERM, U1064, Nantes, France
  - CHU Nantes, Institut de Transplantation et de Recherche en Transplantation, ITUN, Nantes, France
  - Université de Nantes, Faculté de Médecine, Nantes, France
- Olivier Collin
  - Plateforme GenOuest, IRISA-INRIA, Campus de Beaulieu, Rennes, France
- Yvan Le Bras
  - Plateforme GenOuest, IRISA-INRIA, Campus de Beaulieu, Rennes, France
- Maria-Cristina Cuturi
  - INSERM, U1064, Nantes, France
  - CHU Nantes, Institut de Transplantation et de Recherche en Transplantation, ITUN, Nantes, France
  - Université de Nantes, Faculté de Médecine, Nantes, France
- Elise Chiffoleau
  - INSERM, U1064, Nantes, France
  - CHU Nantes, Institut de Transplantation et de Recherche en Transplantation, ITUN, Nantes, France
  - Université de Nantes, Faculté de Médecine, Nantes, France
17
Abstract
Systems biology aims to integrate multiple biological data types such as genomics, transcriptomics and proteomics across different levels of structure and scale; it represents an emerging paradigm in the scientific process which challenges the reductionism that has dominated biomedical research for hundreds of years. Systems biology will nevertheless only be successful if the technologies on which it is based are able to deliver the required type and quality of data. In this review we discuss how well positioned is proteomics to deliver the data necessary to support meaningful systems modelling in parasite biology. We summarise the current state of identification proteomics in parasites, but argue that a new generation of quantitative proteomics data is now needed to underpin effective systems modelling. We discuss the challenges faced to acquire more complete knowledge of protein post-translational modifications, protein turnover and protein-protein interactions in parasites. Finally we highlight the central role of proteome-informatics in ensuring that proteomics data is readily accessible to the user-community and can be translated and integrated with other relevant data types.
18
Rappoport N, Linial M. Viral proteins acquired from a host converge to simplified domain architectures. PLoS Comput Biol 2012; 8:e1002364. [PMID: 22319434 PMCID: PMC3271019 DOI: 10.1371/journal.pcbi.1002364] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 01/19/2011] [Accepted: 12/09/2011] [Indexed: 01/17/2023]
Abstract
The infection cycle of viruses creates many opportunities for the exchange of genetic material with the host. Many viruses integrate their sequences into the genome of their host for replication. These processes may lead to the virus acquisition of host sequences. Such sequences are prone to accumulation of mutations and deletions. However, in rare instances, sequences acquired from a host become beneficial for the virus. We searched for unexpected sequence similarity among the 900,000 viral proteins and all proteins from cellular organisms. Here, we focus on viruses that infect metazoa. The high-conservation analysis yielded 187 instances of highly similar viral-host sequences. Only a small number of them represent viruses that hijacked host sequences. The low-conservation sequence analysis utilizes the Pfam family collection. About 5% of the 12,000 statistical models archived in Pfam are composed of viral-metazoan proteins. In about half of Pfam families, we provide indirect support for the directionality from the host to the virus. The other families are either wrongly annotated or reflect an extensive sequence exchange between the viruses and their hosts. In about 75% of cross-taxa Pfam families, the viral proteins are significantly shorter than their metazoan counterparts. The tendency for shorter viral proteins relative to their related host proteins accounts for the acquisition of only a fragment of the host gene, the elimination of an internal domain and shortening of the linkers between domains. We conclude that, along viral evolution, the host-originated sequences accommodate simplified domain compositions. We postulate that the trimmed proteins act by interfering with the fundamental function of the host including intracellular signaling, post-translational modification, protein-protein interaction networks and cellular trafficking. We compiled a collection of hijacked protein sequences. 
These sequences are attractive targets for manipulation of viral infection. Many studies focused on the exchange of genetic material between viruses and cellular hosts. The diversity of viruses argues that, along the evolutionary history, viruses have shaped the host genomes. While most viruses have many opportunities to exchange genetic material with their hosts, tracing such events is challenging as the origin of the sequences is masked by the high mutation rate of many viruses. On the other end, for completing a successful infection cycle the viruses must cope with the cell machinery for entry, replication and translation while hiding from the host immune system. We collected evidence for instances of viral protein sequences that were most probably “stolen” from the hosts. Additionally, a shared ancestry with metazoa is associated with 670 Pfam domain families. For half of these families, the origin of the viral proteins from its host is supported. For about 75% of the cross virus-metazoa families, the viral proteins are significantly shorter than their counterpart host proteins. Most of these cross-taxa viral proteins are single domain proteins and proteins with a simple domain composition relative to the proteins of their hosts. These viral proteins provide insights on the overlooked intimacy of viruses and their multicellular hosts.
Affiliation(s)
- Nadav Rappoport
  - School of Computer Science and Engineering, Hebrew University of Jerusalem, Jerusalem, Israel
- Michal Linial
  - Department of Biological Chemistry, Institute of Life Sciences, Hebrew University of Jerusalem, Jerusalem, Israel
  - The Sudarsky Center for Computational Biology, Hebrew University of Jerusalem, Jerusalem, Israel
19
Fieldhouse RJ, Turgeon Z, White D, Merrill AR. Cholera- and anthrax-like toxins are among several new ADP-ribosyltransferases. PLoS Comput Biol 2010; 6:e1001029. [PMID: 21170356 PMCID: PMC3000352 DOI: 10.1371/journal.pcbi.1001029] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 06/09/2010] [Accepted: 11/10/2010] [Indexed: 11/19/2022]
Abstract
Chelt, a cholera-like toxin from Vibrio cholerae, and Certhrax, an anthrax-like toxin from Bacillus cereus, are among six new bacterial protein toxins we identified and characterized using in silico and cell-based techniques. We also uncovered medically relevant toxins from Mycobacterium avium and Enterococcus faecalis. We found agriculturally relevant toxins in Photorhabdus luminescens and Vibrio splendidus. These toxins belong to the ADP-ribosyltransferase family that has conserved structure despite low sequence identity. Therefore, our search for new toxins combined fold recognition with rules for filtering sequences--including a primary sequence pattern--to reduce reliance on sequence identity and identify toxins using structure. We used computers to build models and analyzed each new toxin to understand features including: structure, secretion, cell entry, activation, NAD+ substrate binding, intracellular target binding and the reaction mechanism. We confirmed activity using a yeast growth test. In this era where an expanding protein structure library complements abundant protein sequence data--and we need high-throughput validation--our approach provides insight into the newest toxin ADP-ribosyltransferases.
Affiliation(s)
- Robert J. Fieldhouse
  - Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
- Zachari Turgeon
  - Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
- Dawn White
  - Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
- A. Rod Merrill
  - Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
20
Abstract
The CATH database provides hierarchical classification of protein domains based on their folding patterns. Domains are obtained from protein structures deposited in the Protein Data Bank and both domain identification and subsequent classification use manual as well as automated procedures. The accompanying website http://www.cathdb.info provides an easy-to-use entry to the classification, allowing for both browsing and downloading of data. Here, we give a brief review of the database, its corresponding website and some related tools.
Affiliation(s)
- Michael Knudsen
  - Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
21
Protein domain architectures. Methods Mol Biol 2010. [PMID: 20221914 DOI: 10.1007/978-1-60327-241-4_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2023]
Abstract
Proteins are composed of functional units, or domains, that can be found alone or in combination with other domains. Analysis of protein domain architectures and the movement of protein domains within and across different genomes provide clues about the evolution of protein function. The classification of proteins into families and domains is provided through publicly available tools and databases that use known protein domains to predict other members in new protein sequences. Currently at least 80% of the main protein sequence databases can be classified using these tools, thus providing a large data set to work from for analyzing protein domain architectures. Each of the protein domain databases provides intuitive web interfaces for viewing and analyzing its domain classifications and makes its data freely available for download. Some of the main protein family and domain databases are described here, along with their Web-based tools for analyzing domain architectures.
22
Kahlem P, Clegg A, Reisinger F, Xenarios I, Hermjakob H, Orengo C, Birney E. ENFIN--A European network for integrative systems biology. C R Biol 2010; 332:1050-8. [PMID: 19909926 DOI: 10.1016/j.crvi.2009.09.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
Abstract
Integration of biological data of various types and the development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged in developing an adapted infrastructure to connect databases, and platforms to enable both the generation of new bioinformatics tools and the experimental validation of computational predictions. With the aim of bridging the gap existing between standard wet laboratories and bioinformatics, the ENFIN Network runs integrative research projects to bring the latest computational techniques to bear directly on questions dedicated to systems biology in the wet laboratory environment. The Network maintains internally close collaboration between experimental and computational research, enabling a permanent cycling of experimental validation and improvement of computational prediction methods. The computational work includes the development of a database infrastructure (EnCORE), bioinformatics analysis methods and a novel platform for protein function analysis FuncNet.
Affiliation(s)
- Pascal Kahlem
- EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
|
23
|
Beckstette M, Homann R, Giegerich R, Kurtz S. Significant speedup of database searches with HMMs by search space reduction with PSSM family models. Bioinformatics 2009; 25:3251-8. [PMID: 19828575 PMCID: PMC2788931 DOI: 10.1093/bioinformatics/btp593] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive. Results: We propose a new method for efficient protein family classification and for speeding up database searches with pHMMs as is necessary for large-scale analysis scenarios. We employ simpler models of protein families called position-specific scoring matrices family models (PSSM-FMs). For fast database search, we combine full-text indexing, efficient exact p-value computation of PSSM match scores and fast fragment chaining. The resulting method is well suited to prefilter the set of sequences to be searched for subsequent database searches with pHMMs. We achieved a classification performance only marginally inferior to hmmsearch, yet results could be obtained in a fraction of runtime with a speedup of >64-fold. In experiments addressing the method's ability to prefilter the sequence space for subsequent database searches with pHMMs, our method reduces the number of sequences to be searched with hmmsearch to only 0.80% of all sequences. The filter is very fast and leads to a total speedup of factor 43 over the unfiltered search, while retaining >99.5% of the original results. In a lossless filter setup for hmmsearch on UniProtKB/Swiss-Prot, we observed a speedup of factor 92. Availability: The presented algorithms are implemented in the program PoSSuMsearch2, available for download at http://bibiserv.techfak.uni-bielefeld.de/possumsearch2/. Contact: beckstette@zbh.uni-hamburg.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michael Beckstette
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany.
|
24
|
Arnold K, Kiefer F, Kopp J, Battey JND, Podvinec M, Westbrook JD, Berman HM, Bordoli L, Schwede T. The Protein Model Portal. J Struct Funct Genomics 2009; 10:1-8. [PMID: 19037750 PMCID: PMC2704613 DOI: 10.1007/s10969-008-9048-5] [Citation(s) in RCA: 117] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/17/2008] [Accepted: 11/02/2008] [Indexed: 11/28/2022]
Abstract
Structural Genomics has been successful in determining the structures of many unique proteins in a high throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been to access all models available for a specific protein in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be leveraged from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.
Affiliation(s)
- Konstantin Arnold
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Florian Kiefer
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Jürgen Kopp
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- James N. D. Battey
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Michael Podvinec
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- John D. Westbrook
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8087 USA
- Helen M. Berman
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8087 USA
- Lorenza Bordoli
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
|
25
|
Wichadakul D, McDermott J, Samudrala R. Prediction and integration of regulatory and protein-protein interactions. Methods Mol Biol 2009; 541:101-43. [PMID: 19381527 DOI: 10.1007/978-1-59745-243-4_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
Knowledge of transcriptional regulatory interactions (TRIs) is essential for exploring functional genomics and systems biology in any organism. While several results from genome-wide analysis of transcriptional regulatory networks are available, they are limited to model organisms such as yeast (1) and worm (2). Beyond these networks, experiments on TRIs study only individual genes and proteins of specific interest. In this chapter, we present a method for the integration of various data sets to predict TRIs for 54 organisms in the Bioverse (3). We describe how to compile and handle various formats and identifiers of data sets from different sources and how to predict TRIs using a homology-based approach, utilizing the compiled data sets. Integrated data sets include experimentally verified TRIs, binding sites of transcription factors, promoter sequences, protein subcellular localization, and protein families. Predicted TRIs expand the networks of gene regulation for a large number of organisms. The integration of experimentally verified and predicted TRIs with other known protein-protein interactions (PPIs) gives insight into specific pathways, network motifs, and the topological dynamics of an integrated network with gene expression under different conditions, essential for exploring functional genomics and systems biology.
|
26
|
Addou S, Rentzsch R, Lee D, Orengo CA. Domain-based and family-specific sequence identity thresholds increase the levels of reliable protein function transfer. J Mol Biol 2008; 387:416-30. [PMID: 19135455 DOI: 10.1016/j.jmb.2008.12.045] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2008] [Revised: 12/12/2008] [Accepted: 12/17/2008] [Indexed: 11/24/2022]
Abstract
Divergence in function of homologous proteins is based on both sequence and structural changes. Overall enzyme function has been reported to diverge earlier (50% sequence identity) than overall structure (35%). We herein study the functional conservation of enzymes and non-enzyme sequences using the protein domain families in CATH-Gene3D. Despite the rapid increase in sequence data since the last comprehensive study by Tian and Skolnick, our findings suggest that generic thresholds of 40% and 60% aligned sequence identity are still sufficient to safely inherit third-level and full Enzyme Commission numbers, respectively. This increases to 50% and 70% on the domain level, unless the multi-domain architecture matches. Assignments from the Kyoto Encyclopedia of Genes and Genomes and the Munich Information Center for Protein Sequences Functional Catalogue seem to be less conserved with sequence, probably due to a more pathway-centric view: 80% domain sequence identity is required for safe function transfer. Comparing domains (more pairwise relationships) and the use of family-specific thresholds (varying evolutionary speeds) yields the highest coverage rates when transferring functions to model proteomes. An average twofold increase in enzyme annotations is seen for 523 proteomes in Gene3D. As simple 'rules of thumb', sequence identity thresholds do not require a bioinformatics background. We will provide and update this information with future releases of CATH-Gene3D.
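The rule-of-thumb thresholds quoted in this abstract can be written down as a small lookup. The following is a minimal sketch, not the authors' implementation: the function name, its parameters, and the use of an inclusive comparison (`>=`) are assumptions made for illustration, and the handling of a matching multi-domain architecture (falling back to the generic 40%/60% thresholds) is one reading of the abstract's "unless the multi-domain architecture matches".

```python
def transferable_ec_level(identity_pct, domain_level=False, architecture_matches=False):
    """Return how many Enzyme Commission (EC) levels may be inherited.

    Hypothetical helper. Thresholds are taken from the abstract:
    40% / 60% aligned sequence identity for third-level / full EC numbers,
    rising to 50% / 70% for domain-level comparisons unless the
    multi-domain architecture matches.
    Returns 4 (full EC number), 3 (third level) or 0 (no transfer).
    """
    if domain_level and not architecture_matches:
        third_level, full = 50.0, 70.0
    else:
        third_level, full = 40.0, 60.0

    # Inclusive comparison is an assumption; the abstract does not say
    # whether the thresholds themselves count as "safe".
    if identity_pct >= full:
        return 4
    if identity_pct >= third_level:
        return 3
    return 0
```

For example, a pair aligned at 65% identity would inherit a full EC number under the generic thresholds, but only the first three levels when the comparison is made on domains whose surrounding architectures differ.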
Affiliation(s)
- Sarah Addou
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
|
27
|
Wilson D, Pethica R, Zhou Y, Talbot C, Vogel C, Madera M, Chothia C, Gough J. SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res 2008; 37:D380-6. [PMID: 19036790 PMCID: PMC2686452 DOI: 10.1093/nar/gkn762] [Citation(s) in RCA: 343] [Impact Index Per Article: 20.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open
Abstract
SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.
Affiliation(s)
- Derek Wilson
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK.
|
28
|
Abstract
The success of the whole genome sequencing projects brought considerable credence to the belief that high-throughput approaches, rather than traditional hypothesis-driven research, would be essential to structurally and functionally annotate the rapid growth in available sequence data within a reasonable time frame. Such observations supported the emerging field of structural genomics, which is now faced with the task of providing a library of protein structures that represent the biological diversity of the protein universe. To run efficiently, structural genomics projects aim to define a set of targets that maximize the potential of each structure discovery whether it represents a novel structure, novel function, or missing evolutionary link. However, not all protein sequences make suitable structural genomics targets: It takes considerably more effort to determine the structure of a protein than the sequence of its gene because of the increased complexity of the methods involved and also because the behavior of targeted proteins can be extremely variable at the different stages in the structural genomics "pipeline." Therefore, structural genomics target selection must identify and prioritize the most suitable candidate proteins for structure determination, avoiding "problematic" proteins while also ensuring the ultimate goals of the project are followed.
|
29
|
Local function conservation in sequence and structure space. PLoS Comput Biol 2008; 4:e1000105. [PMID: 18604264 PMCID: PMC2427199 DOI: 10.1371/journal.pcbi.1000105] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2007] [Accepted: 05/28/2008] [Indexed: 11/19/2022] Open
Abstract
We assess the variability of protein function in protein sequence and structure space. Various regions in this space exhibit considerable difference in the local conservation of molecular function. We analyze and capture local function conservation by means of logistic curves. Based on this analysis, we propose a method for predicting molecular function of a query protein with known structure but unknown function. The prediction method is rigorously assessed and compared with a previously published function predictor. Furthermore, we apply the method to 500 functionally unannotated PDB structures and discuss selected examples. The proposed approach provides a simple yet consistent statistical model for the complex relations between protein sequence, structure, and function. The GOdot method is available online (http://godot.bioinf.mpi-inf.mpg.de).
|
30
|
Mulder NJ, Apweiler R. The InterPro database and tools for protein domain analysis. Curr Protoc Bioinformatics 2008; Chapter 2:Unit 2.7. [PMID: 18428686 DOI: 10.1002/0471250953.bi0207s21] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
InterPro provides a one-stop shop for protein-sequence classification, freeing the user from having to visit multiple databases separately and rationalize the different results in varying formats. This unit describes how to submit a sequence to InterProScan via a Web server. It also provides instructions for installing and running InterProScan locally. In addition, details on browsing InterPro families and domains of interest using the InterPro Web and sequence retrieval system (SRS) are provided to show users how to get the most from the resource.
Affiliation(s)
- Nicola J Mulder
- The EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
31
|
|
32
|
Abstract
Domains are considered to be the building blocks of protein structures. A protein can contain a single domain or multiple domains, each one typically associated with a specific function. The combination of domains determines the function of the protein, its subcellular localization and the interactions it is involved in. Determining the domain structure of a protein is important for multiple reasons, including protein function analysis and structure prediction. This chapter reviews the different approaches for domain prediction and discusses lessons learned from the application of these methods.
Affiliation(s)
- Helgi Ingolfsson
- Department of Physiology and Biophysics, Weill Medical College of Cornell University, Ithaca, NY, USA
|
33
|
Ranea JAG, Yeats C, Grant A, Orengo CA. Predicting protein function with hierarchical phylogenetic profiles: the Gene3D Phylo-Tuner method applied to eukaryotic genomes. PLoS Comput Biol 2007; 3:e237. [PMID: 18052542 PMCID: PMC2098864 DOI: 10.1371/journal.pcbi.0030237] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2007] [Accepted: 10/17/2007] [Indexed: 11/17/2022] Open
Abstract
“Phylogenetic profiling” is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence–absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence–absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence–absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity—from 30% to 100%—and phylogenetic profiles built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will “auto-tune” with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence–absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes. 
The vast number of protein sequences being determined by the international genomics projects means that it is not possible to functionally characterise all the proteins through direct experimentation. One of the more successful electronic methods for detecting functionally associated genes has been through the comparison of genes' phylogenetic profiles. This method is based on the hypothesis that two functionally related genes will show very similar presence–absence profile patterns throughout different organisms. Whilst these methods have grown increasingly sophisticated, they have largely been based on detecting functionally homologous genes in different species (technically known as orthologous genes) and thus better suited to prokaryotic genomes, where this can be done more easily. We have developed a new type of hierarchical phylogenetic profile by subdividing protein families into subclusters in different sequence identity levels. This new approach encapsulates a more realistic model of the functional variation that uneven natural selection pressure produces on different protein families and organisms, and it can detect functional relationships between protein families without the initial application of rigid sequence similarity thresholds or complex protocols for orthology assignment. These advantages are especially useful in eukaryotes since the larger average size of eukaryotic multigene families makes them more prone to orthology mis-assignment than in prokaryotes.
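The profile comparison described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Phylo-Tuner code: the abstract does not say how the Euclidean distance is normalised, so scaling each copy-number profile to unit length is an assumption here, and the function names and the dictionary layout (identity level mapped to one copy-number profile across genomes) are hypothetical.

```python
import math


def normalise(profile):
    """Scale a copy-number profile to unit length (assumed normalisation)."""
    norm = math.sqrt(sum(x * x for x in profile))
    return [x / norm for x in profile] if norm else list(profile)


def profile_distance(a, b):
    """Euclidean distance between two normalised copy-number profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    na, nb = normalise(a), normalise(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))


def best_level_pair(profiles_a, profiles_b):
    """Find the pair of identity levels whose profiles agree best.

    Each argument maps a sequence-identity level (e.g. 30, 40, ..., 100)
    to a list of domain copy numbers, one entry per genome.
    Returns (distance, level_a, level_b) for the closest pair.
    """
    return min(
        (profile_distance(pa, pb), level_a, level_b)
        for level_a, pa in profiles_a.items()
        for level_b, pb in profiles_b.items()
    )
```

Scanning every pair of levels is what lets two families "auto-tune": the levels at which their copy numbers co-vary most closely are found by the search rather than fixed in advance.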
Affiliation(s)
- Juan A G Ranea
- Department of Biochemistry and Molecular Biology, University College London, London, United Kingdom.
|
34
|
Chen J, Xu H, Aronow BJ, Jegga AG. Improved human disease candidate gene prioritization using mouse phenotype. BMC Bioinformatics 2007; 8:392. [PMID: 17939863 PMCID: PMC2194797 DOI: 10.1186/1471-2105-8-392] [Citation(s) in RCA: 197] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2007] [Accepted: 10/16/2007] [Indexed: 11/13/2022] Open
Abstract
Background: The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies, like linkage analysis and gene expression profiling, tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes. Results: Extending on an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we, for the first time, show the utility of mouse phenotype data in human disease gene prioritization. We study the effect of different data integration methods, and based on the validation studies, we show that our approach, ToppGene, outperforms two of the existing candidate gene prioritization methods, SUSPECTS and ENDEAVOUR. Conclusion: The incorporation of phenotype information for mouse orthologs of human genes greatly improves the human disease candidate gene analysis and prioritization.
Affiliation(s)
- Jing Chen
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA.
|
35
|
In silico characterization of proteins: UniProt, InterPro and Integr8. Mol Biotechnol 2007; 38:165-77. [PMID: 18219596 DOI: 10.1007/s12033-007-9003-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2007] [Accepted: 08/31/2007] [Indexed: 10/23/2022]
Abstract
Nucleic acid sequences from genome sequencing projects are submitted as raw data, from which biologists attempt to elucidate the function of the predicted gene products. The protein sequences are stored in public databases, such as the UniProt Knowledgebase (UniProtKB), where curators try to add predicted and experimental functional information. Protein function prediction can be done using sequence similarity searches, but an alternative approach is to use protein signatures, which classify proteins into families and domains. The major protein signature databases are available through the integrated InterPro database, which provides a classification of UniProtKB sequences. As well as characterization of proteins through protein families, many researchers are interested in analyzing the complete set of proteins from a genome (i.e. the proteome), and there are databases and resources that provide non-redundant proteome sets and analyses of proteins from organisms with completely sequenced genomes. This article reviews the tools and resources available on the web for single and large-scale protein characterization and whole proteome analysis.
|
36
|
Song N, Sedgewick RD, Durand D. Domain architecture comparison for multidomain homology identification. J Comput Biol 2007; 14:496-516. [PMID: 17572026 DOI: 10.1089/cmb.2007.a009] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
Homology identification is the first step for many genomic studies. Current methods, based on sequence comparison, can result in a substantial number of mis-assignments due to the similarity of homologous domains in otherwise unrelated sequences. Here we propose methods to detect homologs through explicit comparison of protein domain content. We developed several schemes for scoring the homology of a pair of protein sequences based on methods used in the field of information retrieval. We evaluate the proposed methods and methods used in the literature using a benchmark of fifteen sequence families of known evolutionary history. The results of these studies demonstrate the effectiveness of comparing domain architectures using these similarity measures. We also demonstrate the importance of both weighting promiscuous domains and of compensating for the statistical effect of having a large number of domains in a protein. Using logistic regression, we demonstrate the benefit of combining similarity measures based on domain content with sequence similarity measures.
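One way to picture a domain-content comparison borrowed from information retrieval is a weighted cosine similarity, in which domains that occur in many proteins (promiscuous domains) count for less. The sketch below is an assumption-laden illustration, not one of the scoring schemes evaluated in the paper: the smoothed inverse-document-frequency weight, the function names, and the toy architectures are all hypothetical.

```python
import math
from collections import Counter


def idf_weights(architectures):
    """Down-weight domains that appear in many proteins.

    `architectures` is a list of domain lists, one per protein.
    Uses a smoothed inverse document frequency (an assumed choice).
    """
    n = len(architectures)
    doc_freq = Counter(d for arch in architectures for d in set(arch))
    return {d: math.log((1 + n) / (1 + c)) + 1.0 for d, c in doc_freq.items()}


def architecture_similarity(a, b, weights):
    """Cosine similarity between two weighted domain-count vectors."""
    va = {d: c * weights.get(d, 1.0) for d, c in Counter(a).items()}
    vb = {d: c * weights.get(d, 1.0) for d, c in Counter(b).items()}
    dot = sum(va[d] * vb[d] for d in va.keys() & vb.keys())
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

With these weights, two proteins that share only a widespread domain score lower than two proteins that share a rarer one. The cosine normalisation also keeps proteins with many domains from scoring highly simply because they have more chances to overlap.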
Affiliation(s)
- N Song
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
|
37
|
Mueller M, Martens L, Apweiler R. Annotating the human proteome: Beyond establishing a parts list. Biochim Biophys Acta Proteins Proteom 2007; 1774:175-91. [PMID: 17223395 DOI: 10.1016/j.bbapap.2006.11.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/03/2006] [Revised: 11/16/2006] [Accepted: 11/21/2006] [Indexed: 12/31/2022]
Abstract
The completion of the human genome has shifted the attention from deciphering the sequence to the identification and characterisation of the functional components, including genes. Improved gene prediction algorithms, together with the existing transcript and protein information, have enabled the identification of most exons in a genome. Availability of the 'parts list' has fostered the development of experimental approaches to systematically interrogate gene function on the genome, transcriptome and proteome level. Studying gene function at the protein level is vital to the understanding of how cells perform their functions as variations in protein isoforms and protein quantity which may underlie a change in phenotype can often not be deduced from sequence or transcript level genomics experiments alone. Recent advancements in proteomics have afforded technologies capable of measuring protein expression, post-translational modifications of these proteins, their subcellular localisation and assembly into complexes and pathways. Although an enormous amount of data already exists on the function of many human proteins, much of it is scattered over multiple resources. Public domain databases are therefore required to manage and collate this information and present it to the user community in both a human and machine readable manner. Of special importance here is the integration of heterogeneous data to facilitate the creation of resources that go beyond a mere parts list.
Affiliation(s)
- Michael Mueller
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
|
38
|
Watson JD, Sanderson S, Ezersky A, Savchenko A, Edwards A, Orengo C, Joachimiak A, Laskowski RA, Thornton JM. Towards fully automated structure-based function prediction in structural genomics: a case study. J Mol Biol 2007; 367:1511-22. [PMID: 17316683 PMCID: PMC2566530 DOI: 10.1016/j.jmb.2007.01.063] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2006] [Revised: 01/23/2007] [Accepted: 01/24/2007] [Indexed: 10/23/2022]
Abstract
As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.
Affiliation(s)
- James D Watson
- EMBL--European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
39
|
Finn RD, Stalker JW, Jackson DK, Kulesha E, Clements J, Pettett R. ProServer: a simple, extensible Perl DAS server. Bioinformatics 2007; 23:1568-70. [PMID: 17237073 PMCID: PMC2989875 DOI: 10.1093/bioinformatics/btl650] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Summary: The increasing size and complexity of biological databases has led to a growing trend to federate rather than duplicate them. In order to share data between federated databases, protocols for the exchange mechanism must be developed. One such data exchange protocol that is widely used is the Distributed Annotation System (DAS). For example, DAS has enabled small experimental groups to integrate their data into the Ensembl genome browser. We have developed ProServer, a simple, lightweight, Perl-based DAS server that does not depend on a separate HTTP server. The ProServer package is easily extensible, allowing data to be served from almost any underlying data model. Recent additions to the DAS protocol have enabled both structure and alignment (sequence and structural) data to be exchanged. ProServer allows both of these data types to be served. Availability: ProServer can be downloaded from http://www.sanger.ac.uk/proserver/ or CPAN http://search.cpan.org/~rpettett/. Details on the system requirements and installation of ProServer can be found at http://www.sanger.ac.uk/proserver/. Contact:rmp@sanger.ac.uk Supplementary Materials: DasClientExamples.pdf
Affiliation(s)
- Robert D Finn
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
40
|
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJA, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. New developments in the InterPro database. Nucleic Acids Res 2007; 35:D224-8. [PMID: 17202162 PMCID: PMC1899100 DOI: 10.1093/nar/gkl841] [Citation(s) in RCA: 349] [Impact Index Per Article: 19.4] [Received: 09/05/2006] [Revised: 10/06/2006] [Accepted: 10/06/2006] [Indexed: 11/14/2022]
Abstract
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.
Affiliation(s)
- Nicola J Mulder
- EMBL Outstation-European Bioinformatics Institute Hinxton, Cambridge, UK.
|
41
|
Wilson D, Madera M, Vogel C, Chothia C, Gough J. The SUPERFAMILY database in 2007: families and functions. Nucleic Acids Res 2006; 35:D308-13. [PMID: 17098927 PMCID: PMC1669749 DOI: 10.1093/nar/gkl910] [Citation(s) in RCA: 169] [Impact Index Per Article: 8.9] [Indexed: 11/14/2022]
Abstract
The SUPERFAMILY database provides protein domain assignments, at the SCOP 'superfamily' level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download from http://supfam.org. The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment.
Affiliation(s)
- Derek Wilson
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK.
|
42
|
Abstract
Owing to the ongoing success of the genome sequencing and structural genomics projects, the increase in both sequence and structural data is rapid. The development of tools for the annotation of sequence and structural data has become more important in the hope of keeping up with this data explosion. Scientists in this field have addressed these issues over the last 10 years and there now exists a wealth of methods and approaches to help interpret these data. However, there is no current way in which these methods can be incorporated easily so that the resulting annotations can be viewed together. This review discusses the development of these annotation methods and introduces the BioSapiens Network of Excellence, which has been formed in order to integrate the methods which have been developed in Europe.
Affiliation(s)
- Gabrielle A Reeves
- EMBL--European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
43
|
McGuffin LJ, Smith RT, Bryson K, Sørensen SA, Jones DT. High throughput profile-profile based fold recognition for the entire human proteome. BMC Bioinformatics 2006; 7:288. [PMID: 16759376 PMCID: PMC1513610 DOI: 10.1186/1471-2105-7-288] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 02/22/2006] [Accepted: 06/07/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND: In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest-quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. RESULTS: We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. CONCLUSION: This study clearly demonstrates the feasibility of carrying out on-demand, high-quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.
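The idea of a meta-scheduler spreading jobs over several independent clusters can be sketched as follows. This is not JYDE's actual algorithm, which the abstract does not describe; it is a generic longest-job-first greedy assignment, and the function name `distribute`, the domain names, and the job costs are all hypothetical.

```python
import heapq


def distribute(job_costs, domains):
    """Greedily assign each job to the CPU slot that becomes free earliest.

    job_costs: list of estimated run times, one per job (assumed known).
    domains:   dict mapping a domain name to its number of CPU slots.
    Returns (assignment, makespan), where assignment maps each domain
    name to the list of job indices sent to it.
    """
    # One heap entry per CPU slot: (time the slot becomes free, domain, slot index).
    slots = [(0.0, name, i) for name, n in sorted(domains.items()) for i in range(n)]
    if not slots:
        raise ValueError("at least one CPU slot is required")
    heapq.heapify(slots)

    assignment = {name: [] for name in domains}
    makespan = 0.0
    # Longest jobs first, a common heuristic for balancing load.
    for job_id, cost in sorted(enumerate(job_costs), key=lambda jc: -jc[1]):
        free_at, name, slot = heapq.heappop(slots)
        finish = free_at + cost
        assignment[name].append(job_id)
        makespan = max(makespan, finish)
        heapq.heappush(slots, (finish, name, slot))
    return assignment, makespan
```

A call such as `distribute([4, 3, 2, 1], {"a": 1, "b": 1})` splits four jobs over two single-CPU domains. A real meta-scheduler would additionally have to handle queue wait times, job failures, and data staging between domains, none of which is modelled here.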
Affiliation(s)
- Liam J McGuffin
- The BioCentre, University of Reading, Whiteknights, PO Box 221, Reading RG6 6AS, UK
- Richard T Smith
- Department of Computer Science, University College London, Malet Place, London WC1E 6BT, UK
- Kevin Bryson
- Department of Computer Science, University College London, Malet Place, London WC1E 6BT, UK
- Søren-Aksel Sørensen
- Department of Computer Science, University College London, Malet Place, London WC1E 6BT, UK
- David T Jones
- Department of Computer Science, University College London, Malet Place, London WC1E 6BT, UK
|