1
Loreto ELS, Melo ESD, Wallau GL, Gomes TMFF. The good, the bad and the ugly of transposable elements annotation tools. Genet Mol Biol 2024; 46:e20230138. [PMID: 38373163 PMCID: PMC10876081 DOI: 10.1590/1678-4685-gmb-2023-0138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 11/26/2023] [Indexed: 02/21/2024] Open
Abstract
Transposable elements are repetitive and mobile DNA segments that can be found in virtually all organisms investigated to date. Their complex structure and variable nature are particularly challenging from the genomic annotation point of view. Many softwares have been developed to automate and facilitate TEs annotation at the genomic level, but they are highly heterogeneous regarding documentation, usability and methods. In this review, we revisited the existing software for TE genomic annotation, concentrating on the most often used ones, the methodologies they apply, and usability. Building on the state of the art of TE annotation software we propose best practices and highlight the strengths and weaknesses from the available solutions.
Affiliation(s)
- Elgion L S Loreto
  - Universidade Federal do Rio Grande do Sul, Programa de Pós-Graduação em Genética e Biologia Molecular, Porto Alegre, RS, Brazil
  - Universidade Federal de Santa Maria, Departamento de Bioquímica e Biologia Molecular, Santa Maria, RS, Brazil
- Elverson S de Melo
  - Fundação Oswaldo Cruz, Instituto Aggeu Magalhães, Departamento de Entomologia, Recife, PE, Brazil
- Gabriel L Wallau
  - Fundação Oswaldo Cruz, Instituto Aggeu Magalhães, Departamento de Entomologia, Recife, PE, Brazil
- Tiago M F F Gomes
  - Universidade Federal do Rio Grande do Sul, Programa de Pós-Graduação em Genética e Biologia Molecular, Porto Alegre, RS, Brazil
2
Nestor BJ, Bayer PE, Fernandez CGT, Edwards D, Finnegan PM. Approaches to increase the validity of gene family identification using manual homology search tools. Genetica 2023; 151:325-338. [PMID: 37817002 PMCID: PMC10692271 DOI: 10.1007/s10709-023-00196-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/01/2023] [Indexed: 10/12/2023]
Abstract
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
Affiliation(s)
- Benjamin J Nestor
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Philipp E Bayer
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Cassandria G Tay Fernandez
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- David Edwards
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
- Patrick M Finnegan
  - School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
3
Singh D, Roy J. A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs. Nucleic Acids Res 2022; 50:12094-12111. [PMID: 36420898 PMCID: PMC9757047 DOI: 10.1093/nar/gkac1092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 10/22/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022] Open
Abstract
Identification of protein-coding and non-coding transcripts is paramount for understanding their biological roles. Computational approaches have been addressing this task for over a decade; however, generalized and high-performance models are still unreliable. This benchmark study assessed the performance of 24 tools producing >55 models on the datasets covering a wide range of species. We have collected 135 small and large transcriptomic datasets from existing studies for comparison and identified the potential bottlenecks hampering the performance of current tools. The key insights of this study include lack of standardized training sets, reliance on homogeneous training data, gradual changes in annotated data, lack of augmentation with homology searches, the presence of false positives and negatives in datasets and the lower performance of end-to-end deep learning models. We also derived a new dataset, RNAChallenge, from the benchmark considering hard instances that may include potential false alarms. The best and least well performing models under- and overfit the dataset, respectively, thereby serving a dual purpose. For computational approaches, it will be valuable to develop accurate and unbiased models. The identification of false alarms will be of interest for genome annotators, and experimental study of hard RNAs will help to untangle the complexity of the RNA world.
Affiliation(s)
- Dalwinder Singh
  - To whom correspondence should be addressed. Tel: +91 172 5221206;
- Joy Roy
  - Correspondence may also be addressed to Joy Roy.
4
Nawade B, Kumar A, Maurya R, Subramani R, Yadav R, Singh K, Rangan P. Longer Duration of Active Oil Biosynthesis during Seed Development Is Crucial for High Oil Yield-Lessons from Genome-Wide In Silico Mining and RNA-Seq Validation in Sesame. Plants (Basel) 2022; 11:2980. [PMID: 36365434 PMCID: PMC9657858 DOI: 10.3390/plants11212980] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/13/2022] [Revised: 09/29/2022] [Accepted: 09/30/2022] [Indexed: 06/16/2023]
Abstract
Sesame, one of the ancient oil crops, is an important oilseed due to its nutritionally rich seeds with high protein content. Genomic scale information for sesame has become available in the public databases in recent years. The genes and their families involved in oil biosynthesis in sesame are less studied than in other oilseed crops. Therefore, we retrieved a total of 69 genes and their translated amino acid sequences, associated with gene families linked to the oil biosynthetic pathway. Genome-wide in silico mining helped identify key regulatory genes for oil biosynthesis, though the findings require functional validation. Comparing sequences of the SiSAD (stearoyl-acyl carrier protein (ACP)-desaturase) coding genes with known SADs helped identify two SiSAD family members that may be palmitoyl-ACP-specific. Based on homology with lysophosphatidic acid acyltransferase (LPAAT) sequences, an uncharacterized gene has been identified as SiLPAAT1. Identified key regulatory genes associated with high oil content were also validated using publicly available transcriptome datasets of genotypes contrasting for oil content at different developmental stages. Our study provides evidence that a longer duration of active oil biosynthesis is crucial for high oil accumulation during seed development. This underscores the importance of early onset of oil biosynthesis in developing seeds. Up-regulating, identified key regulatory genes of oil biosynthesis during early onset of seed development, should help increase oil yields.
Affiliation(s)
- Bhagwat Nawade
  - Division of Genomic Resources, ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Ajay Kumar
  - Division of Genomic Resources, ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Rasna Maurya
  - Division of Genomic Resources, ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Rajkumar Subramani
  - Division of Genomic Resources, ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Rashmi Yadav
  - Division of Germplasm Evaluation, ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Kuldeep Singh
  - Division of Genomic Resources, ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Parimalan Rangan
  - Division of Genomic Resources, ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD 4072, Australia
5
Suvorov A, Scornavacca C, Fujimoto MS, Bodily P, Clement M, Crandall KA, Whiting MF, Schrider DR, Bybee SM. Deep ancestral introgression shapes evolutionary history of dragonflies and damselflies. Syst Biol 2021; 71:526-546. [PMID: 34324671 PMCID: PMC9017697 DOI: 10.1093/sysbio/syab063] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/21/2021] [Revised: 07/20/2021] [Accepted: 07/26/2021] [Indexed: 11/13/2022] Open
Abstract
Introgression is an important biological process affecting at least 10% of the extant species in the animal kingdom. Introgression significantly impacts inference of phylogenetic species relationships where a strictly binary tree model cannot adequately explain reticulate net-like species relationships. Here we use phylogenomic approaches to understand patterns of introgression along the evolutionary history of a unique, non-model insect system: dragonflies and damselflies (Odonata). We demonstrate that introgression is a pervasive evolutionary force across various taxonomic levels within Odonata. In particular, we show that the morphologically "intermediate" species of Anisozygoptera (one of the three primary suborders within Odonata besides Zygoptera and Anisoptera), which retain phenotypic characteristics of the other two suborders, experienced high levels of introgression likely coming from zygopteran genomes. Additionally, we find evidence for multiple cases of deep inter-superfamilial ancestral introgression.
Affiliation(s)
- Anton Suvorov
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Celine Scornavacca
  - Institut des Sciences de l'Evolution Université de Montpellier, CNRS, IRD, EPHE CC 064, Place Eugène Bataillon, 34095 Montpellier Cedex 05, France
- M Stanley Fujimoto
  - Department of Computer Science, Brigham Young University, Provo, UT, United States
- Paul Bodily
  - Department of Computer Science, Idaho State University, Pocatello, ID, United States
- Mark Clement
  - Department of Computer Science, Brigham Young University, Provo, UT, United States
- Keith A Crandall
  - Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, United States
- Michael F Whiting
  - Department of Biology, Brigham Young University, Provo, UT, United States
  - M.L. Bean Museum, Brigham Young University, Provo, UT, United States
- Daniel R Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Seth M Bybee
  - Department of Biology, Brigham Young University, Provo, UT, United States
  - M.L. Bean Museum, Brigham Young University, Provo, UT, United States
6
Suvorov A, Hochuli J, Schrider DR. Accurate Inference of Tree Topologies from Multiple Sequence Alignments Using Deep Learning. Syst Biol 2020; 69:221-233. [PMID: 31504938 PMCID: PMC8204903 DOI: 10.1093/sysbio/syz060] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 03/07/2019] [Accepted: 08/28/2019] [Indexed: 11/13/2022] Open
Abstract
Reconstructing the phylogenetic relationships between species is one of the most formidable tasks in evolutionary biology. Multiple methods exist to reconstruct phylogenetic trees, each with their own strengths and weaknesses. Both simulation and empirical studies have identified several "zones" of parameter space where accuracy of some methods can plummet, even for four-taxon trees. Further, some methods can have undesirable statistical properties such as statistical inconsistency and/or the tendency to be positively misleading (i.e. assert strong support for the incorrect tree topology). Recently, deep learning techniques have made inroads on a number of both new and longstanding problems in biological research. In this study, we designed a deep convolutional neural network (CNN) to infer quartet topologies from multiple sequence alignments. This CNN can readily be trained to make inferences using both gapped and ungapped data. We show that our approach is highly accurate on simulated data, often outperforming traditional methods, and is remarkably robust to bias-inducing regions of parameter space such as the Felsenstein zone and the Farris zone. We also demonstrate that the confidence scores produced by our CNN can more accurately assess support for the chosen topology than bootstrap and posterior probability scores from traditional methods. Although numerous practical challenges remain, these findings suggest that the deep learning approaches such as ours have the potential to produce more accurate phylogenetic inferences.
Affiliation(s)
- Anton Suvorov
  - Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, UNC-Chapel Hill, Chapel Hill, NC 27599-7264, USA
- Joshua Hochuli
  - Biological and Biomedical Sciences Program, University of North Carolina at Chapel Hill, 130 Mason Farm Road, UNC-Chapel Hill Chapel Hill, NC 27599-7264, USA
- Daniel R Schrider
  - Biological and Biomedical Sciences Program, University of North Carolina at Chapel Hill, 130 Mason Farm Road, UNC-Chapel Hill Chapel Hill, NC 27599-7264, USA
7
A Systematic Review on Supervised and Unsupervised Machine Learning Algorithms for Data Science. Unsupervised and Semi-Supervised Learning 2020. [DOI: 10.1007/978-3-030-22475-2_1] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Indexed: 02/06/2023]
8
Garcia DC, Cheng X, Land ML, Standaert RF, Morrell-Falvey JL, Doktycz MJ. Computationally Guided Discovery and Experimental Validation of Indole-3-acetic Acid Synthesis Pathways. ACS Chem Biol 2019; 14:2867-2875. [PMID: 31693336 DOI: 10.1021/acschembio.9b00725] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/25/2022]
Abstract
Elucidating the interaction networks associated with secondary metabolite production in microorganisms is an ongoing challenge made all the more daunting by the rate at which DNA sequencing technology reveals new genes and potential pathways. Developing the culturing methods, expression conditions, and genetic systems needed for validating pathways in newly discovered microorganisms is often not possible. Therefore, new tools and techniques are needed for defining complex metabolic pathways. Here, we describe an in vitro computationally assisted pathway description approach that employs bioinformatic searches of genome databases, protein structural modeling, and protein-ligand-docking simulations to predict the gene products most likely to be involved in a particular secondary metabolite production pathway. This information is then used to direct in vitro reconstructions of the pathway and subsequent confirmation of pathway activity using crude enzyme preparations. As a test system, we elucidated the pathway for biosynthesis of indole-3-acetic acid (IAA) in the plant-associated microbe Pantoea sp. YR343. This organism is capable of metabolizing tryptophan into the plant phytohormone IAA. BLAST analyses identified a likely three-step pathway involving an amino transferase, an indole pyruvate decarboxylase, and a dehydrogenase. However, multiple candidate enzymes were identified at each step, resulting in a large number of potential pathway reconstructions (32 different enzyme combinations). Our approach shows the effectiveness of crude extracts to rapidly elucidate enzymes leading to functional pathways. Results are compared to affinity purified enzymes for select combinations and found to yield similar relative activities. Further, in vitro testing of the pathway reconstructions revealed the "underground" nature of IAA metabolism in Pantoea sp. YR343 and the various mechanisms used to produce IAA. 
Importantly, our experiments illustrate the scalable integration of computational tools and cell-free enzymatic reactions to identify and validate metabolic pathways in a broadly applicable manner.
Affiliation(s)
- David C. Garcia
  - Biological and Nanoscale Systems Group, Biosciences Division Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
  - Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Tennessee 37996-4519, United States
- Xiaolin Cheng
  - College of Pharmacy, The Ohio State University, Columbus, Ohio 43210, United States
- Miriam L. Land
  - Computational Biology and Bioinformatics Group, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Robert F. Standaert
  - Biological and Nanoscale Systems Group, Biosciences Division Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
  - Department of Chemistry, East Tennessee State University, Johnson City, Tennessee 37604, United States
- Jennifer L. Morrell-Falvey
  - Biological and Nanoscale Systems Group, Biosciences Division Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Mitchel J. Doktycz
  - Biological and Nanoscale Systems Group, Biosciences Division Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
  - Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Tennessee 37996-4519, United States
9
Hong J, Luo Y, Zhang Y, Ying J, Xue W, Xie T, Tao L, Zhu F. Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning. Brief Bioinform 2019; 21:1437-1447. [PMID: 31504150 PMCID: PMC7412958 DOI: 10.1093/bib/bbz081] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 05/06/2019] [Revised: 05/27/2019] [Accepted: 06/10/2019] [Indexed: 11/12/2022] Open
Abstract
Functional annotation of protein sequence with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches of significantly accelerated analysis process and enhanced accuracy are greatly desired. Although a variety of methods have been developed to elevate protein annotation accuracy, their ability in controlling false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performances were systematically compared with that of the traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to perform better in both prediction stability and annotation accuracy compared with other de novo methods. Moreover, an in-depth assessment revealed that it possessed an improved capacity of controlling the false discovery rate compared with traditional methods. All in all, this study not only provided a comprehensive analysis on the performances of the newly proposed strategy but also provided a tool for the researcher in the fields of protein function annotation.
Affiliation(s)
- Jiajun Hong
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yang Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Junbiao Ying
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Tian Xie
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Feng Zhu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
10
Yu CY, Li XX, Yang H, Li YH, Xue WW, Chen YZ, Tao L, Zhu F. Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate. Int J Mol Sci 2018; 19:E183. [PMID: 29316706 PMCID: PMC5796132 DOI: 10.3390/ijms19010183] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 10/22/2017] [Revised: 12/09/2017] [Accepted: 01/04/2018] [Indexed: 12/27/2022] Open
Abstract
The function of a protein is of great interest in the cutting-edge research of biological mechanisms, disease development and drug/target discovery. Besides experimental explorations, a variety of computational methods have been designed to predict protein function. Among these in silico methods, the prediction of BLAST is based on protein sequence similarity, while that of machine learning is also based on the sequence, but without the consideration of their similarity. This unique characteristic of machine learning makes it a good complement to BLAST and many other approaches in predicting the function of remotely relevant proteins and the homologous proteins of distinct function. However, the identification accuracies of these in silico methods and their false discovery rate have not yet been assessed so far, which greatly limits the usage of these algorithms. Herein, a comprehensive comparison of the performances among four popular prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these methods was systematically assessed by four standard statistical indexes based on the independent test datasets of 93 functional protein families defined by UniProtKB keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model organisms (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). As a result, the substantially higher sensitivity of SVM and BLAST was observed compared with that of PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of substantially reducing the false discovery rate (SVM < PNN < KNN). In sum, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in the related biomedical research.
Affiliation(s)
- Chun Yan Yu
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Xiao Xu Li
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Hong Yang
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Ying Hong Li
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Wei Wei Xue
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
- Yu Zong Chen
  - Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543, Singapore.
- Lin Tao
  - School of Medicine, Hangzhou Normal University, Hangzhou 310012, China.
- Feng Zhu
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
11
Sharkey CR, Fujimoto MS, Lord NP, Shin S, McKenna DD, Suvorov A, Martin GJ, Bybee SM. Overcoming the loss of blue sensitivity through opsin duplication in the largest animal group, beetles. Sci Rep 2017; 7:8. [PMID: 28127058 PMCID: PMC5428366 DOI: 10.1038/s41598-017-00061-7] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 10/17/2016] [Accepted: 12/16/2016] [Indexed: 11/09/2022] Open
Abstract
Opsin proteins are fundamental components of animal vision whose structure largely determines the sensitivity of visual pigments to different wavelengths of light. Surprisingly little is known about opsin evolution in beetles, even though they are the most species rich animal group on Earth and exhibit considerable variation in visual system sensitivities. We reveal the patterns of opsin evolution across 62 beetle species and relatives. Our results show that the major insect opsin class (SW) that typically confers sensitivity to "blue" wavelengths was lost ~300 million years ago, before the origin of modern beetles. We propose that UV and LW opsin gene duplications have restored the potential for trichromacy (three separate channels for colour vision) in beetles up to 12 times and more specifically, duplications within the UV opsin class have likely led to the restoration of "blue" sensitivity up to 10 times. This finding reveals unexpected plasticity within the insect visual system and highlights its remarkable ability to evolve and adapt to the available light and visual cues present in the environment.
Affiliation(s)
- Camilla R Sharkey
  - Department of Biology, Brigham Young University, 4102 LSB, Provo, UT, 84602, USA
- M Stanley Fujimoto
  - Computer Science Department, Brigham Young University, Provo, Utah, 84602, USA
- Nathan P Lord
  - Department of Biological and Environmental Sciences, Georgia College & State University, Campus Box 081, Milledgeville, GA, 31061, USA
- Seunggwan Shin
  - Department of Biological Sciences, University of Memphis, 3700 Walker Avenue, Memphis, TN, 38152, USA
- Duane D McKenna
  - Department of Biological Sciences, University of Memphis, 3700 Walker Avenue, Memphis, TN, 38152, USA
- Anton Suvorov
  - Department of Biology, Brigham Young University, 4102 LSB, Provo, UT, 84602, USA
- Gavin J Martin
  - Department of Biology, Brigham Young University, 4102 LSB, Provo, UT, 84602, USA
- Seth M Bybee
  - Department of Biology, Brigham Young University, 4102 LSB, Provo, UT, 84602, USA
12
Suvorov A, Jensen NO, Sharkey CR, Fujimoto MS, Bodily P, Wightman HMC, Ogden TH, Clement MJ, Bybee SM. Opsins have evolved under the permanent heterozygote model: insights from phylotranscriptomics of Odonata. Mol Ecol 2016; 26:1306-1322. [DOI: 10.1111/mec.13884] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 03/30/2016] [Revised: 09/24/2016] [Accepted: 10/04/2016] [Indexed: 02/04/2023]
Affiliation(s)
- Anton Suvorov
  - Department of Biology; Brigham Young University; Provo UT 84602 USA
- Paul Bodily
  - Computer Science Department; Brigham Young University; Provo UT 84602 USA
- T. Heath Ogden
  - Department of Biology; Utah Valley University; Orem UT 84058 USA
- Mark J. Clement
  - Computer Science Department; Brigham Young University; Provo UT 84602 USA
- Seth M. Bybee
  - Department of Biology; Brigham Young University; Provo UT 84602 USA