1
Wang J, Xue Q, Zhang CWJ, Wong KKL, Liu Z. Explainable coronary artery disease prediction model based on AutoGluon from AutoML framework. Front Cardiovasc Med 2024; 11:1360548. [PMID: 39011494 PMCID: PMC11246996 DOI: 10.3389/fcvm.2024.1360548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Accepted: 06/11/2024] [Indexed: 07/17/2024] Open
Abstract
Objective This study focuses on the innovative application of Automated Machine Learning (AutoML) technology in cardiovascular medicine to construct an explainable Coronary Artery Disease (CAD) prediction model to support the clinical diagnosis of CAD. Methods This study utilizes a combined data set of five public data sets related to CAD. An ensemble model is constructed using the AutoML open-source framework AutoGluon to evaluate the feasibility of AutoML in constructing a disease prediction model in cardiovascular medicine. The performance of the ensemble model is compared against individual baseline models. Finally, the disease prediction ensemble model is explained using SHapley Additive exPlanations (SHAP). Results The experimental results show that the AutoGluon-based ensemble model performs better than the individual baseline models in predicting CAD. It achieved an accuracy of 0.9167 and an AUC of 0.9562 in 4-fold cross-bagging. SHAP measures the importance of each feature to the prediction of the model and explains the prediction results of the model. Conclusion This study demonstrates the feasibility and efficacy of AutoML technology in cardiovascular medicine and highlights its potential in disease prediction. AutoML reduces the barriers to model building and significantly improves prediction accuracy. Additionally, the integration of SHAP enhances model transparency and explainability, which is critical to ensuring model credibility and widespread adoption in cardiovascular medicine.
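The abstract above relies on SHAP to attribute a prediction to individual features. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values for a tiny, hypothetical three-feature risk score by averaging marginal contributions over all feature orderings. The function `toy_risk`, its coefficients, and the baseline values are invented for illustration; they are not the study's model, and the SHAP library itself uses approximations rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features switch from baseline to x."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            new = predict(current)
            phi[i] += new - prev       # marginal contribution of feature i
            prev = new
    return [p / len(orders) for p in phi]


def toy_risk(features):
    """Hypothetical risk score with one interaction term (age x smoker)."""
    age, chol, smoker = features
    return 0.02 * age + 0.01 * chol + 0.5 * smoker + 0.005 * age * smoker


patient = [60, 240, 1]      # hypothetical patient
reference = [50, 200, 0]    # hypothetical baseline
phi = shapley_values(toy_risk, patient, reference)
```

The attributions sum to the difference between the patient's score and the baseline score, which is the property that makes such values usable as a per-feature explanation.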
Affiliation(s)
- Jianghong Wang
- Faculty of Information Engineering and Automation, Center for Precision Medicine, Yan'an Hospital of Kunming City & Kunming University of Science and Technology, Kunming, China
- Qiang Xue
- Faculty of Information Engineering and Automation, Center for Precision Medicine, Yan'an Hospital of Kunming City & Kunming University of Science and Technology, Kunming, China
- Chris W J Zhang
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Zhihua Liu
- Faculty of Information Engineering and Automation, Center for Precision Medicine, Yan'an Hospital of Kunming City & Kunming University of Science and Technology, Kunming, China
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Bayer HealthCare & Dana-Farber Cancer Institute, Harvard University, Boston, MA, United States
2
Liu Z, Zhao X. piRNAs as emerging biomarkers and physiological regulatory molecules in cardiovascular disease. Biochem Biophys Res Commun 2024; 711:149906. [PMID: 38640879 DOI: 10.1016/j.bbrc.2024.149906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/01/2024] [Accepted: 04/05/2024] [Indexed: 04/21/2024]
Abstract
Cardiovascular diseases (CVD) represent one of the most considerable global health threats, owing to their high incidence and mortality rates. Despite the ongoing advancements in detection, prevention, treatment, and prognosis of CVD, which have resulted in a decline in both incidence and mortality rates, CVD remains a major public health concern. Therefore, novel diagnostic biomarkers and therapeutic interventions are imperative to minimise the risk of CVD. Non-coding RNAs (ncRNAs) have recently gained increasing attention, with PIWI-interacting RNAs (piRNAs) emerging as a class of small ncRNAs traditionally recognised for their role in silencing transposons within cells. Although the functional roles of PIWI proteins and piRNAs in human cells remain unclear, growing evidence suggests that these molecules are gradually becoming valuable biomarkers for the diagnosis and treatment of CVD. This review provides a comprehensive summary of the latest studies on piRNAs in CVD. This review discusses the roles of piRNAs in various cardiovascular subtypes, including myocardial hypertrophy, heart failure, myocardial infarction, and cardiac regeneration. The perceived insights may contribute novel perspectives for the diagnosis and treatment of CVD.
Affiliation(s)
- Zhihua Liu
- School of Basic Medical Sciences, Center for Precision Medicine, Kunming YanAn Hospital & Kunming University of Science and Technology, Kunming, China; Department of Biostatistics and Computational Biology, Bayer HealthCare, Harvard University, Boston, MA, USA.
- Xi Zhao
- School of Basic Medical Sciences, Center for Precision Medicine, Kunming YanAn Hospital & Kunming University of Science and Technology, Kunming, China
3
Guo Y, Zhu K, Zhang Y, Tang H. Complete chloroplast genome of Artabotrys hexapetalus (L.f.) Bhandari 1965 (Annonaceae). Mitochondrial DNA B Resour 2024; 9:119-122. [PMID: 38259355 PMCID: PMC10802805 DOI: 10.1080/23802359.2024.2306202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Accepted: 01/11/2024] [Indexed: 01/24/2024] Open
Abstract
Artabotrys hexapetalus (L.f.) Bhandari, 1965, an evergreen climbing shrub of significant value, is prominent in Chinese history and culture. In this study, its complete chloroplast genome was sequenced using Illumina paired-end sequencing data. The complete chloroplast genome was determined to be 178,457 bp in size, comprising a large single copy (LSC) region of 90,803 bp and a small single copy (SSC) region of 3,066 bp. A total of 134 genes were identified, including 90 protein-coding genes, 36 tRNA genes, and eight rRNA genes. Phylogenetic analysis revealed a close relationship between A. hexapetalus and Artabotrys pilosus, forming a sister branch with 100% support. The study suggests that the chloroplast genome of A. hexapetalus provides valuable insights into its evolutionary history and will contribute to the conservation efforts of this species.
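The region sizes reported above can be cross-checked with simple arithmetic. The sketch below assumes the usual quadripartite layout (LSC, SSC, and two identical inverted repeat copies); the abstract does not state the inverted repeat length, so the value derived here is an inference under that assumption, not a reported result. The function name is hypothetical.

```python
def inverted_repeat_length(total_bp, lsc_bp, ssc_bp):
    """Length of one IR copy, assuming total = LSC + SSC + 2 * IR."""
    remainder = total_bp - lsc_bp - ssc_bp
    if remainder < 0 or remainder % 2 != 0:
        raise ValueError("sizes are inconsistent with two equal IR copies")
    return remainder // 2


# Figures taken from the abstract
ir_bp = inverted_repeat_length(178_457, 90_803, 3_066)

# Gene tally from the abstract: protein-coding + tRNA + rRNA
gene_total = 90 + 36 + 8
```

Under the stated assumption each inverted repeat copy would be 42,294 bp, and the three gene categories add up to the 134 genes reported.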
Affiliation(s)
- Yi Guo
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, China
- Kai Zhu
- Faculty of Innovation and Design, City University of Macau, Macau, China
- School of Art & Design, Guangdong University of Technology, Guangzhou, China
- Yongle Zhang
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, China
- Hui Tang
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, China
4
Qu XJ, Zou D, Zhang RY, Stull GW, Yi TS. Progress, challenge and prospect of plant plastome annotation. FRONTIERS IN PLANT SCIENCE 2023; 14:1166140. [PMID: 37324662 PMCID: PMC10266425 DOI: 10.3389/fpls.2023.1166140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 05/02/2023] [Indexed: 06/17/2023]
Abstract
The plastome (plastid genome) represents an indispensable molecular data source for studying phylogeny and evolution in plants. Although the plastome is much smaller than the nuclear genome, and multiple plastome annotation tools have been specifically developed, accurate annotation of plastomes is still a challenging task. Different plastome annotation tools apply different principles and workflows, and annotation errors frequently occur in published plastomes and those released in GenBank. It is therefore timely to compare available annotation tools and establish standards for plastome annotation. In this review, we summarize the basic characteristics of plastomes, trends in the publication of new plastomes, the annotation principles and application of major plastome annotation tools, and common errors in plastome annotation. We propose possible methods to judge pseudogenes and RNA-editing genes by jointly considering sequence similarity, customized algorithms, and conserved domains or protein structure. We also propose establishing a database of reference plastomes with standardized annotations, and put forward a set of quantitative standards for evaluating plastome annotation quality for the scientific community. In addition, we discuss how to generate standardized GenBank annotation flatfiles for submission and downstream analysis. Finally, we consider future technologies for plastome annotation that integrate plastome annotation approaches with the diverse evidence and algorithms of nuclear genome annotation tools. This review will help researchers use available tools more efficiently to achieve high-quality plastome annotation, and promote the standardized annotation of plastomes.
Affiliation(s)
- Xiao-Jian Qu
- Shandong Provincial Key Laboratory of Plant Stress Research, College of Life Sciences, Shandong Normal University, Ji’nan, Shandong, China
- Dan Zou
- Shandong Provincial Key Laboratory of Plant Stress Research, College of Life Sciences, Shandong Normal University, Ji’nan, Shandong, China
- Rui-Yu Zhang
- Shandong Provincial Key Laboratory of Plant Stress Research, College of Life Sciences, Shandong Normal University, Ji’nan, Shandong, China
- Gregory W. Stull
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- University of Chinese Academy of Sciences, Beijing, China
- Ting-Shuang Yi
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- University of Chinese Academy of Sciences, Beijing, China
5
Turudić A, Liber Z, Grdiša M, Jakše J, Varga F, Šatović Z. Chloroplast Genome Annotation Tools: Prolegomena to the Identification of Inverted Repeats. Int J Mol Sci 2022; 23:10804. [PMID: 36142721 PMCID: PMC9503105 DOI: 10.3390/ijms231810804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/01/2022] [Accepted: 09/13/2022] [Indexed: 12/31/2022] Open
Abstract
The development of next-generation sequencing technology and the increasing amount of sequencing data have brought the bioinformatic tools used in genome assembly into focus. The final step of the process is genome annotation, which works on assembled genome sequences to identify the location of genome features. In the case of organelle genomes, specialized annotation tools are used to identify organelle genes and structural features. Numerous annotation tools target chloroplast sequences. Most chloroplast DNA genomes have a quadripartite structure caused by two copies of a large inverted repeat. We investigated the strategies of six annotation tools (Chloë, Chloroplot, GeSeq, ORG.Annotate, PGA, Plann) for identifying inverted repeats and analyzed their success using publicly available complete chloroplast sequences of taxa belonging to the asterid and rosid clades. The annotation tools use two different approaches to identify inverted repeats, using existing general search tools or implementing stand-alone solutions. The chloroplast sequences studied show that there are different types of imperfections in the assembled data and that each tool performs better on some sequences than the others.
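To make the notion of a "stand-alone" inverted repeat search concrete, here is a minimal sketch of one possible approach: index seed k-mers of the reverse complement, then extend exact matches. It is not the algorithm of any of the six tools named above, it only handles perfect (mismatch-free) repeats, and it discards candidates whose two copies would overlap. The function names and the default seed length of 8 are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_inverted_repeat(seq, seed=8):
    """Return (start_a, start_b, length) of the longest exact inverted repeat,
    where seq[start_a:start_a+length] equals the reverse complement of
    seq[start_b:start_b+length] and copy B lies downstream of copy A."""
    n = len(seq)
    rc = reverse_complement(seq)
    index = {}
    for j in range(n - seed + 1):
        index.setdefault(rc[j:j + seed], []).append(j)

    best = (0, 0, 0)
    for i in range(n - seed + 1):
        for j in index.get(seq[i:i + seed], ()):
            # Skip seeds that are the interior of a match already extended
            if i > 0 and j > 0 and seq[i - 1] == rc[j - 1]:
                continue
            length = seed
            while i + length < n and j + length < n and seq[i + length] == rc[j + length]:
                length += 1
            # Map the reverse-complement coordinate back to the forward strand
            start_b = n - j - length
            if start_b >= i + length and length > best[2]:
                best = (i, start_b, length)
    return best


# Synthetic example: flank + IR + spacer + reverse-complemented IR + flank
ir = "GATTACAGGCTCAT"
genome = "CCCCC" + ir + "TTTTTT" + reverse_complement(ir) + "CCCCC"
hit = longest_inverted_repeat(genome)
```

Real assemblies contain the imperfections the abstract mentions (mismatches, indels, partial copies), which is why practical tools need tolerant matching rather than the exact extension used here.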
Affiliation(s)
- Ante Turudić
- Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Zlatko Liber
- Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Faculty of Science, University of Zagreb, Marulićev trg 9a, 10000 Zagreb, Croatia
- Martina Grdiša
- Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Jernej Jakše
- Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, 1000 Ljubljana, Slovenia
- Filip Varga
- Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Zlatko Šatović
- Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Svetošimunska cesta 25, 10000 Zagreb, Croatia
- Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000 Zagreb, Croatia
6
Ananda G, Norton S, Blomstedt C, Furtado A, Møller B, Gleadow R, Henry R. Phylogenetic relationships in the Sorghum genus based on sequencing of the chloroplast and nuclear genes. THE PLANT GENOME 2021; 14:e20123. [PMID: 34323394 DOI: 10.1002/tpg2.20123] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/13/2021] [Accepted: 05/27/2021] [Indexed: 06/13/2023]
Abstract
Sorghum [Sorghum bicolor (L.) Moench] is an important food crop with a diverse gene pool residing in its wild relatives. A total of 15 sorghum accessions from the unexploited wild gene pool of the Sorghum genus, representing the five subgenera, were sequenced, and the complete chloroplast genomes and 99 common single-copy concatenated nuclear genes were assembled. Annotation of the chloroplast genomes identified a total of 81 protein-coding genes, 38 tRNA, and four rRNA genes. The gene content and gene order among the species was identical. A total of 153 nonsynonymous amino acid changes in 40 genes were identified across the species. Phylogenetic analysis of both the whole chloroplast genome and nuclear genes revealed a similar topology with two distinct clades within the genus. The species within the subgenera Eusorghum, Chaetosorghum, and Heterosorghum clustered in one clade, whereas the species within the subgenera Parasorghum and Stiposorghum clustered in a second clade. However, the subgenera Parasorghum and Stiposorghum were not monophyletic, suggesting the need for further research to resolve the relationships within this group. The close relationship between the two monotypic subgenera Chaetosorghum and Heterosorghum suggests that species within these subgenera could be considered as one group. This analysis provides an improved understanding of the genetic relationships within the Sorghum genus and defines diversity in wild sorghum species that may be useful for crop improvement.
Affiliation(s)
- Galaihalage Ananda
- Queensland Alliance for Agriculture and Food Innovation, The Univ. of Queensland, St Lucia, QLD, Australia
- Sally Norton
- Australian Grains Genebank, Agriculture Victoria, Horsham, VIC, Australia
- Cecilia Blomstedt
- School of Biological Sciences, Monash Univ., Clayton, VIC, Australia
- Agnelo Furtado
- Queensland Alliance for Agriculture and Food Innovation, The Univ. of Queensland, St Lucia, QLD, Australia
- Birger Møller
- Plant Biochemistry Laboratory, Dep. of Plant and Environmental Sciences, Univ. of Copenhagen, Copenhagen, Denmark
- Roslyn Gleadow
- Queensland Alliance for Agriculture and Food Innovation, The Univ. of Queensland, St Lucia, QLD, Australia
- School of Biological Sciences, Monash Univ., Clayton, VIC, Australia
- Robert Henry
- Queensland Alliance for Agriculture and Food Innovation, The Univ. of Queensland, St Lucia, QLD, Australia
7
Liao Y, Fu J, Gao B, Tang T. The complete mitochondrial DNA genome of a cone snail, Conus betulinus (Neogastropoda: Conidae), from the South China sea. Mitochondrial DNA B Resour 2021; 6:1696-1698. [PMID: 34104742 PMCID: PMC8143595 DOI: 10.1080/23802359.2021.1930212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022] Open
Abstract
The complete mitochondrial genome of the tubular cone snail Conus betulinus is presented in this study. The C. betulinus mitochondrial genome was 16,240 bp with 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, 2 ribosomal RNA (rRNA) genes, and a non-coding AT-rich region (D-loop). The overall base composition was estimated to be 25.67% for A, 38.26% for T, 21.38% for G, and 14.69% for C, with a high A + T content of 63.93%. Phylogenetic analyses based on 13 PCGs showed the close relationship of vermivorous C. betulinus with the common ancestor of molluscivorous Conus textile and Conus gloriamaris, providing a basis for further studies on the phylogenetics of cone snails according to their dietary type.
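Base composition figures like those above come from counting nucleotides. The following is a minimal sketch of that calculation on a toy sequence; the function names are hypothetical and the example string is not taken from the C. betulinus genome. Ambiguity codes such as N are simply ignored here, which is one possible convention among several.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq):
    """Combined A + T percentage."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


toy = "AATTTGGC"                 # toy sequence for illustration only
toy_comp = base_composition(toy)
toy_at = at_content(toy)

# Consistency check on the percentages quoted in the abstract
reported_at = 25.67 + 38.26
```

The reported A and T percentages add up to the stated A + T content of 63.93%, and the four base percentages sum to 100%.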
Affiliation(s)
- Yanling Liao
- Key Laboratory of Tropical Translational Medicine of the Ministry of Education, Hainan Medical University, Haikou, China
- Jinxing Fu
- Key Laboratory of Tropical Translational Medicine of the Ministry of Education, Hainan Medical University, Haikou, China
- Bingmiao Gao
- Key Laboratory of Tropical Translational Medicine of the Ministry of Education, Hainan Medical University, Haikou, China
- Tianle Tang
- Key Laboratory of Tropical Translational Medicine of the Ministry of Education, Hainan Medical University, Haikou, China
8
Yang L, Feng C, Cai MM, Chen JH, Ding P. Complete chloroplast genome sequence of Amomum villosum and comparative analysis with other Zingiberaceae plants. CHINESE HERBAL MEDICINES 2020; 12:375-383. [PMID: 36120171 PMCID: PMC9476707 DOI: 10.1016/j.chmed.2020.05.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/30/2019] [Revised: 05/24/2020] [Accepted: 05/31/2020] [Indexed: 10/28/2022] Open
9
Gruenstaeudl M, Jenke N. PACVr: plastome assembly coverage visualization in R. BMC Bioinformatics 2020; 21:207. [PMID: 32448146 PMCID: PMC7245912 DOI: 10.1186/s12859-020-3475-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 08/06/2019] [Accepted: 03/31/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Plastid genomes typically display a circular, quadripartite structure with two inverted repeat regions, which challenges automatic assembly procedures. The correct assembly of plastid genomes is a prerequisite for the validity of subsequent analyses on genome structure and evolution. The average coverage depth of a genome assembly is often used as an indicator of assembly quality. Visualizing coverage depth across a draft genome is a critical step, which allows users to inspect the quality of the assembly and, where applicable, identify regions of reduced assembly confidence. Despite the interplay between genome structure and assembly quality, no contemporary, user-friendly software tool can visualize the coverage depth of a plastid genome assembly while taking its quadripartite genome structure into account. A software tool is needed that fills this void. RESULTS We introduce 'PACVr', an R package that visualizes the coverage depth of a plastid genome assembly in relation to the circular, quadripartite structure of the genome as well as the individual plastome genes. By using a variable window approach, the tool allows visualizations on different calculation scales. It also confirms sequence equality of, as well as visualizes gene synteny between, the inverted repeat regions of the input genome. As a tool for plastid genomics, PACVr provides the functionality to identify regions of coverage depth above or below user-defined threshold values and helps to identify non-identical IR regions. To allow easy integration into bioinformatic workflows, PACVr can be invoked from a Unix shell, facilitating its use in automated quality control. We illustrate the application of PACVr on four empirical datasets and compare visualizations generated by PACVr with those of alternative software tools. 
CONCLUSIONS PACVr provides a user-friendly tool to visualize (a) the coverage depth of a plastid genome assembly on a circular, quadripartite plastome map and in relation to individual plastome genes, and (b) gene synteny across the inverted repeat regions. It contributes to optimizing plastid genome assemblies and increasing the reliability of publicly available plastome sequences. The software, example datasets, technical documentation, and a tutorial are available with the package at https://cran.r-project.org/package=PACVr.
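The idea of averaging coverage depth in windows and flagging windows outside user-defined thresholds can be sketched in a few lines. The code below is written in Python purely for illustration and is not PACVr's implementation or interface (PACVr is an R package); the function names, window size, and thresholds are all assumptions.

```python
def windowed_coverage(depths, window):
    """Mean coverage depth in consecutive, non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def flag_windows(depths, window, low, high):
    """Return (start, end, mean) for windows whose mean depth is
    below `low` or above `high`; `end` is exclusive."""
    flagged = []
    for idx, mean in enumerate(windowed_coverage(depths, window)):
        if mean < low or mean > high:
            start = idx * window
            end = min(start + window, len(depths))
            flagged.append((start, end, mean))
    return flagged


# Toy per-base depths with a coverage dip between positions 10 and 15
toy_depths = [30] * 10 + [5] * 5 + [30] * 5
suspicious = flag_windows(toy_depths, window=5, low=10, high=100)
```

Changing `window` trades resolution against noise, which corresponds to the variable window approach described in the abstract.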
Affiliation(s)
- Michael Gruenstaeudl
- Institut für Biologie, Systematische Botanik und Pflanzengeographie, Freie Universität Berlin, Berlin, 14195 Germany
- Nils Jenke
- Institut für Bioinformatik, Freie Universität Berlin, Berlin, 14195 Germany
10
The chloroplast genome sequence of the green macroalga Caulerpa okamurae (Ulvophyceae, Chlorophyta): Its structural features, organization and phylogenetic analysis. Mar Genomics 2020; 53:100752. [PMID: 32014385 DOI: 10.1016/j.margen.2020.100752] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/28/2018] [Revised: 12/05/2019] [Accepted: 01/24/2020] [Indexed: 11/20/2022]
Abstract
To clarify the evolutionary characteristics, phylogenetic relationships, and species identification of Caulerpa okamurae, we determined its cpDNA sequence using de novo sequencing in the present study. The cpDNA of C. okamurae was 148,274 bp in length, and it lacked the inverted repeat commonly found in vascular green plants. The cpDNA of C. okamurae was highly compact, with a gene density of 71.7%. Moreover, it was an AT-rich genome (65.5%) consisting of 76 protein-coding genes (PCGs), 27 transfer RNA (tRNA) genes, three ribosomal RNA (rRNA) genes, 32 putative open reading frames (ORFs) and six introns. The six introns were annotated in six genes: psbA, rpoB, ftsH, psbD, atpF and cysA. The overall AT content of its cpDNA was 65.46%. A total of 56 genes were encoded on the light strand, while the other 50 chloroplast genes were encoded on the heavy strand. All of the PCGs had ATG as their start codon and employed TAA, TGA or TAG as their termination codon. Phylogenetic analyses suggested that the complete cpDNA sequence of C. okamurae fell within the Chlorophyta, Ulvophyceae, Bryopsidales, and Caulerpaceae and more closely resembled the cpDNAs of C. racemosa, C. cliftonii voucher and Tydemania expeditionis. Taken together, our data offer useful information for studies of the evolutionary characteristics, phylogenetic relationships, and species identification of C. okamurae.
11
Konhar R, Debnath M, Vishwakarma S, Bhattacharjee A, Sundar D, Tandon P, Dash D, Biswal DK. The complete chloroplast genome of Dendrobium nobile, an endangered medicinal orchid from north-east India and its comparison with related Dendrobium species. PeerJ 2019; 7:e7756. [PMID: 31695964 PMCID: PMC6830405 DOI: 10.7717/peerj.7756] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/23/2018] [Accepted: 08/26/2019] [Indexed: 11/21/2022] Open
Abstract
The medicinal orchid genus Dendrobium, belonging to the Orchidaceae family, is a large genus comprising about 800-1,500 species. To better illustrate the species status in the genus Dendrobium, 33 available chloroplast genomes retrieved from the NCBI RefSeq database were compared with the first complete chloroplast genome of D. nobile from north-east India, sequenced using next-generation sequencing methods (Illumina HiSeq 2500-PE150). Our results provide comparative chloroplast genomic information for taxonomical identification, alignment-free phylogenomic inference and other statistical features of Dendrobium plastomes, which can also provide valuable information on their mutational events and sequence divergence.
Affiliation(s)
- Ruchishree Konhar
- Bioinformatics Centre, North-Eastern Hill University, Shillong, Meghalaya, India
- Informatics and Big Data, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research, Ghaziabad, Uttar Pradesh, India
- Manish Debnath
- Bioinformatics Centre, North-Eastern Hill University, Shillong, Meghalaya, India
- Santosh Vishwakarma
- Bioinformatics Centre, North-Eastern Hill University, Shillong, Meghalaya, India
- Atanu Bhattacharjee
- Bioinformatics Centre, North-Eastern Hill University, Shillong, Meghalaya, India
- Durai Sundar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi, India
- Pramod Tandon
- Biotech Park, Kursi road, Lucknow, Uttar Pradesh, India
- Debasis Dash
- Informatics and Big Data, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research, Ghaziabad, Uttar Pradesh, India
12
Guyeux C, Charr JC, Tran HTM, Furtado A, Henry RJ, Crouzillat D, Guyot R, Hamon P. Evaluation of chloroplast genome annotation tools and application to analysis of the evolution of coffee species. PLoS One 2019; 14:e0216347. [PMID: 31188829 PMCID: PMC6561552 DOI: 10.1371/journal.pone.0216347] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/13/2018] [Accepted: 04/18/2019] [Indexed: 12/13/2022] Open
Abstract
Chloroplast sequences are widely used for phylogenetic analysis due to their high degree of conservation in plants. Whole chloroplast genomes can now be readily obtained for plant species using new sequencing methods, giving invaluable data for plant evolution. However, new annotation methods are required for the efficient analysis of these data to deliver high-quality phylogenetic analyses. In this study, the two main tools for chloroplast genome annotation were compared. More consistent detection and annotation of genes were produced with GeSeq when compared to the currently used Dogma. This suggests that the annotation of most of the previously annotated chloroplast genomes should now be updated. GeSeq was applied to species related to coffee, including 16 species of the Coffea and Psilanthus genera, to reconstruct the ancestral chloroplast genomes and to evaluate their phylogenetic relationships. Eight genes in the plant chloroplast pan genome (consisting of 92 genes) were always absent in the coffee species analyzed. Notably, the two main cultivated coffee species (i.e. Arabica and Robusta) did not group into the same clade and differ in their pattern of gene evolution. While Arabica coffee (Coffea arabica) belongs to the Coffea genus, Robusta coffee (Coffea canephora) is associated with the Psilanthus genus. A more extensive survey of related species is required to determine if this is a unique attribute of Robusta coffee or a more widespread feature of coffee tree species.
Affiliation(s)
- Christophe Guyeux
- Femto-ST Institute, UMR 6174 CNRS, Université de Bourgogne Franche-Comté, Besançon, France
- Jean-Claude Charr
- Femto-ST Institute, UMR 6174 CNRS, Université de Bourgogne Franche-Comté, Besançon, France
- Hue T. M. Tran
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD, Australia
- Agnelo Furtado
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD, Australia
- Robert J. Henry
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD, Australia
- Romain Guyot
- Institut de Recherche pour le Développement, UMR IPME, CIRAD, Université de Montpellier, Montpellier, France
- Department of Electronics and Automatization, Universidad Autónoma de Manizales, Manizales, Colombia
- Perla Hamon
- Institut de Recherche pour le Développement, UMR DIADE, Université de Montpellier, Montpellier, France
13
Zhao Y, Xue X, Xie X. An alignment-free measure based on physicochemical properties of amino acids for protein sequence comparison. Comput Biol Chem 2019; 80:10-15. [PMID: 30851619 DOI: 10.1016/j.compbiolchem.2019.01.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/08/2018] [Revised: 12/30/2018] [Accepted: 01/17/2019] [Indexed: 01/21/2023]
Abstract
Sequence comparison is an important topic in bioinformatics. With the exponential increase in biological sequences, traditional alignment-based protein sequence comparison methods have become limited, so alignment-free methods have been widely proposed over the past two decades. In this paper, we considered not only six typical physicochemical properties of amino acids, but also their frequency and positional distribution. A 51-dimensional vector was obtained to describe each protein sequence. We obtained a pairwise distance matrix by computing the standardized Euclidean distance, on which discriminant analysis and phylogenetic analysis can be performed. The results on the Influenza A virus and ND5 datasets indicate that our method is accurate and efficient for classifying proteins and inferring the phylogeny of species.
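The pairwise distance step described above can be illustrated as follows. This sketch computes a standardized Euclidean distance matrix in which each squared coordinate difference is divided by that feature's variance across the dataset. Using the sample variance (n - 1 denominator) and skipping zero-variance features are assumptions made here; the abstract does not state which convention the authors used, and the toy vectors are two-dimensional placeholders rather than the 51-dimensional descriptors.

```python
from math import sqrt
from statistics import variance


def standardized_euclidean_matrix(vectors):
    """Pairwise standardized Euclidean distances between feature vectors.

    Each squared difference is scaled by the sample variance of that
    feature over all vectors; zero-variance features are ignored.
    """
    n = len(vectors)
    dims = len(vectors[0])
    var = [variance([v[d] for v in vectors]) for d in range(dims)]

    def dist(u, v):
        return sqrt(sum((u[d] - v[d]) ** 2 / var[d]
                        for d in range(dims) if var[d] > 0))

    return [[dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


# Toy feature vectors; the second feature has a 5x larger scale
toy_vectors = [[0.0, 0.0], [2.0, 10.0], [4.0, 20.0]]
dist_matrix = standardized_euclidean_matrix(toy_vectors)
```

Because of the variance scaling, both features contribute equally to the distance even though their raw scales differ, which is the reason for standardizing before building a tree from the matrix.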
Affiliation(s)
- Yunxiu Zhao
- College of Science, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Xiaolong Xue
- College of Science, Northwest A&F University, Yangling, Shaanxi 712100, PR China
- Xiaoli Xie
- College of Science, Northwest A&F University, Yangling, Shaanxi 712100, PR China.
14
Han GB, Cho DH. Genome classification improvements based on k-mer intervals in sequences. Genomics 2018; 111:1574-1582. [PMID: 30439480 DOI: 10.1016/j.ygeno.2018.11.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/01/2018] [Revised: 10/13/2018] [Accepted: 11/05/2018] [Indexed: 10/27/2022]
Abstract
Given the vast amount of genomic data, alignment-free sequence comparison methods are required due to their low computational complexity. k-mer based methods can improve comparison accuracy by extracting an effective feature of the genome sequences. The aim of this paper is to extract k-mer intervals of a sequence as a feature of a genome for high comparison accuracy. In the proposed method, we calculated the distance between genome sequences by comparing the distribution of k-mer intervals. Then, we identified the classification results using phylogenetic trees. We used viral, mitochondrial (MT), microbial and mammalian genome sequences to perform classification for various genome sets. We confirmed that the proposed method provides a better classification result than other k-mer based methods. Furthermore, the proposed method could efficiently be applied to long sequences such as human and mouse genomes.
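One plausible reading of "k-mer intervals" is the gap between consecutive occurrences of the same k-mer along a sequence. The sketch below builds a normalized histogram of those gaps and compares two sequences with an L1 distance between histograms. Both the interval definition and the distance are assumptions chosen for illustration; the abstract does not specify the authors' exact feature or metric, and the function names are hypothetical.

```python
from collections import Counter


def kmer_intervals(seq, k):
    """Count gaps between consecutive occurrences of each k-mer."""
    last_seen = {}
    intervals = Counter()
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        if kmer in last_seen:
            intervals[pos - last_seen[kmer]] += 1
        last_seen[kmer] = pos
    return intervals


def interval_distribution(seq, k):
    """Normalize interval counts into a probability distribution."""
    counts = kmer_intervals(seq, k)
    total = sum(counts.values())
    return {gap: c / total for gap, c in counts.items()} if total else {}


def interval_distance(seq_a, seq_b, k):
    """L1 distance between the interval distributions of two sequences."""
    pa = interval_distribution(seq_a, k)
    pb = interval_distribution(seq_b, k)
    return sum(abs(pa.get(g, 0.0) - pb.get(g, 0.0)) for g in set(pa) | set(pb))


same = interval_distance("ACACAC", "ACACAC", 2)
different = interval_distance("ACACAC", "ACGACG", 2)
```

A matrix of such distances over a set of genomes could then be passed to any standard tree-building method, in line with the phylogenetic evaluation the abstract describes.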
Affiliation(s)
- Gyu-Bum Han
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Dong-Ho Cho
- School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.
|
15
|
Gao B, Peng C, Chen Q, Zhang J, Shi Q. Mitochondrial genome sequencing of a vermivorous cone snail Conus quercinus supports the correlative analysis between phylogenetic relationships and dietary types of Conus species. PLoS One 2018; 13:e0193053. [PMID: 30059499 PMCID: PMC6066214 DOI: 10.1371/journal.pone.0193053] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/01/2017] [Accepted: 02/02/2018] [Indexed: 12/15/2022] Open
Abstract
The complete mitochondrial genome (mitogenome) sequence of a worm-hunting cone snail, Conus quercinus, is reported in this study. Its mitogenome, the longest (16,460 bp) among reported Conus species, is composed of 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes and one D-loop region. The mitochondrial gene arrangement is highly conserved and identical to that of other reported Conus species. However, the D-loop region of C. quercinus is the longest (943 bp), with a higher A+T content (71.3%) and a long AT tandem repeat stretch (68 bp). Subsequent phylogenetic analysis demonstrated that the three dietary types (vermivorous, molluscivorous and piscivorous) of cone snails cluster separately, suggesting that the phylogenetics of cone snails is related to their dietary types. In conclusion, our current work improves our understanding of the mitogenomic structure and evolutionary status of the vermivorous C. quercinus, which supports the putative hypothesis that the Conus ancestor was vermivorous.
Affiliation(s)
- Bingmiao Gao
- Hainan Provincial Key Laboratory of Research and Development of Tropical Medicinal Plants, Hainan Medical University, Haikou, China
- Chao Peng
- Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen, China
- Qin Chen
- School of Agricultural and Forestry Science and Technology, Hainan Radio & TV University, Haikou, China
- Junqing Zhang
- Hainan Provincial Key Laboratory of Research and Development of Tropical Medicinal Plants, Hainan Medical University, Haikou, China
- Qiong Shi
- Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen, China
|
16
|
He L, Wang Z, Lou S, Lin X, Hu F. The complete chloroplast genome of the green algae Hariotina reticulata (Scenedesmaceae, Sphaeropleales, Chlorophyta). Genes Genomics 2018; 40:543-552. [DOI: 10.1007/s13258-018-0652-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/19/2017] [Accepted: 12/26/2017] [Indexed: 10/18/2022]
|
17
|
Abstract
With the sharp increase in biological sequences, traditional sequence alignment methods have become unsuitable and infeasible. This has motivated a surge of fast alignment-free techniques for sequence analysis. Among these, many kinds of feature vector methods have been established and applied to the reconstruction of species phylogeny. The vectors basically consist of typical numerical features for certain biological problems. The features may come from the primary sequences or from the secondary or three-dimensional structures of macromolecules. In this study, we propose a novel numerical vector based only on the primary sequences of organisms to build their phylogeny. Three chemical and physical properties of primary sequences (purine, pyrimidine and keto) are also incorporated into the vector. Using each property, we convert the nucleotide sequence into a new sequence consisting of only two kinds of letters, so three sequences are constructed according to the three properties. For each letter of each sequence, we calculate the number of occurrences of the letter, its average position and the variance of its positions in the sequence. Tested on several datasets related to mammals, viruses and bacteria, this new tool is fast and accurate for inferring the phylogeny of organisms.
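The vector construction described here can be sketched as below. The abstract does not spell out the three two-letter mappings, so this assumes the commonly used purine/pyrimidine, amino/keto and weak/strong groupings; the names `GROUPINGS` and `feature_vector` are hypothetical, and positions are 1-based indices into the recoded sequence.

```python
from statistics import mean, pvariance

# Assumed two-letter groupings; the exact mappings are not given in the abstract.
GROUPINGS = {
    "purine_pyrimidine": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "amino_keto": {"A": "M", "C": "M", "G": "K", "T": "K"},
    "weak_strong": {"A": "W", "T": "W", "C": "S", "G": "S"},
}


def feature_vector(seq):
    """Count, mean position and position variance for each letter of each recoding."""
    features = []
    for mapping in GROUPINGS.values():
        # Recode the nucleotide sequence into two letters; other symbols are dropped.
        recoded = [mapping[base] for base in seq.upper() if base in mapping]
        for letter in sorted(set(mapping.values())):
            positions = [i + 1 for i, c in enumerate(recoded) if c == letter]
            if positions:
                features += [len(positions), mean(positions), pvariance(positions)]
            else:
                features += [0, 0.0, 0.0]
    return features


vector = feature_vector("AGCT")
```

With three recodings, two letters each and three statistics per letter, this sketch yields an 18-dimensional vector per sequence; the dimension used in the actual study is not stated in the abstract.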
|
18
|
Wang ZK, He LJ, Hu F, Lin XZ. Characterization of the complete mitochondrial genome of Coelastrum sp. F187. MITOCHONDRIAL DNA PART B-RESOURCES 2017; 2:455-456. [PMID: 33490456 PMCID: PMC7800356 DOI: 10.1080/23802359.2017.1357440] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]
Abstract
Coelastrum is a genus of green algae that belongs to the Scenedesmaceae family. There is little information available about this genus. A phylogenetic analysis of the ITS2 sequences showed that Coelastrum is a paraphyletic group. To better explore the phylogenetic status of this genus, we report the mitochondrial genome sequence of Coelastrum sp. F187 using next-generation sequencing technology. The complete mitochondrial genome is 52,888 bp in size and encodes 43 conventional mitochondrial genes, including 14 protein-coding genes (PCGs), 24 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes. Most of the PCGs (12/14) and all tRNAs were located in the heavy chain and the light chain, respectively. The phylogenetic analysis based on the complete mitochondrial genome sequences indicated that Coelastrum is closely related to Pectinodesmus pectinatus. The sequenced complete mitochondrial genome of Coelastrum sp. F187 provides fundamental molecular data that will be useful for species identification, population genetics, and evolutionary relationships.
Affiliation(s)
- Zhao Kai Wang
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, Fujian, P.R. China
- Li Juan He
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, Fujian, P.R. China
- Fan Hu
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, Fujian, P.R. China
- Xiang Zhi Lin
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, Fujian, P.R. China
|
19
|
Chloroplast Genome Sequence Annotation of Dendrobium nobile (Asparagales: Orchidaceae), an Endangered Medicinal Orchid from Northeast India. PLOS CURRENTS 2017; 9. [PMID: 28736679 PMCID: PMC5501700 DOI: 10.1371/currents.tol.cf1709613759c2223eb582c0fa694cc7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/31/2023]
Abstract
Orchidaceae constitutes one of the largest families of angiosperms. Owing to the significance of orchids in plant biology, market needs and current sustainable technology levels, basic research on the biology of orchids and their applications in the orchid industry is increasing. Although chloroplast (cp) genomes continue to be evolutionarily informative, there is very limited information available on orchid chloroplast genomes in public repositories. Here, we report the complete cp genome sequence of Dendrobium nobile from Northeast India (Orchidaceae, Asparagales), bearing the GenBank accession number KX377961, which will provide valuable information for future research on orchid genomics and evolution, as well as the medicinal value of orchids. Phylogenetic analyses using Bayesian methods recovered a monophyletic grouping of all Dendrobium species (D. nobile, D. huoshanense, D. officinale, D. pendulum, D. strongylanthum and D. chrysotoxum). The relationships recovered among the representative orchid species from the four subfamilies, i.e., Cypripedioideae, Epidendroideae, Orchidoideae and Vanilloideae, were consistent within the family Orchidaceae.
|
20
|
Chen L, Zhang YH, Lu G, Huang T, Cai YD. Analysis of cancer-related lncRNAs using gene ontology and KEGG pathways. Artif Intell Med 2017; 76:27-36. [PMID: 28363286 DOI: 10.1016/j.artmed.2017.02.001] [Citation(s) in RCA: 107] [Impact Index Per Article: 15.3] [Received: 12/12/2016] [Revised: 01/31/2017] [Accepted: 02/05/2017] [Indexed: 12/17/2022]
Abstract
BACKGROUND Cancer is a disease that involves abnormal cell growth and can invade or metastasize to other tissues. It is known that several factors are related to its initiation, proliferation, and invasiveness. Recently, it has been reported that long non-coding RNAs (lncRNAs) can participate in specific functional pathways and further regulate the biological function of cancer cells. Studies on lncRNAs are therefore helpful for uncovering the underlying mechanisms of cancer biological processes. METHODS We investigated cancer-related lncRNAs using gene ontology (GO) terms and KEGG pathway enrichment scores of neighboring genes that are co-expressed with the lncRNAs, extracting important GO terms and KEGG pathways that can help identify cancer-related lncRNAs. The enrichment theory of GO terms and KEGG pathways was adopted to encode each lncRNA. Then, feature selection methods were employed to analyze these features and obtain the key GO terms and KEGG pathways. RESULTS The analysis indicated that the extracted GO terms and KEGG pathways are closely related to several cancer-associated processes, such as hormone-associated, energy-associated, and ribosome-associated pathways. They can also accurately predict cancer-related lncRNAs. CONCLUSIONS This study provides novel insight into how lncRNAs may affect tumorigenesis and which pathways may play important roles in it. These results could help in understanding the biological mechanisms of lncRNAs and in treating cancer.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China; College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China.
- Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200025, People's Republic of China.
- Guohui Lu
- Department of Neurosurgery, The First Affiliated Hospital of Nanchang University, Nanchang 330006, People's Republic of China.
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200025, People's Republic of China.
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China.
|
21
|
Complete Chloroplast Genome Sequence of Dendrobium nobile from Northeastern India. GENOME ANNOUNCEMENTS 2016; 4:4/5/e01088-16. [PMID: 27795255 PMCID: PMC5054326 DOI: 10.1128/genomea.01088-16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
The orchid species Dendrobium nobile belonging to the family Orchidaceae and genus Dendrobium (a vast genus that encompasses nearly 1,200 species) has an herbal medicinal history of about 2000 years in east and south Asian countries. Here, we report the complete chloroplast genome sequence of D. nobile from northeastern India for the first time.
|
22
|
Wang Z, Lou S, Hu F, Wu P, Yang L, Li H, He L, Lin X. Complete mitochondrial genome of a DHA-rich protist Schizochytrium sp. TIO1101. MITOCHONDRIAL DNA PART B-RESOURCES 2016; 1:126-127. [PMID: 33473432 PMCID: PMC7799850 DOI: 10.1080/23802359.2016.1144090] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
Abstract
Schizochytrium sp. TIO1101 is a crucial commercial alga used to produce docosahexaenoic acid (DHA), a long-chain polyunsaturated fatty acid that is beneficial for human health. In this study, we sequenced the mitochondrial genome (mitogenome) of Schizochytrium sp. TIO1101 for the first time using an Illumina HiSeq 2500 system (Illumina Inc., San Diego, CA). The assembled mitogenome was 31 494 bp long with 33.92% GC content. The mitogenome contains 56 genes, including 33 protein-coding genes, 21 transfer RNA genes and two ribosomal RNA genes. Maximum-likelihood phylogenetic analysis of Schizochytrium sp. TIO1101 showed that it was most closely related to Thraustochytrium aureum among the examined species.
Affiliation(s)
- Zhaokai Wang
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Sulin Lou
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Fan Hu
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Peng Wu
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Longhe Yang
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Huanqin Li
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Lijuan He
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Xiangzhi Lin
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
|
23
|
He L, Lou S, Zhang F, Yang S, Zhang C, Lin X, Yang L. The complete mitochondrial DNA sequence of the green algae Hariotina sp. F30 (Scenedesmaceae, Sphaeropleales, Chlorophyceae). Mitochondrial DNA B Resour 2016; 1:124-125. [PMID: 33473431 PMCID: PMC7800414 DOI: 10.1080/23802359.2016.1144089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/17/2015] [Revised: 01/07/2016] [Accepted: 01/16/2016] [Indexed: 10/31/2022] Open
Abstract
The complete mitochondrial genome of the green algae Hariotina sp. F30 was obtained in this study using Illumina sequencing data. It is 51 915 bp in length with 36.23% GC content. The genome contains 13 protein-coding genes, 23 tRNA genes and six rRNA genes, all of which are encoded on the heavy strand. AUG is a universal initiation codon among the 13 protein-coding genes. UCA is a universal termination codon for most protein-coding genes, except for UAA in the cox1 and cob genes and UGA in the nad6 gene. A CUU anticodon for tRNA-Lys was detected for the first time in Sphaeropleales.
Affiliation(s)
- Lijuan He
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Sulin Lou
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Fang Zhang
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Shanjun Yang
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Chao Zhang
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Xiangzhi Lin
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
- Longhe Yang
- Engineering Research Center of Marine Biological Resource Comprehensive Utilization, Third Institute of Oceanography, State Oceanic Administration, Xiamen, China
|
24
|
Wang Y, Zhan DF, Jia X, Mei WL, Dai HF, Chen XT, Peng SQ. Complete Chloroplast Genome Sequence of Aquilaria sinensis (Lour.) Gilg and Evolution Analysis within the Malvales Order. FRONTIERS IN PLANT SCIENCE 2016; 7:280. [PMID: 27014304 PMCID: PMC4781844 DOI: 10.3389/fpls.2016.00280] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 11/03/2015] [Accepted: 02/21/2016] [Indexed: 05/11/2023]
Abstract
Aquilaria sinensis (Lour.) Gilg is an important medicinal woody plant producing agarwood, which is widely used in traditional Chinese medicine. High-throughput sequencing of chloroplast (cp) genomes has enhanced the understanding of evolutionary relationships within plant families. In this study, we determined the complete cp genome sequence of A. sinensis. The size of the A. sinensis cp genome was 159,565 bp. This genome included a large single-copy region of 87,482 bp, a small single-copy region of 19,857 bp, and a pair of inverted repeats (IRa and IRb) of 26,113 bp each. The GC content of the genome was 37.11%. The A. sinensis cp genome encoded 113 functional genes, including 82 protein-coding genes, 27 tRNA genes, and 4 rRNA genes. Seven genes were duplicated in the protein-coding genes, whereas 11 genes were duplicated in the RNA genes. A total of 45 polymorphic simple-sequence repeat loci and 60 pairs of large repeats were identified. Most simple-sequence repeats were located in the noncoding sections of the large single-copy/small single-copy region and exhibited high A/T content. Moreover, 33 pairs of large repeat sequences were located in the protein-coding genes, whereas 27 pairs were located in the intergenic regions. Codon usage in the A. sinensis cp genome was biased toward codons ending in A/T. The distribution of codon usage in the A. sinensis cp genome was most similar to that in the Gonystylus bancanus cp genome. Comparative results of 82 protein-coding genes from 29 species of cp genomes demonstrated that A. sinensis was a sister species to G. bancanus within the Malvales order. The A. sinensis cp genome presented the highest sequence similarity (>90%) with the G. bancanus cp genome when compared using the CGView Comparison Tool. This finding strongly supports the placement of A. sinensis as a sister to G. bancanus within the Malvales order. The complete A. sinensis cp genome information will be highly beneficial for further studies on this traditional medicinal plant. Moreover, the results will enhance our understanding of the evolution of cp genomes of the Malvales order, particularly with regard to the role of A. sinensis in plant systematics and evolution.
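The region sizes and gene counts reported in this abstract are internally consistent, which can be checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are taken directly from the abstract.

```python
# Region sizes (bp) as reported in the abstract.
large_single_copy = 87_482
small_single_copy = 19_857
inverted_repeat = 26_113  # IRa and IRb are each this long

# The quadripartite genome is LSC + SSC + two inverted repeats.
genome_size = large_single_copy + small_single_copy + 2 * inverted_repeat
print(genome_size)  # 159565, matching the reported 159,565 bp

# Functional genes: protein-coding + tRNA + rRNA.
functional_genes = 82 + 27 + 4
print(functional_genes)  # 113, matching the reported total
```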
Affiliation(s)
- Ying Wang
- Key Laboratory of Biology and Genetic Resources of Tropical Crops, Ministry of Agriculture, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, China
- Di-Feng Zhan
- College of Agronomy, Hainan University, Haikou, China
- Xian Jia
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, China
- Wen-Li Mei
- Key Laboratory of Biology and Genetic Resources of Tropical Crops, Ministry of Agriculture, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, China
- Hao-Fu Dai
- Key Laboratory of Biology and Genetic Resources of Tropical Crops, Ministry of Agriculture, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, China
- Xiong-Ting Chen
- Key Laboratory of Biology and Genetic Resources of Tropical Crops, Ministry of Agriculture, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, China
- *Correspondence: Xiong-Ting Chen
- Shi-Qing Peng
- Key Laboratory of Biology and Genetic Resources of Tropical Crops, Ministry of Agriculture, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, China
- Shi-Qing Peng
|
25
|
A novel k-word relative measure for sequence comparison. Comput Biol Chem 2014; 53PB:331-338. [PMID: 25462340 DOI: 10.1016/j.compbiolchem.2014.10.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/26/2014] [Revised: 08/10/2014] [Accepted: 10/25/2014] [Indexed: 12/28/2022]
Abstract
In order to extract phylogenetic information from DNA sequences, a new normalized k-word average relative distance is proposed in this paper. The proposed measure was tested by discriminant analysis and phylogenetic analysis. Phylogenetic trees based on the Manhattan distance measure are reconstructed with k ranging from 1 to 12. At the same time, a new method is suggested to reduce the matrix dimension, which can greatly lessen the amount of calculation and operation time. The experimental assessment demonstrated that our measure was efficient. Moreover, comparison with the results of other methods shows that our method is feasible and powerful for phylogenetic analysis.
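As background for the k-word approach, the sketch below compares two DNA sequences by the Manhattan distance between their k-word frequency vectors. It does not implement the normalized k-word average relative distance proposed in the paper, whose definition is not given in the abstract; plain k-word frequencies are swapped in here, and both function names are hypothetical.

```python
from collections import Counter
from itertools import product


def kword_frequencies(seq, k):
    """Frequency of every possible DNA k-word, in lexicographic order over ACGT."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    words = ("".join(p) for p in product("ACGT", repeat=k))
    # Words containing other symbols count toward the total but get no slot.
    return [counts[w] / total if total else 0.0 for w in words]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


freq_a = kword_frequencies("AACC", 1)
freq_b = kword_frequencies("GGTT", 1)
kword_distance = manhattan(freq_a, freq_b)
```

The vector has 4^k entries, so it grows quickly over the range k = 1 to 12 used in the study, which is presumably why the authors also propose a way to reduce the matrix dimension.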
|
26
|
Logacheva MD, Schelkunov MI, Nuraliev MS, Samigullin TH, Penin AA. The plastid genome of mycoheterotrophic monocot Petrosavia stellaris exhibits both gene losses and multiple rearrangements. Genome Biol Evol 2014; 6:238-46. [PMID: 24398375 PMCID: PMC3914687 DOI: 10.1093/gbe/evu001] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Accepted: 12/30/2013] [Indexed: 12/31/2022] Open
Abstract
Plastid genomes of nonphotosynthetic plants represent a perfect model for studying evolution under relaxed selection pressure. However, information on their sequences is still limited. We sequenced and assembled the plastid genome of Petrosavia stellaris, a rare mycoheterotrophic monocot plant. After orchids, Petrosavia represents only the second family of nonphotosynthetic monocots to have its plastid genome examined. Several unusual features were found: retention of the ATP synthase genes and the rbcL gene; extensive gene order rearrangement despite a relative lack of repeat sequences; an unusually short inverted repeat region that excludes most of the rDNA operon; and a lack of evidence for accelerated sequence evolution. The plastome of a photosynthetic relative of P. stellaris, Japonolirion osense, has a standard gene order and no predisposition to inversions. Thus, the rearrangements in the P. stellaris plastome are most likely associated with the transition to a heterotrophic way of life.
Affiliation(s)
- Maria D. Logacheva
- M.V. Lomonosov Moscow State University, Moscow, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Mikhail I. Schelkunov
- M.V. Lomonosov Moscow State University, Moscow, Russia
- V.A. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Maxim S. Nuraliev
- M.V. Lomonosov Moscow State University, Moscow, Russia
- Joint Russian–Vietnamese Tropical Scientific and Technological Center, Cau Giay, Hanoi, Vietnam
- Aleksey A. Penin
- M.V. Lomonosov Moscow State University, Moscow, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
|
27
|
Schwende I, Pham TD. Pattern recognition and probabilistic measures in alignment-free sequence analysis. Brief Bioinform 2013; 15:354-68. [PMID: 24096012 DOI: 10.1093/bib/bbt070] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022] Open
Abstract
With the massive production of genomic and proteomic data, the number of available biological sequences in databases has reached a level that is not feasible anymore for exact alignments even when just a fraction of all sequences is used. To overcome this inevitable time complexity, ultrafast alignment-free methods are studied. Within the past two decades, a broad variety of nonalignment methods have been proposed including dissimilarity measures on classical representations of sequences like k-words or Markov models. Furthermore, articles were published that describe distance measures on alternative representations such as compression complexity, spectral time series or chaos game representation. However, alignments are still the standard method for real world applications in biological sequence analysis, and the time efficient alignment-free approaches are usually applied in cases when the accustomed algorithms turn out to fail or be too inconvenient.
Affiliation(s)
- Isabel Schwende
- PhD, Aizu Research Cluster for Medical Informatics and Engineering (ARC-Medical), Research Center for Advanced Information Science and Technology (CAIST), The University of Aizu, Aizuwakamatsu, Fukushima 965-8580, Japan.
|