1
Sandler G, Agrawal AF, Wright SI. Population Genomics of the Facultatively Sexual Liverwort Marchantia polymorpha. Genome Biol Evol 2023; 15:evad196. [PMID: 37883717 PMCID: PMC10667032 DOI: 10.1093/gbe/evad196] [Received: 12/16/2022] [Revised: 10/15/2023] [Accepted: 10/18/2023] [Indexed: 10/28/2023]
Abstract
The population genomics of facultatively sexual organisms are understudied compared with their abundance across the tree of life. We explore patterns of genetic diversity in two subspecies of the facultatively sexual liverwort Marchantia polymorpha using samples from across Southern Ontario, Canada. Despite the ease with which M. polymorpha should be able to propagate asexually, we find no evidence of strictly clonal descent among our samples and little to no signal of isolation by distance. Patterns of identity-by-descent tract sharing further showed evidence of recent recombination and close relatedness between geographically distant isolates, suggesting long distance gene flow and at least a modest frequency of sexual reproduction. However, the M. polymorpha genome contains overall very low levels of nucleotide diversity and signs of inefficient selection evidenced by a relatively high fraction of segregating deleterious variants. We interpret these patterns as possible evidence of the action of linked selection and a small effective population size due to past generations of asexual propagation. Overall, the M. polymorpha genome harbors signals of a complex history of both sexual and asexual reproduction.
Affiliation(s)
- George Sandler
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Aneil F Agrawal
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
  - Center for Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- Stephen I Wright
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
  - Center for Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
2
Orlov YL, Orlova NG. Bioinformatics tools for the sequence complexity estimates. Biophys Rev 2023; 15:1367-1378. [PMID: 37974990 PMCID: PMC10643780 DOI: 10.1007/s12551-023-01140-y] [Received: 08/15/2023] [Accepted: 09/01/2023] [Indexed: 11/19/2023]
Abstract
We review current methods and bioinformatics tools for estimating text complexity (information and entropy measures). The search for DNA regions with extreme statistical characteristics, such as low-complexity regions, is important for biophysical models of chromosome function and gene transcription regulation at genome scale. We discuss complexity profiling for the segmentation and delineation of genome sequences, the search for genome repeats and transposable elements, and applications to next-generation sequencing reads. We review the complexity methods and new application fields: analysis of mutation hotspot loci, analysis of short sequencing reads with quality control, and alignment-free genome comparisons. Algorithms implementing various numerical measures of text complexity, including combinatorial and linguistic measures, were developed before the genome sequencing era. A series of tools for estimating sequence complexity use compression approaches, mainly modifications of Lempel-Ziv compression. Most of the tools are available online, providing large-scale service for whole-genome analysis. Novel machine learning applications for the classification of complete genome sequences also include sequence compression and complexity algorithms. We present a comparison of the complexity methods on different sequence sets and their applications to the analysis of gene transcription regulatory regions. Furthermore, we discuss approaches to and applications of sequence complexity for proteins. Complexity measures for amino acid sequences can be calculated by the same entropy and compression-based algorithms, but the functional and evolutionary roles of low-complexity regions in proteins have specific features that differ from those in DNA. The tools for protein sequence complexity are aimed at protein structural constraints. It has been shown that low-complexity regions in protein sequences are evolutionarily conserved and have important biological and structural functions. Finally, we summarize recent findings in large-scale genome complexity comparison and applications to coronavirus genome analysis.
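As a rough illustration of the entropy-based measures this abstract refers to, the sketch below computes the Shannon entropy of k-mer frequencies in sliding windows along a sequence. The function names, window size, step, and k are assumptions chosen for illustration; they are not taken from any of the reviewed tools.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def complexity_profile(seq: str, window: int = 20, step: int = 5, k: int = 1):
    """Return (window_start, entropy) pairs for sliding windows along seq.

    Hypothetical helper: low values flag candidate low-complexity regions.
    """
    return [
        (start, shannon_entropy(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

For DNA with k = 1 the entropy ranges from 0 bits (a homopolymer run) to 2 bits (all four bases equally frequent), so windows scoring near 0 are the low-complexity candidates.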
Affiliation(s)
- Yuriy L. Orlov
  - The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Russian Ministry of Health (Sechenov University), Moscow, 119991 Russia
  - Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
  - Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
- Nina G. Orlova
  - Department of Mathematics, Financial University under the Government of the Russian Federation, Moscow, 125167 Russia
3
Shi X, Teng H, Sun Z. An updated overview of experimental and computational approaches to identify non-canonical DNA/RNA structures with emphasis on G-quadruplexes and R-loops. Brief Bioinform 2022; 23:6751149. [PMID: 36208174 PMCID: PMC9677470 DOI: 10.1093/bib/bbac441] [Received: 04/21/2022] [Revised: 08/22/2022] [Accepted: 09/13/2022] [Indexed: 12/14/2022]
Abstract
Multiple types of non-canonical nucleic acid structures play essential roles in DNA recombination and replication, transcription, and genomic instability and have been associated with several human diseases. Thus, an increasing number of experimental and bioinformatics methods have been developed to identify these structures. To date, most reviews have focused on the features of non-canonical DNA/RNA structure formation, experimental approaches to mapping these structures, and the association of these structures with diseases. In addition, two reviews of computational algorithms for the prediction of non-canonical nucleic acid structures have been published. One of these reviews focused only on computational approaches for G4 detection until 2020. The other mainly summarized the computational tools for predicting cruciform, H-DNA and Z-DNA, in which the algorithms discussed were published before 2012. Since then, several experimental and computational methods have been developed. However, a systematic review including the conformation, sequencing mapping methods and computational prediction strategies for these structures has not yet been published. The purpose of this review is to provide an updated overview of conformation, current sequencing technologies and computational identification methods for non-canonical nucleic acid structures, as well as their strengths and weaknesses. We expect that this review will aid in understanding how these structures are characterised and how they contribute to related biological processes and diseases.
Affiliation(s)
- Xiaohui Shi
  - Key Laboratory of Clinical Laboratory Diagnosis and Translational Research of Zhejiang Province, The First Affiliated Hospital of WMU; Beijing Institutes of Life Science, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Ouhai District, Wenzhou 325000, China
- Huajing Teng
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education) at Peking University Cancer Hospital and Institute, Ouhai District, Wenzhou 325000, China
- Zhongsheng Sun
  - Corresponding author: Zhongsheng Sun, Key Laboratory of Clinical Laboratory Diagnosis and Translational Research of Zhejiang Province, The 1st Affiliated Hospital of WMU, Nanbaixiang Wenyi Yiyuan Xinyuan District, Ouhai District, Wenzhou 325000, China.
4
Bhardwaj V, Yadav D, Dhankhar M, Saini K. A novel approach for identification of mirror repeats within the Engrailed Homeobox-1 gene of Xenopus tropicalis. Biomedical and Biotechnology Research Journal (BBRJ) 2022. [DOI: 10.4103/bbrj.bbrj_281_22] [Indexed: 12/15/2022]
5
López-Cortegano E, Craig RJ, Chebib J, Samuels T, Morgan AD, Kraemer SA, Böndel KB, Ness RW, Colegrave N, Keightley PD. De Novo Mutation Rate Variation and Its Determinants in Chlamydomonas. Mol Biol Evol 2021; 38:3709-3723. [PMID: 33950243 PMCID: PMC8383909 DOI: 10.1093/molbev/msab140] [Indexed: 12/12/2022]
Abstract
De novo mutations are central for evolution, since they provide the raw material for natural selection by regenerating genetic variation. However, studying de novo mutations is challenging and is generally restricted to model species, so we have a limited understanding of the evolution of the mutation rate and spectrum between closely related species. Here, we present a mutation accumulation (MA) experiment to study de novo mutation in the unicellular green alga Chlamydomonas incerta and perform comparative analyses with its closest known relative, Chlamydomonas reinhardtii. Using whole-genome sequencing data, we estimate that the median single nucleotide mutation (SNM) rate in C. incerta is μ = 7.6 × 10⁻¹⁰, and is highly variable between MA lines, ranging from μ = 0.35 × 10⁻¹⁰ to μ = 131.7 × 10⁻¹⁰. The SNM rate is strongly positively correlated with the mutation rate for insertions and deletions between lines (r > 0.97). We infer that the genomic factors associated with variation in the mutation rate are similar to those in C. reinhardtii, allowing for cross-prediction between species. Among these genomic factors, sequence context and complexity are more important than GC content. With the exception of a remarkably high C→T bias, the SNM spectrum differs markedly between the two Chlamydomonas species. Our results suggest that similar genomic and biological characteristics may result in a similar mutation rate in the two species, whereas the SNM spectrum has more freedom to diverge.
Affiliation(s)
- Eugenio López-Cortegano
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Rory J Craig
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Jobran Chebib
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Toby Samuels
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Andrew D Morgan
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Katharina B Böndel
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
- Rob W Ness
  - Department of Biology, University of Toronto Mississauga, Mississauga, ON, Canada
- Nick Colegrave
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Peter D Keightley
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
6
Lavezzo E, Berselli M, Frasson I, Perrone R, Palù G, Brazzale AR, Richter SN, Toppo S. G-quadruplex forming sequences in the genome of all known human viruses: A comprehensive guide. PLoS Comput Biol 2018; 14:e1006675. [PMID: 30543627 PMCID: PMC6307822 DOI: 10.1371/journal.pcbi.1006675] [Received: 08/07/2018] [Revised: 12/27/2018] [Accepted: 11/27/2018] [Indexed: 12/21/2022]
Abstract
G-quadruplexes are non-canonical nucleic-acid structures that control transcription, replication, and recombination in organisms. G-quadruplexes are present in eukaryotes, prokaryotes, and viruses. In the latter, mounting evidence indicates their key biological activity. Since data on viruses are scattered, we here present a comprehensive analysis of potential quadruplex-forming sequences (PQS) in the genome of all known viruses that can infect humans. We show that occurrence and location of PQSs are features characteristic of each virus class and family. Our statistical analysis proves that their presence within the viral genome is orderly arranged, as indicated by the possibility to correctly assign up to two-thirds of viruses to their exact class based on the PQS classification. For each virus we provide: i) the list of all PQS present in the genome (positive and negative strands), ii) their position in the viral genome, iii) the degree of conservation among strains of each PQS in its genome context, iv) the statistical significance of PQS abundance. This information is accessible from a database to allow the easy navigation of the results: http://www.medcomp.medicina.unipd.it/main_site/doku.php?id=g4virus. The availability of these data will greatly expedite research on G-quadruplex in viruses, with the possibility to accelerate finding therapeutic opportunities to numerous and some fearsome human diseases.
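To make the idea of listing PQS on both strands concrete, here is a minimal sketch that scans a sequence and its reverse complement with a widely used heuristic motif (four runs of at least three guanines separated by loops of 1 to 7 nucleotides). The pattern, the function names, and the coordinate convention are assumptions for illustration; the abstract does not state which search rules the authors applied.

```python
import re

# Heuristic motif: four G-runs (>= 3 G each) separated by loops of 1-7 nt.
# Assumption: a common rule of thumb, not necessarily the study's criteria.
PQS_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pqs(seq: str):
    """Return (strand, start, motif) for non-overlapping PQS hits.

    start is the 0-based leftmost position on the given (forward) sequence,
    for hits on either strand.
    """
    seq = seq.upper()
    hits = []
    for strand, target in (("+", seq), ("-", reverse_complement(seq))):
        for match in PQS_PATTERN.finditer(target):
            if strand == "+":
                start = match.start()
            else:
                # Map the reverse-complement coordinate back to the forward strand.
                start = len(seq) - match.end()
            hits.append((strand, start, match.group()))
    return hits
```

Because `finditer` reports non-overlapping matches only, overlapping candidates within a G-rich stretch are merged into the first match found; a full survey would need a separate rule for overlapping motifs.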
Affiliation(s)
- Enrico Lavezzo
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Michele Berselli
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Ilaria Frasson
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Rosalba Perrone
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Giorgio Palù
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Sara N. Richter
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Stefano Toppo
  - Department of Molecular Medicine, University of Padova, Padova, Italy