1
Shi X, Teng H, Sun Z. An updated overview of experimental and computational approaches to identify non-canonical DNA/RNA structures with emphasis on G-quadruplexes and R-loops. Brief Bioinform 2022; 23:6751149. [PMID: 36208174 PMCID: PMC9677470 DOI: 10.1093/bib/bbac441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 08/22/2022] [Accepted: 09/13/2022] [Indexed: 12/14/2022] Open
Abstract
Multiple types of non-canonical nucleic acid structures play essential roles in DNA recombination and replication, transcription, and genomic instability and have been associated with several human diseases. Thus, an increasing number of experimental and bioinformatics methods have been developed to identify these structures. To date, most reviews have focused on the features of non-canonical DNA/RNA structure formation, experimental approaches to mapping these structures, and the association of these structures with diseases. In addition, two reviews of computational algorithms for the prediction of non-canonical nucleic acid structures have been published. One of these reviews focused only on computational approaches for G4 detection until 2020. The other mainly summarized the computational tools for predicting cruciform, H-DNA and Z-DNA, in which the algorithms discussed were published before 2012. Since then, several experimental and computational methods have been developed. However, a systematic review including the conformation, sequencing mapping methods and computational prediction strategies for these structures has not yet been published. The purpose of this review is to provide an updated overview of conformation, current sequencing technologies and computational identification methods for non-canonical nucleic acid structures, as well as their strengths and weaknesses. We expect that this review will aid in understanding how these structures are characterised and how they contribute to related biological processes and diseases.
Affiliation(s)
- Xiaohui Shi
  - Key Laboratory of Clinical Laboratory Diagnosis and Translational Research of Zhejiang Province, The first Affiliated Hospital of WMU; Beijing Institutes of Life Science, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Ouhai District, Wenzhou 325000, China
- Huajing Teng
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education) at Peking University Cancer Hospital and Institute, Ouhai District, Wenzhou 325000, China
- Zhongsheng Sun
  - Corresponding author: Zhongsheng Sun, Key Laboratory of Clinical Laboratory Diagnosis and Translational Research of Zhejiang Province, The 1st Affiliated Hospital of WMU, Nanbaixiang Wenyi Yiyuan Xinyuan District, Ouhai District, Wenzhou 325000, China. E-mail:
2
Govorkova P, Candice Lam CK, Truong K. Design of Synthetic Mammalian Promoters Using Highly Palindromic Subsequences. ACS Synth Biol 2022; 11:1096-1105. [PMID: 35225601 DOI: 10.1021/acssynbio.1c00600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
To express transgenes in specific cell types and states, promoters for endogenous genes are commonly created by truncating the sequence upstream of the transcriptional start site until the promoter is no longer functional. In this paper, we developed a method to design shorter synthetic mammalian promoters for endogenous genes by concatenating only their highly palindromic subsequences with a minimal core promoter. After we developed metrics for palindromic density, analysis across all human and mouse promoters showed higher palindromic density than expected by chance. As experimental demonstrations, we applied the method to the CMV promoter (reduced to 432 nucleotides) and the mouse synapsin-1 promoter (383 nucleotides) to express fluorescent proteins as reporters. Remarkably, the highly palindromic subsequences of these synthetic promoters contained sites important for strong constitutive expression and neuron-specific expression. As a resource to the community, we created enhancer sequences for all the human and mouse promoters.
Affiliation(s)
- Polina Govorkova
  - Edward S. Rogers, Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King’s College Circle, Toronto, Ontario M5S 3G4, Canada
- Chee Ka Candice Lam
  - Institute of Biomedical Engineering, University of Toronto, 164 College Street, Toronto, Ontario M5S 3G9, Canada
- Kevin Truong
  - Edward S. Rogers, Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King’s College Circle, Toronto, Ontario M5S 3G4, Canada
  - Institute of Biomedical Engineering, University of Toronto, 164 College Street, Toronto, Ontario M5S 3G9, Canada
3
Finding and Characterizing Repeats in Plant Genomes. Methods in Molecular Biology (Clifton, N.J.) 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of the available software that can help biologists to scan automatically for these repeats in sequence data or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field, so the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We present the Machine Learning approach first, seeking to build predictors automatically for some families of TEs from a set of sequences known to belong to each family. A second approach, the linguistic (or syntactic) approach, allows biologists to describe models of their favorite repeat family themselves and to check the validity of those models.
4
Jia L, Li Y, Huang F, Jiang Y, Li H, Wang Z, Chen T, Li J, Zhang Z, Yao W. LIRBase: a comprehensive database of long inverted repeats in eukaryotic genomes. Nucleic Acids Res 2021; 50:D174-D182. [PMID: 34643715 PMCID: PMC8728187 DOI: 10.1093/nar/gkab912] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/31/2021] [Revised: 09/20/2021] [Accepted: 09/25/2021] [Indexed: 11/14/2022] Open
Abstract
Small RNAs (sRNAs) constitute a large portion of functional elements in eukaryotic genomes. Long inverted repeats (LIRs) can be transcribed into long hairpin RNAs (hpRNAs), which can further be processed into small interfering RNAs (siRNAs) with vital biological roles. In this study, we systematically identified a total of 6 619 473 LIRs in 424 eukaryotic genomes and developed LIRBase (https://venyao.xyz/lirbase/), a specialized database of LIRs across different eukaryotic genomes that aims to facilitate the annotation and identification of LIRs encoding long hpRNAs and siRNAs. LIRBase houses a comprehensive collection of LIRs identified in a wide range of eukaryotic genomes. In addition, LIRBase not only allows users to browse and search the identified LIRs in any eukaryotic genome(s) of interest available in GenBank, but also provides user-friendly web functionalities that help users identify LIRs in user-uploaded sequences, align sRNA sequencing data to LIRs, perform differential expression analysis of LIRs, predict mRNA targets for LIR-derived siRNAs, and visualize the secondary structure of candidate long hpRNAs encoded by LIRs. As demonstrated by two case studies, LIRBase offers great utility for the systematic investigation and characterization of LIRs and for functional exploration of the potential roles of LIRs and their derived siRNAs in diverse species.
Affiliation(s)
- Lihua Jia
  - National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
  - National Key Laboratory of Wheat and Maize Crop Science, College of Agronomy, Henan Agricultural University, Zhengzhou 450002, China
- Yang Li
  - National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
- Fangfang Huang
  - National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
- Yingru Jiang
  - National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
- Haoran Li
  - National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
- Zhizhan Wang
  - National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
- Tiantian Chen
  - National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
- Jiaming Li
  - National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
- Zhang Zhang
  - China National Center for Bioinformation, Beijing 100101, China
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100101, China
- Wen Yao
  - National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
5
Alamro H, Alzamel M, Iliopoulos CS, Pissis SP, Watts S. IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences. BMC Bioinformatics 2021; 22:51. [PMID: 33549041 PMCID: PMC7866733 DOI: 10.1186/s12859-021-03983-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2020] [Accepted: 01/27/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. RESULTS: We present IUPACPAL, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats. CONCLUSION: Within the parameters that were tested, our experimental results show that IUPACPAL compares favourably to a similar application packaged with EMBOSS. We show that IUPACPAL identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.
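The definition in this abstract (a sequence followed downstream by its reverse complement, possibly with a central gap) can be illustrated with a naive scan. This is a minimal sketch, not the IUPACpal algorithm: the function names `can_pair` and `find_inverted_repeats` and the parameters `min_arm` and `max_gap` are hypothetical, mismatches are not handled, and overlapping reports of the same repeat are not merged.

```python
# IUPAC nucleotide codes mapped to the set of bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def can_pair(x, y):
    """True if some base encoded by x is complementary to some base encoded by y."""
    return any(COMPLEMENT[a] in IUPAC[y] for a in IUPAC[x])


def find_inverted_repeats(seq, min_arm=4, max_gap=3):
    """Return (start, arm_length, gap) tuples found by expanding around each centre.

    Hypothetical helper for illustration: quadratic time, no mismatches allowed.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for gap in range(max_gap + 1):
        for left in range(n - gap - 1):
            # i walks leftwards through the left arm, j rightwards through the right arm.
            i, j = left, left + gap + 1
            arm = 0
            while i >= 0 and j < n and can_pair(seq[i], seq[j]):
                arm += 1
                i -= 1
                j += 1
            if arm >= min_arm:
                hits.append((i + 1, arm, gap))
    return hits


print(find_inverted_repeats("GAATTC", min_arm=3, max_gap=1))
print(find_inverted_repeats("GAANTC", min_arm=3, max_gap=0))
```

Each hit spans `2 * arm_length + gap` characters starting at `start`. Treating two IUPAC symbols as pairable when any of their encoded bases are complementary is one possible convention and is an assumption of this sketch.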
Affiliation(s)
- Hayam Alamro
  - Department of Informatics, King’s College London, 30 Aldwych, London, UK
  - Department of Information Systems, Princess Nourah bint Abdulrahman University, Riyadh, Kingdom of Saudi Arabia
- Mai Alzamel
  - Department of Informatics, King’s College London, 30 Aldwych, London, UK
  - Computer Science Department, King Saud University, Riyadh, Kingdom of Saudi Arabia
- Solon P. Pissis
  - Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
  - Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Steven Watts
  - Department of Informatics, King’s College London, 30 Aldwych, London, UK
6
Zhang R, Ge F, Li H, Chen Y, Zhao Y, Gao Y, Liu Z, Yang L. PCIR: a database of Plant Chloroplast Inverted Repeats. Database: The Journal of Biological Databases and Curation 2020; 2019:5611292. [PMID: 31696928 PMCID: PMC6835207 DOI: 10.1093/database/baz127] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/16/2019] [Revised: 09/26/2019] [Accepted: 10/07/2019] [Indexed: 01/06/2023]
Abstract
Inverted repeats (IRs) serve as potential biomarkers for genomic instability, DNA replication and other genetic processes. However, little information can be found in databases to help researchers recognize potential IR nucleotides, explore junction sites and annotate related functional genes. Plant Chloroplast Inverted Repeats (PCIR) is an interactive, web-based platform containing various sequenced chloroplast genomes that enables detection, searching and visualization of large-scale detailed information on IRs. PCIR contains many datasets, including 21 433 IRs, 113 plant chloroplast genomes, 16 948 functional genes and 21 659 visual maps. This database offers an online prediction tool for detecting IRs based on DNA sequences. PCIR can also analyze phylogenetic relationships among different species using IR information and provide users with high-quality marker maps. This database will be a valuable resource for studying IR distribution patterns, related genes and architectural features.
Affiliation(s)
- Rui Zhang
  - Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Tai'an 271018, China
- Fangfang Ge
  - Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Tai'an 271018, China
- Huayang Li
  - Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Tai'an 271018, China
- Yudong Chen
  - Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Tai'an 271018, China
- Ying Zhao
  - Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Tai'an 271018, China
- Ying Gao
  - Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Tai'an 271018, China
- Zhiguo Liu
  - Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Tai'an 271018, China
- Long Yang
  - Agricultural Big-Data Research Center and College of Plant Protection, Shandong Agricultural University, Tai'an 271018, China
7
Shi J, Liang C. Generic Repeat Finder: A High-Sensitivity Tool for Genome-Wide De Novo Repeat Detection. Plant Physiology 2019; 180:1803-1815. [PMID: 31152127 PMCID: PMC6670090 DOI: 10.1104/pp.19.00386] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 03/28/2019] [Accepted: 05/17/2019] [Indexed: 05/25/2023]
Abstract
Comprehensive and accurate annotation of the repeatome, including transposons, is critical for deepening our understanding of repeat origins, biogenesis, regulatory mechanisms, and roles. Here, we developed Generic Repeat Finder (GRF), a tool for genome-wide repeat detection based on fast, exhaustive numerical calculation algorithms integrated with optimized dynamic programming strategies. GRF sensitively identifies terminal inverted repeats (TIRs), terminal direct repeats (TDRs), and interspersed repeats that bear both inverted and direct repeats. GRF also detects DNA or RNA transposable elements characterized by these repeats in plant and animal genomes. For TIRs and TDRs, GRF identifies spacers in the middle and mismatches/insertions or deletions in terminal repeats, showing their alignment or base-pairing information. GRF helps improve the annotation for various DNA transposons and retrotransposons, such as miniature inverted-repeat transposable elements (MITEs), long terminal repeat (LTR) retrotransposons, and non-LTR retrotransposons, including long interspersed nuclear elements and short interspersed nuclear elements in plants. We used GRF to perform TIR/TDR, interspersed-repeat, and MITE detection in several species, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and mouse (Mus musculus). As a generic bioinformatics tool for repeat finding implemented as a parallelized C++ program, GRF was faster and more sensitive than the existing inverted repeat/MITE detection tools based on numerical approaches (i.e. detectIR and detectMITE) in Arabidopsis and mouse. GRF is more sensitive than Inverted Repeat Finder in TIR detection, LTR_FINDER in short TDR detection (≤1,000 nt), and phRAIDER in interspersed repeat detection in Arabidopsis and rice. GRF is open source and available from GitHub.
Affiliation(s)
- Jieming Shi
  - Department of Biology, Miami University, Oxford, Ohio 45056
- Chun Liang
  - Department of Biology, Miami University, Oxford, Ohio 45056
  - Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio 45056
8
Ye C, Ji G, Liang C. detectMITE: A novel approach to detect miniature inverted repeat transposable elements in genomes. Sci Rep 2016; 6:19688. [PMID: 26795595 PMCID: PMC4726161 DOI: 10.1038/srep19688] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 08/11/2015] [Accepted: 12/14/2015] [Indexed: 12/27/2022] Open
Abstract
Miniature inverted repeat transposable elements (MITEs) are prevalent in eukaryotic genomes, including plants and animals. Classified as a type of non-autonomous DNA transposable element, they play important roles in genome organization and evolution. Comprehensive and accurate genome-wide detection of MITEs in various eukaryotic genomes can improve our understanding of their origins, transposition processes, regulatory mechanisms, and biological relevance with regard to gene structures, expression, and regulation. In this paper, we present a new MATLAB-based program called detectMITE that employs a novel numeric calculation algorithm to replace conventional string matching algorithms in MITE detection, adopts the Lempel-Ziv complexity algorithm to filter out MITE candidates with low complexity, and utilizes the clustering program CD-HIT to cluster similar MITEs into MITE families. Using the rice genome as test data, we found that detectMITE can detect MITEs on a genome-wide scale more accurately, comprehensively, and efficiently than other popular MITE detection tools. Through comparison with the potential MITEs annotated in Repbase, the widely used eukaryotic repeat database, detectMITE has been shown to find known and novel MITEs with a complete structure and full-length copies in the genome. detectMITE is an open source tool (https://sourceforge.net/projects/detectmite).
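The low-complexity filter mentioned in this abstract relies on Lempel-Ziv complexity, which counts how many new phrases are needed to build a sequence from left to right. The sketch below implements the classic 1976 phrase-counting definition in Python. It is an illustration only: whether detectMITE uses exactly this variant, how it normalizes the count, and which threshold it applies are not stated in the abstract, so the function name `lempel_ziv_complexity` and the example inputs are assumptions.

```python
def lempel_ziv_complexity(s):
    """Count phrases in a left-to-right Lempel-Ziv (1976) parsing of s.

    A phrase is extended for as long as it can be copied from earlier text
    (overlap with the phrase itself is allowed); it ends one symbol after
    the longest such copy, or at the end of the string.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        k = 1
        # s[:i + k - 1] is everything before the last symbol of the candidate phrase.
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        count += 1
        i += k
    return count


# Repetitive sequences need few phrases; varied sequences need more.
for candidate in ("AAAAAAAA", "ACACACAC", "ACGTTGCA"):
    print(candidate, lempel_ziv_complexity(candidate))
```

A filter built on this would discard candidates whose phrase count is small relative to their length; the cut-off is a tuning choice that this sketch does not try to reproduce.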
Affiliation(s)
- Congting Ye
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - Department of Biology, Miami University, Oxford, Ohio 45056, USA
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, Fujian 361102, China
- Chun Liang
  - Department of Biology, Miami University, Oxford, Ohio 45056, USA
9
Ye C, Ji G, Li L, Liang C. detectIR: a novel program for detecting perfect and imperfect inverted repeats using complex numbers and vector calculation. PLoS One 2014; 9:e113349. [PMID: 25409465 PMCID: PMC4237412 DOI: 10.1371/journal.pone.0113349] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 06/13/2014] [Accepted: 10/22/2014] [Indexed: 11/19/2022] Open
Abstract
Inverted repeats are abundant in both prokaryotic and eukaryotic genomes and can form DNA secondary structures (hairpins and cruciforms) that are involved in many important biological processes. Bioinformatics tools for efficient and accurate detection of inverted repeats are desirable, because existing tools are often less accurate and more time-consuming, and are sometimes incapable of dealing with genome-scale input data. Here, we present a MATLAB-based program called detectIR for perfect and imperfect inverted repeat detection that utilizes complex numbers and vector calculation and allows genome-scale data inputs. A novel algorithm is adopted in detectIR to convert the conventional sequence string comparison in inverted repeat detection into vector calculation of complex numbers, allowing non-complementary pairs (mismatches) in the pairing stem and a non-palindromic spacer (loop or gaps) in the middle of inverted repeats. Compared with existing popular tools, our program performs with significantly higher accuracy and efficiency. Using genome sequence data from HIV-1, Arabidopsis thaliana, Homo sapiens and Zea mays for comparison, detectIR finds many inverted repeats missed by existing tools, whose outputs often contain many invalid cases. detectIR is open source and its source code is freely available at: https://sourceforge.net/projects/detectir.
Affiliation(s)
- Congting Ye
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - Department of Biology, Miami University, Oxford, Ohio 45056, United States of America
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, Fujian 361005, China
- Lei Li
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
  - Department of Biology, Miami University, Oxford, Ohio 45056, United States of America
- Chun Liang
  - Department of Biology, Miami University, Oxford, Ohio 45056, United States of America
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
10
Jeve YB. The combined use of antimullerian hormone and age to predict the ovarian response to controlled ovarian hyperstimulation in poor responders: A novel approach. J Hum Reprod Sci 2014; 6:259-62. [PMID: 24672166 PMCID: PMC3963310 DOI: 10.4103/0974-1208.126298] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/27/2013] [Revised: 11/27/2013] [Accepted: 12/31/2013] [Indexed: 11/29/2022] Open
Abstract
CONTEXT: Reduced ovarian response to stimulation represents one of the most intractable problems in infertility treatment. As a failed cycle can cause considerable emotional and economic loss, various attempts have been made to predict ovarian response. AIMS: To evaluate different factors influencing the outcome of assisted reproduction in women with predicted reduced response (antimullerian hormone between 1 and 5 pmol/L) and to develop a model using AMH and age to predict the number of oocytes in poor responders. SETTINGS AND DESIGN: Retrospective study in a teaching hospital. MATERIALS AND METHODS: We analyzed 85 cycles (57 women) with predicted reduced response and a serum AMH value between 1 and 5 pmol/L. A standard ovarian stimulation protocol was used. Primary outcome measures were clinical pregnancy rates and oocytes retrieved. STATISTICAL ANALYSIS USED: Data were analyzed using Microsoft Excel and MATLAB software. RESULTS: The clinical pregnancy rate/ET was 20.33% in this group. AMH and age were analyzed using a linear regression model, which produced an equation giving the predicted oocyte count when AMH and age are known: Oocytes = age × (−β) + serum AMH × α (constants β = 0.0102 and α = 1.0407). CONCLUSIONS: Combined use of serum AMH and age to predict ovarian response within the reduced responder group should be further evaluated. For the first time, we suggest combining both factors to predict ovarian response using a simple equation, which allows a tailored strategy to be developed.
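The regression equation reported in this abstract can be evaluated directly. The sketch below plugs in the two published constants (0.0102 for age and 1.0407 for serum AMH); the function name `predicted_oocytes`, the assumption that age is in years, and the example patient values are illustrative and do not come from the study.

```python
AGE_COEFFICIENT = 0.0102   # constant reported for age (enters with a negative sign)
AMH_COEFFICIENT = 1.0407   # constant reported for serum AMH


def predicted_oocytes(age_years, amh_pmol_per_l):
    """Evaluate Oocytes = age * (-0.0102) + serum AMH * 1.0407.

    The abstract derives the model for serum AMH between 1 and 5 pmol/L,
    so values outside that range are extrapolation.
    """
    return age_years * (-AGE_COEFFICIENT) + amh_pmol_per_l * AMH_COEFFICIENT


# Hypothetical example: a 35-year-old with serum AMH of 3.0 pmol/L.
print(round(predicted_oocytes(35, 3.0), 2))
```

With these coefficients the AMH term dominates: each additional pmol/L of AMH adds about one predicted oocyte, while each additional year of age subtracts about 0.01.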
Affiliation(s)
- Yadava Bapurao Jeve
  - Department Obstetrics and Gynaecology, University Hospitals of Leicester, Leicester, UK