1
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] Open
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarized the definition, arrangement, and structural characteristics of repeats. In addition, we introduced the diverse biological functions of repeats and reviewed existing methods for automatic repeat detection, classification, and masking. Finally, we analyzed the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Wufei Zhu
  - Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
- Juexiao Zhou
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Haoyang Li
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xiaopeng Xu
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Bin Zhang
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xin Gao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
2
Zhao P, Peng C, Fang L, Wang Z, Liu GE. Taming transposable elements in livestock and poultry: a review of their roles and applications. Genet Sel Evol 2023; 55:50. [PMID: 37479995 PMCID: PMC10362595 DOI: 10.1186/s12711-023-00821-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Accepted: 06/30/2023] [Indexed: 07/23/2023] Open
Abstract
Livestock and poultry play a significant role in human nutrition by converting agricultural by-products into high-quality proteins. To meet the growing demand for safe animal protein, genetic improvement of livestock must be done sustainably while minimizing negative environmental impacts. Transposable elements (TE) are important components of livestock and poultry genomes, contributing to their genetic diversity, chromatin states, gene regulatory networks, and complex traits of economic value. However, compared to other species, research on TE in livestock and poultry is still in its early stages. In this review, we analyze 72 studies published in the past 20 years, summarize the TE composition in livestock and poultry genomes, and focus on their potential roles in functional genomics. We also discuss bioinformatic tools and strategies for integrating multi-omics data with TE, and explore future directions, feasibility, and challenges of TE research in livestock and poultry. In addition, we suggest strategies to apply TE in basic biological research and animal breeding. Our goal is to provide a new perspective on the importance of TE in livestock and poultry genomes.
Affiliation(s)
- Pengju Zhao
  - Hainan Institute of Zhejiang University, Hainan Sanya, 572000, China
  - College of Animal Sciences, Zhejiang University, Zhejiang, Hangzhou, People's Republic of China
- Chen Peng
  - Hainan Institute of Zhejiang University, Hainan Sanya, 572000, China
  - College of Animal Sciences, Zhejiang University, Zhejiang, Hangzhou, People's Republic of China
- Lingzhao Fang
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8000, Aarhus, Denmark
- Zhengguang Wang
  - Hainan Institute of Zhejiang University, Hainan Sanya, 572000, China
  - College of Animal Sciences, Zhejiang University, Zhejiang, Hangzhou, People's Republic of China
- George E Liu
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
3
Magdy Mohamed Abdelaziz Barakat S, Sallehuddin R, Yuhaniz SS, R. Khairuddin RF, Mahmood Y. Genome assembly composition of the String "ACGT" array: a review of data structure accuracy and performance challenges. PeerJ Comput Sci 2023; 9:e1180. [PMID: 37547391 PMCID: PMC10403225 DOI: 10.7717/peerj-cs.1180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 04/27/2023] [Indexed: 08/08/2023]
Abstract
Background: The development of sequencing technology increases the number of genomes being sequenced. However, obtaining a quality genome sequence remains a challenge in genome assembly, which must assemble a massive number of short strings (reads) in the presence of repetitive sequences (repeats). Computer algorithms for genome assembly construct the entire genome from reads using two approaches. The de novo approach concatenates the reads based on exact matches between their suffixes and prefixes (overlapping). The reference-guided approach orders the reads based on their offsets in a well-known reference genome (read alignment). The presence of repeats increases ambiguity, leaving the algorithm unable to distinguish the reads, which results in misassembly and reduces the accuracy of the assembly approach. In addition, the massive number of reads poses a major assembly performance challenge.
Method: Repeat identification methods were introduced to address misassembly by identifying repetitive sequences in advance, creating a repeat knowledge base that reduces ambiguity during the assembly process and thus enhances the accuracy of the assembled genome. Also, hybridization between assembly approaches resulted in a lower degree of misassembly with the aid of the reference genome. Assembly performance is optimized through data structure indexing and parallelization. This article's primary aim and contribution is to support researchers through an extensive review that eases their search for genome assembly studies. The study also highlights the most recent developments and limitations in genome assembly accuracy and performance optimization.
Results: Our findings show the limitations of the available repeat identification methods, which can detect only repeats of specific lengths and may not perform well when various types of repeats are present in a genome. We also found that most of the hybrid assembly approaches, whether starting with de novo or reference-guided assembly, have some limitations in handling repetitive sequences, as doing so is more computationally costly and time intensive. Although the hybrid approach was found to outperform individual assembly approaches, optimizing its performance remains a challenge. Also, the use of parallelization in overlapping and read alignment for genome assembly is yet to be fully implemented in the hybrid assembly approach.
Conclusion: We suggest combining multiple repeat identification methods to enhance the accuracy of identifying the repeats as an initial step to the hybrid assembly approach, and combining genome indexing with parallelization for better optimization of its performance.
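The suffix-prefix overlap step that the abstract attributes to de novo assembly, and the ambiguity that repeats introduce, can be illustrated with a minimal Python sketch. It is not code from the reviewed article: the function names `overlap`, `merge`, and `successors`, the `min_len` threshold, and the toy reads are all assumptions made for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Concatenate two reads across their overlap, or return None if they do not overlap."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


def successors(read: str, reads: list, min_len: int = 3) -> list:
    """All other reads that could follow `read`; more than one means the extension is ambiguous."""
    return [r for r in reads if r != read and overlap(read, r, min_len) > 0]


if __name__ == "__main__":
    # Toy reads (hypothetical): "ACG" plays the role of a short repeat.
    print(merge("ACGTAC", "TACGGA"))
    reads = ["TTACG", "ACGGA", "ACGCC"]
    print(successors("TTACG", reads))
```

In the second call both candidate reads begin with the repeated "ACG", so the overlap test alone cannot decide which one follows "TTACG". This is the kind of ambiguity the abstract associates with misassembly.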
Affiliation(s)
- Roselina Sallehuddin
  - Computer Science, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia
- Siti Sophiayati Yuhaniz
  - Advanced Informatics Department, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur, Kuala Lumpur, Malaysia
- Yasir Mahmood
  - Faculty of Information Technology, The University of Lahore, Lahore, Lahore, Pakistan
4
Shao G, He T, Mu Y, Mu P, Ao J, Lin X, Ruan L, Wang Y, Gao Y, Liu D, Zhang L, Chen X. The genome of a hadal sea cucumber reveals novel adaptive strategies to deep-sea environments. iScience 2022; 25:105545. [PMID: 36444293 PMCID: PMC9700323 DOI: 10.1016/j.isci.2022.105545] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/12/2021] [Revised: 01/18/2022] [Accepted: 11/07/2022] [Indexed: 11/11/2022] Open
Abstract
How organisms cope with cold and high pressure in the hadal zone remains poorly understood. Here, we sequenced and assembled a high-quality genome of the hadal sea cucumber Paelopatides sp. Yap and explored its potential mechanisms for deep-sea adaptation. First, the expansion of ACOX1, the gene for the rate-limiting enzyme in the DHA synthesis pathway, increased DHA content in the phospholipid bilayer, and positive selection of EPT1 may maintain cell membrane fluidity. Second, three genes for translation initiation factors and two for ribosomal proteins underwent expansion, and three ribosomal protein genes were positively selected, which may ameliorate the protein synthesis inhibition or ribosome dissociation in the hadal zone. Third, expansion and positive selection of genes associated with stalled replication fork recovery and DNA repair suggest improvements in DNA protection. This is the first genome sequence of a hadal invertebrate. Our results provide insights into the genetic adaptations used by invertebrates in deep oceans.
Affiliation(s)
- Guangming Shao
  - Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Tianliang He
  - Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Yinnan Mu
  - Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Pengfei Mu
  - Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Jingqun Ao
  - Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Xihuang Lin
  - Key Laboratory of Marine Biogenetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, Fujian 361005, China
- Lingwei Ruan
  - Key Laboratory of Marine Biogenetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, Fujian 361005, China
- YuGuang Wang
  - Key Laboratory of Marine Biogenetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, Fujian 361005, China
- Yuan Gao
  - Genomics and Genetic Engineering Laboratory of Ornamental Plants, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Dinggao Liu
  - Genomics and Genetic Engineering Laboratory of Ornamental Plants, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Liangsheng Zhang
  - Genomics and Genetic Engineering Laboratory of Ornamental Plants, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Xinhua Chen
  - Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
  - Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, Guangdong 519000, China
5
Usha T, Middha SK, Babu D, Goyal AK, Das AJ, Saini D, Sarangi A, Krishnamurthy V, Prasannakumar MK, Saini DK, Sidhalinghamurthy KR. Hybrid Assembly and Annotation of the Genome of the Indian Punica granatum, a Superfood. Front Genet 2022; 13:786825. [PMID: 35646087 PMCID: PMC9130716 DOI: 10.3389/fgene.2022.786825] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/05/2021] [Accepted: 03/15/2022] [Indexed: 12/13/2022] Open
Abstract
The wonder fruit pomegranate (Punica granatum, family Lythraceae) is one of India’s economically important fruit crops and can grow in different agro-climatic conditions ranging from tropical to temperate regions. This study reports a high-quality de novo draft hybrid genome assembly of the diploid Punica cultivar “Bhagwa” and identifies its genomic features. This cultivar is the most common among farmers due to its high sustainability, glossy red color, soft seed, and nutraceutical properties with high market value. The draft genome assembly is about 361.76 Mb (N50 = 40 Mb), ∼9.0 Mb more than the genome size estimated by flow cytometry. The genome is 90.9% complete; only 26.68% of it is occupied by transposable elements, and the relative abundance of SSRs is 369.93 per Mb. A total of 30,803 proteins and their putative functions were predicted. Comparative whole-genome analysis revealed Eucalyptus grandis as the nearest neighbor. KEGG-KASS annotations indicated an abundance of genes involved in the biosynthesis of flavonoids, phenylpropanoids, and secondary metabolites, which are responsible for various medicinal properties of pomegranate, including anticancer, antihyperglycemic, antioxidant, and anti-inflammatory activities. The genome and gene annotations provide new insights into the pharmacological properties of the secondary metabolites synthesized in pomegranate. They will also serve as a valuable resource in mining biosynthetic pathways for key metabolites, novel genes, and variations associated with disease resistance, which can facilitate the breeding of new varieties with high yield and superior quality.
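The abstract above reports an N50 value without defining it. Under the standard definition, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The Python sketch below computes it for a toy list of lengths; the function name `n50` and the input values are assumptions for illustration and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length at
    which the running total first reaches half of the summed length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


if __name__ == "__main__":
    # Hypothetical lengths in arbitrary units, not values from the pomegranate assembly.
    print(n50([2, 3, 4, 5, 6]))
```

Comparing `running * 2` with `total` keeps the arithmetic in integers and avoids rounding when the total length is odd.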
Affiliation(s)
- Talambedu Usha
  - Department of Biochemistry, Bangalore University, Bengaluru, India
- Sushil Kumar Middha
  - DBT-BIF Facility, Department of Biotechnology, Maharani Lakshmi Ammanni College for Women, Bengaluru, India
- Dinesh Babu
  - Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB, Canada
- Arvind Kumar Goyal
  - Centre for Bamboo Studies, Department of Biotechnology, Bodoland University, Kokrajhar, India
- Deepti Saini
  - Protein Design Private Limited, Bengaluru, India
- Deepak Kumar Saini
  - Department of Molecular Reproduction Development and Genetics, Indian Institute of Science, Bengaluru, India
6
Storer JM, Hubley R, Rosen J, Smit AFA. Methodologies for the De novo Discovery of Transposable Element Families. Genes (Basel) 2022; 13:709. [PMID: 35456515 PMCID: PMC9025800 DOI: 10.3390/genes13040709] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/23/2022] [Revised: 04/14/2022] [Accepted: 04/15/2022] [Indexed: 02/07/2023] Open
Abstract
The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
Affiliation(s)
- Arian F. A. Smit
  - Institute for Systems Biology, Seattle, WA 98109, USA; (J.M.S.); (R.H.); (J.R.)
7
Liao X, Hu K, Salhi A, Zou Y, Wang J, Gao X. msRepDB: a comprehensive repetitive sequence database of over 80 000 species. Nucleic Acids Res 2021; 50:D236-D245. [PMID: 34850956 PMCID: PMC8728181 DOI: 10.1093/nar/gkab1089] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/14/2021] [Revised: 10/18/2021] [Accepted: 11/30/2021] [Indexed: 11/13/2022] Open
Abstract
Repeats are prevalent in the genomes of all bacteria, plants and animals, and they cover nearly half of the human genome. They play indispensable roles in evolution, inheritance, variation and genomic instability, and serve as substrates for chromosomal rearrangements that include disease-causing deletions, inversions, and translocations. Comprehensive identification, classification and annotation of repeats in genomes can provide accurate and targeted solutions towards understanding and diagnosis of complex diseases, optimization of plant properties and development of new drugs. RepBase and Dfam are the two most frequently used repeat databases, but they are not sufficiently complete. Due to the lack of a comprehensive multi-species repeat database, current research in this field is far from satisfactory. LongRepMarker is a framework recently developed by our group for comprehensive identification of genomic repeats. Here we propose msRepDB, based on LongRepMarker, which is currently the most comprehensive multi-species repeat database, covering >80 000 species. Comprehensive evaluations show that msRepDB contains more species, and more complete repeats and families, than the RepBase and Dfam databases (https://msrepdb.cbrc.kaust.edu.sa/pages/msRepDB/index.html).
Affiliation(s)
- Xingyu Liao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Kang Hu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Adil Salhi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- You Zou
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Xin Gao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia