1
Hu K, Ni P, Xu M, Zou Y, Chang J, Gao X, Li Y, Ruan J, Hu B, Wang J. HiTE: a fast and accurate dynamic boundary adjustment approach for full-length transposable element detection and annotation. Nat Commun 2024; 15:5573. [PMID: 38956036 PMCID: PMC11219922 DOI: 10.1038/s41467-024-49912-8] [Received: 07/09/2023] [Accepted: 06/25/2024] [Indexed: 07/04/2024]
Abstract
Recent advancements in genome assembly have greatly improved the prospects for comprehensive annotation of Transposable Elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness, requiring extensive manual editing. In addition, the currently available gold-standard TE databases are not comprehensive, even for extensively studied species, highlighting the critical need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms RepeatModeler2, the state-of-the-art tool, across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are directly inserted within crucial genes, leading to direct alterations in gene expression. A Nextflow version of HiTE is also available, with enhanced parallelism, reproducibility, and portability.
Affiliation(s)
- Kang Hu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Peng Ni
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Minghua Xu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- You Zou
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Jianye Chang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
- Bin Hu
- Key Laboratory of Brain Health Intelligent Evaluation and Intervention, Ministry of Education (Beijing Institute of Technology), Beijing, P. R. China
- School of Medical Technology, Beijing Institute of Technology, Beijing, P. R. China
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
2
Loreto ELS, Melo ESD, Wallau GL, Gomes TMFF. The good, the bad and the ugly of transposable elements annotation tools. Genet Mol Biol 2024; 46:e20230138. [PMID: 38373163 PMCID: PMC10876081 DOI: 10.1590/1678-4685-gmb-2023-0138] [Received: 05/09/2023] [Accepted: 11/26/2023] [Indexed: 02/21/2024]
Abstract
Transposable elements are repetitive and mobile DNA segments that can be found in virtually all organisms investigated to date. Their complex structure and variable nature are particularly challenging from the genomic annotation point of view. Many software tools have been developed to automate and facilitate TE annotation at the genomic level, but they are highly heterogeneous regarding documentation, usability and methods. In this review, we revisited the existing software for TE genomic annotation, concentrating on the most often used tools, the methodologies they apply, and their usability. Building on the state of the art of TE annotation software, we propose best practices and highlight the strengths and weaknesses of the available solutions.
Affiliation(s)
- Elgion L S Loreto
- Universidade Federal do Rio Grande do Sul, Programa de Pós-Graduação em Genética e Biologia Molecular, Porto Alegre, RS, Brazil
- Universidade Federal de Santa Maria, Departamento de Bioquímica e Biologia Molecular, Santa Maria, RS, Brazil
- Elverson S de Melo
- Fundação Oswaldo Cruz, Instituto Aggeu Magalhães, Departamento de Entomologia, Recife, PE, Brazil
- Gabriel L Wallau
- Fundação Oswaldo Cruz, Instituto Aggeu Magalhães, Departamento de Entomologia, Recife, PE, Brazil
- Tiago M F F Gomes
- Universidade Federal do Rio Grande do Sul, Programa de Pós-Graduação em Genética e Biologia Molecular, Porto Alegre, RS, Brazil
3
Zhang H, Ni S, Frith MC. An immune-suppressing protein in human endogenous retroviruses. Bioinform Adv 2023; 3:vbad013. [PMID: 36818731 PMCID: PMC9927554 DOI: 10.1093/bioadv/vbad013] [Received: 12/10/2022] [Revised: 01/25/2023] [Accepted: 02/01/2023] [Indexed: 02/05/2023]
Abstract
Motivation: Retroviruses are important contributors to disease and evolution in vertebrates. Sometimes, retrovirus DNA is heritably inserted in a vertebrate genome: an endogenous retrovirus (ERV). Vertebrate genomes have many such virus-derived fragments, usually with mutations disabling their original functions.
Results: Some primate ERVs appear to encode an overlooked protein. This protein is homologous to protein MC132 from Molluscum contagiosum virus, which is a human poxvirus, not a retrovirus. MC132 suppresses the immune system by targeting NF-κB, and it had no known homologs until now. The ERV homologs of MC132 in the human genome are mostly disrupted by mutations, but there is an intact copy on chromosome 4. We found homologs of MC132 in ERVs of apes, monkeys and bushbaby, but not tarsiers, lemurs or non-primates. This suggests that some primate retroviruses had, or have, an extra immune-suppressing protein, which underwent horizontal genetic transfer between unrelated viruses.
Contact: mcfrith@edu.k.u-tokyo.ac.jp