Reference Citation Analysis

For: Gu W, Castoe TA, Hedges DJ, Batzer MA, Pollock DD. Identification of repeat structure in large genomes using repeat probability clouds. Anal Biochem 2008;380:77-83. [PMID: 18541131 DOI: 10.1016/j.ab.2008.05.015] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Received: 03/21/2008] [Revised: 05/01/2008] [Accepted: 05/02/2008] [Indexed: 11/28/2022]

Cited by Other Article(s)

Hu K, Ni P, Xu M, Zou Y, Chang J, Gao X, Li Y, Ruan J, Hu B, Wang J. HiTE: a fast and accurate dynamic boundary adjustment approach for full-length transposable element detection and annotation. Nat Commun 2024;15:5573. [PMID: 38956036 PMCID: PMC11219922 DOI: 10.1038/s41467-024-49912-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Accepted: 06/25/2024] [Indexed: 07/04/2024] Open

Affiliation(s)

Kang Hu: School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Xiangjiang Laboratory, Changsha, 410205, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
Peng Ni: School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Xiangjiang Laboratory, Changsha, 410205, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
Minghua Xu: School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
You Zou: School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
Jianye Chang: Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
Xin Gao: Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Yaohang Li: Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA
Jue Ruan: Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China
Bin Hu: Key Laboratory of Brain Health Intelligent Evaluation and Intervention, Ministry of Education (Beijing Institute of Technology), Beijing, P. R. China; School of Medical Technology, Beijing Institute of Technology, Beijing, P. R. China
Jianxin Wang: School of Computer Science and Engineering, Central South University, Changsha, 410083, China; Xiangjiang Laboratory, Changsha, 410205, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China

Rudenko V, Korotkov E. Study of Dispersed Repeats in the Cyanidioschyzon merolae Genome. Int J Mol Sci 2024;25:4441. [PMID: 38674025 PMCID: PMC11050394 DOI: 10.3390/ijms25084441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024] Open

Yang L, Metzger GA, Padilla Del Valle R, Delgadillo Rubalcaba D, McLaughlin RN. Evolutionary insights from profiling LINE-1 activity at allelic resolution in a single human genome. EMBO J 2024;43:112-131. [PMID: 38177314 PMCID: PMC10883270 DOI: 10.1038/s44318-023-00007-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/18/2023] [Accepted: 11/10/2023] [Indexed: 01/06/2024] Open

Zhao P, Gu L, Gao Y, Pan Z, Liu L, Li X, Zhou H, Yu D, Han X, Qian L, Liu GE, Fang L, Wang Z. Young SINEs in pig genomes impact gene regulation, genetic diversity, and complex traits. Commun Biol 2023;6:894. [PMID: 37652983 PMCID: PMC10471783 DOI: 10.1038/s42003-023-05234-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Accepted: 08/09/2023] [Indexed: 09/02/2023] Open

Affiliation(s)

Pengju Zhao: Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya, 572000, China; College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
Lihong Gu: Institute of Animal Science & Veterinary Medicine, Hainan Academy of Agricultural Sciences, No. 14 Xingdan Road, Haikou, 571100, China
Yahui Gao: Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
Zhangyuan Pan: Department of Animal Science, University of California, Davis, CA, 95616, USA
Lei Liu: Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, China
Xingzheng Li: Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, China
Huaijun Zhou: Department of Animal Science, University of California, Davis, CA, 95616, USA
Dongyou Yu: Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya, 572000, China; College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
Xinyan Han: Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya, 572000, China; College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
Lichun Qian: Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya, 572000, China; College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
George E Liu: Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
Lingzhao Fang: Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, 8000, Denmark
Zhengguang Wang: Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya, 572000, China; College of Animal Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China

Rodriguez M, Makałowski W. Software evaluation for de novo detection of transposons. Mob DNA 2022;13:14. [PMID: 35477485 PMCID: PMC9047281 DOI: 10.1186/s13100-022-00266-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/18/2021] [Accepted: 03/16/2022] [Indexed: 11/16/2022] Open

Storer JM, Hubley R, Rosen J, Smit AFA. Methodologies for the De novo Discovery of Transposable Element Families. Genes (Basel) 2022;13:709. [PMID: 35456515 PMCID: PMC9025800 DOI: 10.3390/genes13040709] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/23/2022] [Revised: 04/14/2022] [Accepted: 04/15/2022] [Indexed: 02/07/2023] Open

Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2022;2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]

Zeng C, Takeda A, Sekine K, Osato N, Fukunaga T, Hamada M. Bioinformatics Approaches for Determining the Functional Impact of Repetitive Elements on Non-coding RNAs. Methods Mol Biol 2022;2509:315-340. [PMID: 35796972 DOI: 10.1007/978-1-0716-2380-0_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Feng C, Dai M, Liu Y, Chen M. Sequence repetitiveness quantification and de novo repeat detection by weighted k-mer coverage. Brief Bioinform 2020;22:5855256. [PMID: 32591772 DOI: 10.1093/bib/bbaa086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2020] [Revised: 04/10/2020] [Accepted: 04/22/2020] [Indexed: 11/12/2022] Open

Shortt JA, Ruggiero RP, Cox C, Wacholder AC, Pollock DD. Finding and extending ancient simple sequence repeat-derived regions in the human genome. Mob DNA 2020;11:11. [PMID: 32095164 PMCID: PMC7027126 DOI: 10.1186/s13100-020-00206-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/04/2019] [Accepted: 02/04/2020] [Indexed: 12/19/2022] Open

Abstract

Background

Previously, 3% of the human genome has been annotated as simple sequence repeats (SSRs), similar to the proportion annotated as protein coding. The origin of much of the genome is not well annotated, however, and some of the unidentified regions are likely to be ancient SSR-derived regions not identified by current methods. The identification of these regions is complicated because SSRs appear to evolve through complex cycles of expansion and contraction, often interrupted by mutations that alter both the repeated motif and mutation rate. We applied an empirical, kmer-based approach to identify genome regions that are likely derived from SSRs.

Results

The sequences flanking annotated SSRs are enriched for similar sequences and for SSRs with similar motifs, suggesting that the evolutionary remains of SSR activity abound in regions near obvious SSRs. Using our previously described P-clouds approach, we identified ‘SSR-clouds’, groups of similar kmers (or ‘oligos’) that are enriched near a training set of unbroken SSR loci, and then used the SSR-clouds to detect likely SSR-derived regions throughout the genome.

Conclusions

Our analysis indicates that the amount of likely SSR-derived sequence in the human genome is 6.77%, over twice as much as previous estimates, including millions of newly identified ancient SSR-derived loci. SSR-clouds identified poly-A sequences adjacent to transposable element termini in over 74% of the oldest class of Alu (roughly, AluJ), validating the sensitivity of the approach. Poly-A’s annotated by SSR-clouds also had a length distribution that was more consistent with their poly-A origins, with mean about 35 bp even in older Alus. This work demonstrates that the high sensitivity provided by SSR-Clouds improves the detection of SSR-derived regions and will enable deeper analysis of how decaying repeats contribute to genome structure.

Orozco-Arias S, Isaza G, Guyot R. Retrotransposons in Plant Genomes: Structure, Identification, and Classification through Bioinformatics and Machine Learning. Int J Mol Sci 2019;20:E3837. [PMID: 31390781 PMCID: PMC6696364 DOI: 10.3390/ijms20153837] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/21/2019] [Revised: 07/31/2019] [Accepted: 08/02/2019] [Indexed: 01/26/2023] Open

Li W, Freudenberg J, Freudenberg J. Alignment-free approaches for predicting novel Nuclear Mitochondrial Segments (NUMTs) in the human genome. Gene 2019;691:141-152. [PMID: 30630097 DOI: 10.1016/j.gene.2018.12.040] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/05/2018] [Revised: 12/07/2018] [Accepted: 12/14/2018] [Indexed: 10/27/2022]

Metsky HC, Siddle KJ, Gladden-Young A, Qu J, Yang DK, Brehio P, Goldfarb A, Piantadosi A, Wohl S, Carter A, Lin AE, Barnes KG, Tully DC, Corleis B, Hennigan S, Barbosa-Lima G, Vieira YR, Paul LM, Tan AL, Garcia KF, Parham LA, Odia I, Eromon P, Folarin OA, Goba A, Simon-Lorière E, Hensley L, Balmaseda A, Harris E, Kwon DS, Allen TM, Runstadler JA, Smole S, Bozza FA, Souza TML, Isern S, Michael SF, Lorenzana I, Gehrke L, Bosch I, Ebel G, Grant DS, Happi CT, Park DJ, Gnirke A, Sabeti PC, Matranga CB. Capturing sequence diversity in metagenomes with comprehensive and scalable probe design. Nat Biotechnol 2019;37:160-168. [PMID: 30718881 PMCID: PMC6587591 DOI: 10.1038/s41587-018-0006-x] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 03/15/2018] [Accepted: 12/18/2018] [Indexed: 01/24/2023]

Affiliation(s)

Hayden C. Metsky: Broad Institute of MIT and Harvard, Cambridge, MA USA; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA USA
Katherine J. Siddle: Broad Institute of MIT and Harvard, Cambridge, MA USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA USA
Adrianne Gladden-Young: Broad Institute of MIT and Harvard, Cambridge, MA USA
James Qu: Broad Institute of MIT and Harvard, Cambridge, MA USA
David K. Yang: Broad Institute of MIT and Harvard, Cambridge, MA USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA USA
Patrick Brehio: Broad Institute of MIT and Harvard, Cambridge, MA USA
Andrew Goldfarb: Faculty of Arts and Sciences, Harvard University, Cambridge, MA USA
Anne Piantadosi: Broad Institute of MIT and Harvard, Cambridge, MA USA; Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA USA
Shirlee Wohl: Broad Institute of MIT and Harvard, Cambridge, MA USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA USA
Amber Carter: Broad Institute of MIT and Harvard, Cambridge, MA USA
Aaron E. Lin: Broad Institute of MIT and Harvard, Cambridge, MA USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA USA
Kayla G. Barnes: Broad Institute of MIT and Harvard, Cambridge, MA USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA USA
Damien C. Tully: The Ragon Institute of MGH, MIT and Harvard, Cambridge, MA USA
Björn Corleis: The Ragon Institute of MGH, MIT and Harvard, Cambridge, MA USA
Scott Hennigan: Massachusetts Department of Public Health, Boston, MA USA
Giselle Barbosa-Lima: Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Rio de Janeiro, Brazil
Yasmine R. Vieira: Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Rio de Janeiro, Brazil
Lauren M. Paul: Department of Biological Sciences, College of Arts and Sciences, Florida Gulf Coast University, Fort Myers, FL USA
Amanda L. Tan: Department of Biological Sciences, College of Arts and Sciences, Florida Gulf Coast University, Fort Myers, FL USA
Kimberly F. Garcia: Instituto de Investigacion en Microbiologia, Universidad Nacional Autónoma de Honduras, Tegucigalpa, Honduras
Leda A. Parham: Instituto de Investigacion en Microbiologia, Universidad Nacional Autónoma de Honduras, Tegucigalpa, Honduras
Ikponmwosa Odia: Institute of Lassa Fever Research and Control, Irrua Specialist Teaching Hospital, Irrua, Nigeria
Philomena Eromon: African Center of Excellence for Genomics of Infectious Disease (ACEGID), Redeemer’s University, Ede, Nigeria
Onikepe A. Folarin: African Center of Excellence for Genomics of Infectious Disease (ACEGID), Redeemer’s University, Ede, Nigeria; Department of Biological Sciences, College of Natural Sciences, Redeemer’s University, Ede, Nigeria
Augustine Goba: Lassa Fever Laboratory, Kenema Government Hospital, Kenema, Sierra Leone
Viral Hemorrhagic Fever Consortium
Etienne Simon-Lorière: Evolutionary Genomics of RNA Viruses, Virology Department, Institut Pasteur, Paris, France
Lisa Hensley: Integrated Research Facility, Division of Clinical Research, National Institute of Allergy and Infectious Diseases, US National Institutes of Health, Frederick, MD USA
Angel Balmaseda: Laboratorio Nacional de Virología, Centro Nacional de Diagnóstico y Referencia, Ministry of Health, Managua, Nicaragua
Eva Harris: Division of Infectious Diseases and Vaccinology, School of Public Health, University of California, Berkeley, Berkeley, CA USA
Douglas S. Kwon: Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA USA; The Ragon Institute of MGH, MIT and Harvard, Cambridge, MA USA
Todd M. Allen: The Ragon Institute of MGH, MIT and Harvard, Cambridge, MA USA
Jonathan A. Runstadler: Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA USA
Sandra Smole: Massachusetts Department of Public Health, Boston, MA USA
Fernando A. Bozza: Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Rio de Janeiro, Brazil
Thiago M. L. Souza: Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Rio de Janeiro, Brazil
Sharon Isern: Department of Biological Sciences, College of Arts and Sciences, Florida Gulf Coast University, Fort Myers, FL USA
Scott F. Michael: Department of Biological Sciences, College of Arts and Sciences, Florida Gulf Coast University, Fort Myers, FL USA
Ivette Lorenzana: Instituto de Investigacion en Microbiologia, Universidad Nacional Autónoma de Honduras, Tegucigalpa, Honduras
Lee Gehrke: Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA USA; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA USA
Irene Bosch: Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA USA
Gregory Ebel: Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO USA
Donald S. Grant: Lassa Fever Laboratory, Kenema Government Hospital, Kenema, Sierra Leone; College of Medicine and Allied Health Sciences, University of Sierra Leone, Freetown, Sierra Leone
Christian T. Happi: Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA USA; Institute of Lassa Fever Research and Control, Irrua Specialist Teaching Hospital, Irrua, Nigeria; African Center of Excellence for Genomics of Infectious Disease (ACEGID), Redeemer’s University, Ede, Nigeria; Department of Biological Sciences, College of Natural Sciences, Redeemer’s University, Ede, Nigeria
Daniel J. Park: Broad Institute of MIT and Harvard, Cambridge, MA USA
Andreas Gnirke: Broad Institute of MIT and Harvard, Cambridge, MA USA
Pardis C. Sabeti: Broad Institute of MIT and Harvard, Cambridge, MA USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA USA; Howard Hughes Medical Institute, Chevy Chase, MD USA
Christian B. Matranga: Broad Institute of MIT and Harvard, Cambridge, MA USA

Guizard S, Piégu B, Arensburger P, Guillou F, Bigot Y. Deep landscape update of dispersed and tandem repeats in the genome model of the red jungle fowl, Gallus gallus, using a series of de novo investigating tools. BMC Genomics 2016;17:659. [PMID: 27542599 PMCID: PMC4992247 DOI: 10.1186/s12864-016-3015-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/28/2016] [Accepted: 08/12/2016] [Indexed: 01/19/2023] Open

Maumus F, Quesneville H. Impact and insights from ancient repetitive elements in plant genomes. Curr Opin Plant Biol 2016;30:41-6. [PMID: 26874965 DOI: 10.1016/j.pbi.2016.01.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 11/10/2015] [Revised: 01/04/2016] [Accepted: 01/17/2016] [Indexed: 05/13/2023]

Nicolas J, Peterlongo P, Tempel S. Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2016;1374:293-337. [PMID: 26519414 DOI: 10.1007/978-1-4939-3167-5_17] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]

Abstract

Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of available software that can help biologists to look for these repeats and check some hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided a whole section on this topic as well as a selection of the main existing software. In order to better understand how they work and how repeats may be efficiently found in genomes, it is necessary to look at the technical issues involved in the large-scale search of these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts that are useful for understanding the current state of the art in playing with words, applied to genomic sequences. This can be seen as the first stage of a very general approach called linguistic analysis that is interested in the analysis of natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. In fact, biologists need to represent more complex entities where a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, composition constraints as well as ordering and distance constraints between these elementary blocks. In terms of linguistics, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.

Bast J, Schaefer I, Schwander T, Maraun M, Scheu S, Kraaijeveld K. No Accumulation of Transposable Elements in Asexual Arthropods. Mol Biol Evol 2015;33:697-706. [PMID: 26560353 PMCID: PMC4760076 DOI: 10.1093/molbev/msv261] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Indexed: 12/15/2022] Open

Sun C, Mueller RL. Hellbender genome sequences shed light on genomic expansion at the base of crown salamanders. Genome Biol Evol 2015;6:1818-29. [PMID: 25115007 PMCID: PMC4122941 DOI: 10.1093/gbe/evu143] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022] Open

Abstract

Among animals, genome sizes range from 20 Mb to 130 Gb, with 380-fold variation across vertebrates. Most of the largest vertebrate genomes are found in salamanders, an amphibian clade of 660 species. Thus, salamanders are an important system for studying causes and consequences of genomic gigantism. Previously, we showed that plethodontid salamander genomes accumulate higher levels of long terminal repeat (LTR) retrotransposons than do other vertebrates, although the evolutionary origins of such sequences remained unexplored. We also showed that some salamanders in the family Plethodontidae have relatively slow rates of DNA loss through small insertions and deletions. Here, we present new data from Cryptobranchus alleganiensis, the hellbender. Cryptobranchus and Plethodontidae span the basal phylogenetic split within salamanders; thus, analyses incorporating these taxa can shed light on the genome of the ancestral crown salamander lineage, which underwent expansion. We show that high levels of LTR retrotransposons likely characterize all crown salamanders, suggesting that disproportionate expansion of this transposable element (TE) class contributed to genomic expansion. Phylogenetic and age distribution analyses of salamander LTR retrotransposons indicate that salamanders' high TE levels reflect persistence and diversification of ancestral TEs rather than horizontal transfer events. Finally, we show that relatively slow DNA loss rates through small indels likely characterize all crown salamanders, suggesting that a decreased DNA loss rate contributed to genomic expansion at the clade's base. Our identification of shared genomic features across phylogenetically distant salamanders is a first step toward identifying the evolutionary processes underlying accumulation and persistence of high levels of repetitive sequence in salamander genomes.

Maumus F, Fiston-Lavier AS, Quesneville H. Impact of transposable elements on insect genomes and biology. Curr Opin Insect Sci 2015;7:30-36. [PMID: 32846669 DOI: 10.1016/j.cois.2015.01.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 12/03/2014] [Revised: 12/30/2014] [Accepted: 01/06/2015] [Indexed: 06/11/2023]

Inference of transposable element ancestry. PLoS Genet 2014;10:e1004482. [PMID: 25121584 PMCID: PMC4133154 DOI: 10.1371/journal.pgen.1004482] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/11/2013] [Accepted: 05/16/2014] [Indexed: 01/11/2023] Open

Abstract

Most common methods for inferring transposable element (TE) evolutionary relationships are based on dividing TEs into subfamilies using shared diagnostic nucleotides. Although originally justified based on the “master gene” model of TE evolution, computational and experimental work indicates that many of the subfamilies generated by these methods contain multiple source elements. This implies that subfamily-based methods give an incomplete picture of TE relationships. Studies on selection, functional exaptation, and predictions of horizontal transfer may all be affected. Here, we develop a Bayesian method for inferring TE ancestry that gives the probability that each sequence was replicative, its frequency of replication, and the probability that each extant TE sequence came from each possible ancestral sequence. Applying our method to 986 members of the newly-discovered LAVA family of TEs, we show that there were far more source elements in the history of LAVA expansion than subfamilies identified using the CoSeg subfamily-classification program. We also identify multiple replicative elements in the AluSc subfamily in humans. Our results strongly indicate that a reassessment of subfamily structures is necessary to obtain accurate estimates of mutation processes, phylogenetic relationships and historical times of activity.

The most common entities in vertebrate genomes are transposable elements (TEs), DNA sequences that have been repeatedly copied and inserted into new locations throughout the genome. Some TEs have been replicated hundreds of thousands of times, and their ecology and evolutionary history within a genome is thus critical to understanding how genome structure evolves. It was once thought that only a few “master gene” copies could replicate, while the rest were inactive (dead on arrival), but recent computational and laboratory studies have indicated that this is not the case. However, previous methods for reconstructing TE evolutionary history were not designed to solve the problem of determining the ancestral source sequence for large numbers of elements. Here, we present a new method that is. Our method surveys all likely TE ancestors and determines the probability that each modern element arose from each of its plausible ancestors. We applied our method to the gibbon-derived LAVA TE family and to the human AluSc subfamily and inferred many more source elements than indicated by previous methods. This new method will help us better understand TE evolution, including both the impact of sequence on replication and the substitution process after replication.

Deep investigation of Arabidopsis thaliana junk DNA reveals a continuum between repetitive elements and genomic dark matter. PLoS One 2014;9:e94101. [PMID: 24709859 PMCID: PMC3978025 DOI: 10.1371/journal.pone.0094101] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2014] [Accepted: 03/10/2014] [Indexed: 11/19/2022] Open

The Burmese python genome reveals the molecular basis for extreme adaptation in snakes. Proc Natl Acad Sci U S A 2013;110:20645-50. [PMID: 24297902 DOI: 10.1073/pnas.1314475110] [Citation(s) in RCA: 203] [Impact Index Per Article: 18.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Fernandez-Silva I, Whitney J, Wainwright B, Andrews KR, Ylitalo-Ward H, Bowen BW, Toonen RJ, Goetze E, Karl SA. Microsatellites for next-generation ecologists: a post-sequencing bioinformatics pipeline. PLoS One 2013;8:e55990. [PMID: 23424642 PMCID: PMC3570555 DOI: 10.1371/journal.pone.0055990] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2012] [Accepted: 01/04/2013] [Indexed: 11/18/2022] Open

Jiang N. Overview of repeat annotation and de novo repeat identification. Methods Mol Biol 2013;1057:275-87. [PMID: 23918436 DOI: 10.1007/978-1-62703-568-2_20] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Xu HE, Zhang HH, Han MJ, Shen YH, Huang XZ, Xiang ZH, Zhang Z. [Computational approaches for identification and classification of transposable elements in eukaryotic genomes]. YI CHUAN = HEREDITAS 2012;34:1009-1019. [PMID: 22917906 DOI: 10.3724/sp.j.1005.2012.01009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]

Janicki M, Rooke R, Yang G. Bioinformatics and genomic analysis of transposable elements in eukaryotic genomes. Chromosome Res 2012;19:787-808. [PMID: 21850457 DOI: 10.1007/s10577-011-9230-7] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]

Flutre T, Permal E, Quesneville H. Transposable Element Annotation in Completely Sequenced Eukaryote Genomes. PLANT TRANSPOSABLE ELEMENTS 2012. [DOI: 10.1007/978-3-642-31842-9_2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Permal E, Flutre T, Quesneville H. Roadmap for annotating transposable elements in eukaryote genomes. Methods Mol Biol 2012;859:53-68. [PMID: 22367865 DOI: 10.1007/978-1-61779-603-6_3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]

Sun C, Shepard DB, Chong RA, López Arriaza J, Hall K, Castoe TA, Feschotte C, Pollock DD, Mueller RL. LTR retrotransposons contribute to genomic gigantism in plethodontid salamanders. Genome Biol Evol 2011;4:168-83. [PMID: 22200636 PMCID: PMC3318908 DOI: 10.1093/gbe/evr139] [Citation(s) in RCA: 130] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/22/2011] [Indexed: 01/20/2023] Open

Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet 2011;7:e1002384. [PMID: 22144907 PMCID: PMC3228813 DOI: 10.1371/journal.pgen.1002384] [Citation(s) in RCA: 724] [Impact Index Per Article: 55.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2011] [Accepted: 10/04/2011] [Indexed: 12/18/2022] Open

Castoe TA, Hall KT, Guibotsy Mboulas ML, Gu W, de Koning APJ, Fox SE, Poole AW, Vemulapalli V, Daza JM, Mockler T, Smith EN, Feschotte C, Pollock DD. Discovery of highly divergent repeat landscapes in snake genomes using high-throughput sequencing. Genome Biol Evol 2011;3:641-53. [PMID: 21572095 PMCID: PMC3157835 DOI: 10.1093/gbe/evr043] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open

Flutre T, Duprat E, Feuillet C, Quesneville H. Considering transposable element diversification in de novo annotation approaches. PLoS One 2011;6:e16526. [PMID: 21304975 PMCID: PMC3031573 DOI: 10.1371/journal.pone.0016526] [Citation(s) in RCA: 324] [Impact Index Per Article: 24.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2010] [Accepted: 01/04/2011] [Indexed: 01/24/2023] Open

Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs. Heredity (Edinb) 2009;104:520-33. [PMID: 19935826 DOI: 10.1038/hdy.2009.165] [Citation(s) in RCA: 130] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open

Belancio VP, Deininger PL, Roy-Engel AM. LINE dancing in the human genome: transposable elements and disease. Genome Med 2009;1:97. [PMID: 19863772 PMCID: PMC2784310 DOI: 10.1186/gm97] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open