1
Zhang Y, Chu J, Cheng H, Li H. De novo reconstruction of satellite repeat units from sequence data. Genome Res 2023; 33:1994-2001. [PMID: 37918962 PMCID: PMC10760446 DOI: 10.1101/gr.278005.123] [Received: 04/19/2023] [Accepted: 10/18/2023] [Indexed: 11/04/2023]
Abstract
Satellite DNA consists of long, tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). These sequences are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge of repeat structures. Applying SRF to real sequence data, we show that it can reconstruct known satellites in human and well-studied model organisms. We also find that satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents, but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
Affiliation(s)
- Yujie Zhang
  - Harvard School of Public Health, Boston, Massachusetts 02115, USA
- Justin Chu
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
- Haoyu Cheng
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
2
Zhang Y, Chu J, Cheng H, Li H. De novo reconstruction of satellite repeat units from sequence data. arXiv 2023; arXiv:2304.09729v1. [PMID: 37131874 PMCID: PMC10153287] [Indexed: 05/04/2023]
Abstract
Satellite DNA consists of long, tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). These sequences are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge of repeat structures. Applying SRF to real sequence data, we showed that it could reconstruct known satellites in human and well-studied model organisms. We also found that satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents, but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
Affiliation(s)
- Yujie Zhang
  - Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
- Justin Chu
  - Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck St, Boston, MA 02115, USA
- Haoyu Cheng
  - Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck St, Boston, MA 02115, USA
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck St, Boston, MA 02115, USA