1
Shinder I, Hu R, Ji HJ, Chao KH, Pertea M. EASTR: Identifying and eliminating systematic alignment errors in multi-exon genes. Nat Commun 2023; 14:7223. [PMID: 37940654 PMCID: PMC10632439 DOI: 10.1038/s41467-023-43017-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 10/30/2023] [Indexed: 11/10/2023] Open
Abstract
Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We reveal that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the 'phantom' introns resulting from these errors make their way into widely-used genome annotation databases. To address this issue, we present EASTR (Emending Alignments of Spliced Transcript Reads), a software tool that detects and removes falsely spliced alignments or transcripts from alignment and annotation files. EASTR improves the accuracy of spliced alignments across diverse species, including human, maize, and Arabidopsis thaliana, by detecting sequence similarity between intron-flanking regions. We demonstrate that applying EASTR before transcript assembly substantially reduces false positive introns, exons, and transcripts, improving the overall accuracy of assembled transcripts. Additionally, we show that EASTR's application to reference annotation databases can detect and correct likely cases of mis-annotated transcripts.
Affiliation(s)
- Ida Shinder
  - Cross Disciplinary Graduate Program in Biomedical Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA.
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Richard Hu
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Hyun Joo Ji
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Kuan-Hao Chao
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
  - Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
2
Glunčić M, Vlahović I, Rosandić M, Paar V. Tandem NBPF 3mer HORs (Olduvai triplets) in Neanderthal and two novel HOR tandem arrays in human chromosome 1 T2T-CHM13 assembly. Sci Rep 2023; 13:14420. [PMID: 37660151 PMCID: PMC10475015 DOI: 10.1038/s41598-023-41517-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 08/28/2023] [Indexed: 09/04/2023] Open
Abstract
It is known that the ~1.6 kb Neuroblastoma BreakPoint Family (NBPF) repeats are human-specific and contribute to cognitive capabilities, with increasing frequency in higher order repeat 3mer HORs (Olduvai triplets). From chimpanzee to modern human there is a discontinuous jump from 0 to ~50 tandemly organized 3mer HORs. Here we investigate the structure of NBPF 3mer HORs in the Neanderthal genome assembly of Pääbo et al., comparing it to the results obtained for human hg38.p14 chromosome 1. Our findings reveal corresponding NBPF 3mer HOR arrays in Neanderthals with slightly different monomer structures and numbers of HOR copies compared to humans. Additionally, we compute the NBPF 3mer HOR pattern for the complete telomere-to-telomere human genome assembly (T2T-CHM13) by Miga et al., identifying two novel tandem arrays of NBPF 3mer HOR repeats with 5 and 9 NBPF 3mer HOR copies. We hypothesize that these arrays correspond to novel NBPF genes (here referred to as NBPFA1 and NBPFA2). Further improving the quality of the Neanderthal genome using T2T-CHM13 as a reference would be of great interest in determining the presence of such distant novel NBPF genes in the Neanderthal genome and enhancing our understanding of human evolution.
Affiliation(s)
- Matko Glunčić
  - Faculty of Science, University of Zagreb, 10000, Zagreb, Croatia.
- Marija Rosandić
  - University Hospital Centre Zagreb (Ret.), 10000, Zagreb, Croatia.
  - Croatian Academy of Sciences and Arts, 10000, Zagreb, Croatia.
- Vladimir Paar
  - Faculty of Science, University of Zagreb, 10000, Zagreb, Croatia.
  - Croatian Academy of Sciences and Arts, 10000, Zagreb, Croatia.
3
Pacheco A, Issaian A, Davis J, Anderson N, Nemkov T, Paukovich N, Henen MA, Vögeli B, Sikela JM, Hansen K. Proteolytic activation of human-specific Olduvai domains by the furin protease. Int J Biol Macromol 2023; 234:123041. [PMID: 36581038 PMCID: PMC10038901 DOI: 10.1016/j.ijbiomac.2022.12.260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 12/22/2022] [Indexed: 12/28/2022]
Abstract
Olduvai protein domains (formerly DUF1220) show the greatest human-specific increase in copy number of any coding region in the genome and are highly correlated with human brain evolution and cognitive disease. The majority of human copies are found within four NBPF genes organized in a variable number of tandemly arranged three-domain blocks called Olduvai triplets. Here we show that these human-specific Olduvai domains are posttranslationally processed by the furin protease, with a cleavage site occurring once at each triplet. These findings suggest that all expanded human-specific NBPF genes encode proproteins consisting of many independent Olduvai triplet proteins which are activated by furin processing. The exceptional correlation of Olduvai copy number and brain size, taken together with our new furin data, indicates the ultimate target of selection was a rapid increase in dosage of autonomously functioning Olduvai triplet proteins, and that these proteins are the primary active agent underlying Olduvai's role in human brain expansion.
Affiliation(s)
- Ashley Pacheco
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
- Aaron Issaian
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
- Jonathan Davis
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
- Nathan Anderson
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
- Travis Nemkov
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
- Natasia Paukovich
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
- Morkos A Henen
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
- Beat Vögeli
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
- James M Sikela
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
- Kirk Hansen
  - Department of Biochemistry and Molecular Genetics, University of Colorado, Aurora, CO, USA.
4
Vořechovský I. Selection of Olduvai Domains during Evolution: A Role for Primate-Specific Splicing Super-Enhancer and RNA Guanine Quadruplex in Bipartite NBPF Exons. Brain Sci 2022; 12:874. [PMID: 35884681 PMCID: PMC9313022 DOI: 10.3390/brainsci12070874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 06/23/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023] Open
Abstract
Olduvai protein domains (also known as DUF1220 or NBPF) have undergone the greatest human-specific increase in the copy number of any coding region in the genome. Their repeat number was strongly associated with the evolutionary expansion of brain volumes, neuron counts and cognitive abilities, as well as with disorders of the autistic spectrum. Nevertheless, the domain function and cellular mechanisms underlying the positive selection of Olduvai DNA sequences in higher primates remain obscure. Here, I show that the inclusion of Olduvai exon doublets in mature transcripts is facilitated by a potent splicing enhancer that was created through duplication within the first exon. The enhancer is the strongest among the NBPF transcripts and further promotes the already high splicing activity of the unexpanded first exons of the two-exon domains, safeguarding the expanded Olduvai exon doublets in the mature transcriptome. The duplication also creates a predicted RNA guanine quadruplex that may regulate the access to spliceosomal components of the super-enhancer and influence the splicing of adjacent exons. Thus, positive Olduvai selection during primate evolution is likely to result from a combination of multiple targets in gene expression pathways, including RNA splicing.
Affiliation(s)
- Igor Vořechovský
  - Faculty of Medicine, University of Southampton, HDH, MP808, Southampton SO16 6YD, UK.
5
Zhu L, Su X. Case Report: Neuroblastoma Breakpoint Family Genes Associate With 1q21 Copy Number Variation Disorders. Front Genet 2021; 12:728816. [PMID: 34646304 PMCID: PMC8504801 DOI: 10.3389/fgene.2021.728816] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2021] [Accepted: 08/26/2021] [Indexed: 11/30/2022] Open
Abstract
Microduplications and reciprocal microdeletions of chromosome 1q21.1 and/or 1q21.2 have been linked to variable clinical features, but the underlying pathogenic gene(s) remain unclear. Here we report that distinct microduplications were detected on chromosome 1q21.2 (GRCh37/hg19) in a mother (255 kb in size) and her newborn daughter (443 kb in size), while the same paternal locus was wild-type. Although the two microduplications largely overlap in genomic sequence (183 kb overlapping), the mother showed no clinical phenotype while the daughter presented with several features that are commonly observed in 1q21 microduplication or microdeletion patients, including developmental delay, craniofacial dysmorphism, congenital heart disease and sensorineural hearing loss. NBPF15 and NBPF16, two involved genes that are exclusively duplicated in the proband, may be the cause of the clinical manifestations. This study supports an association between NBPF genes and 1q21 copy number variation disorders.
Affiliation(s)
- Lijuan Zhu
  - Children's Hospital of Fudan University Anhui Hospital, Hefei, China.
- Xiaoji Su
  - Children's Hospital of Fudan University Anhui Hospital, Hefei, China.
6
Uppuluri L, Jadhav T, Wang Y, Xiao M. Multicolor Whole-Genome Mapping in Nanochannels for Genetic Analysis. Anal Chem 2021; 93:9808-9816. [PMID: 34232611 DOI: 10.1021/acs.analchem.1c01373] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
Abstract
Analysis of structural variations (SVs) is important to understand mutations underlying genetic disorders and pathogenic conditions. However, characterizing SVs using short-read, high-throughput sequencing technology is difficult. Although long-read sequencing technologies are being increasingly employed in characterizing SVs, their low throughput and high costs discourage widespread adoption. Sequence motif-based optical mapping in nanochannels is useful in whole-genome mapping and SV detection, but it is not possible to precisely locate the breakpoints or estimate the copy numbers. We present here a universal multicolor mapping strategy in nanochannels combining a conventional sequence-motif labeling system with Cas9-mediated target-specific labeling of any 20-base sequences (20mers) to create custom labels and detect new features. The sequence motifs are labeled with green fluorophores and the 20mers are labeled with red fluorophores. Using this strategy, it is possible to not only detect the SVs but also utilize custom labels to interrogate the features not accessible to motif-labeling, locate breakpoints, and precisely estimate copy numbers of genomic repeats. We validated our approach by quantifying the D4Z4 copy numbers, a known biomarker for facioscapulohumeral muscular dystrophy (FSHD), and estimating the telomere length, a clinical biomarker for assessing disease risk factors in aging-related diseases and malignant cancers. We also demonstrate the application of our methodology in discovering transposable long interspersed nuclear element 1 (LINE-1) insertions across the whole genome.
Affiliation(s)
- Lahari Uppuluri
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States.
- Tanaya Jadhav
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States.
- Yilin Wang
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States.
- Ming Xiao
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States.
  - Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States.
7
Mora-Bermúdez F, Taverna E, Huttner WB. From stem and progenitor cells to neurons in the developing neocortex: key differences among hominids. FEBS J 2021; 289:1524-1535. [PMID: 33638923 DOI: 10.1111/febs.15793] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/03/2020] [Revised: 02/19/2021] [Accepted: 02/25/2021] [Indexed: 01/05/2023]
Abstract
Comparing the biology of humans to that of other primates, and notably other hominids, is a useful path to learn more about what makes us human. Some of the most interesting differences among hominids are closely related to brain development and function, for example behaviour and cognition. This makes it particularly interesting to compare the hominid neural cells of the neocortex, a part of the brain that plays central roles in those processes. However, well-preserved tissue from great apes is usually extremely difficult to obtain. A variety of new alternative tools, for example brain organoids, are now beginning to make it possible to search for such differences and analyse their potential biological and biomedical meaning. Here, we present an overview of recent findings from comparisons of the neural stem and progenitor cells (NSPCs) and neurons of hominids. In addition to differences in proliferation and differentiation of NSPCs, and maturation of neurons, we highlight that the regulation of the timing of these processes is emerging as a general foundational difference in the development of the neocortex of hominids.
Affiliation(s)
- Felipe Mora-Bermúdez
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.
- Elena Taverna
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.
- Wieland B Huttner
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.