1
Glunčić M, Vlahović I, Rosandić M, Paar V. Novel Cascade Alpha Satellite HORs in Orangutan Chromosome 13 Assembly: Discovery of the 59mer HOR-The largest Unit in Primates-And the Missing Triplet 45/27/18 HOR in Human T2T-CHM13v2.0 Assembly. Int J Mol Sci 2024; 25:7596. [PMID: 39062839 PMCID: PMC11276891 DOI: 10.3390/ijms25147596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 07/05/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024]
Abstract
From the recent genome assembly NHGRI_mPonAbe1-v2.0_NCBI (GCF_028885655.2) of orangutan chromosome 13, we computed the precise alpha satellite higher-order repeat (HOR) structure using the novel high-precision GRM2023 algorithm with Global Repeat Map (GRM) and Monomer Distance (MD) diagrams. This study rigorously identified alpha satellite HORs in the centromere of orangutan chromosome 13, discovering a novel 59mer HOR, the longest HOR unit identified in any primate to date. Additionally, it revealed the first intertwined sequence of three HORs, 18mer/27mer/45mer HORs, with a common aligned "backbone" across all HOR copies. The major 7mer HOR exhibits a Willard's-type canonical copy, although some segments of the array display significant irregularities. In contrast, the 14mer HOR forms a regular Willard's-type HOR array. Surprisingly, the GRM2023 high-precision analysis of chromosome 13 of human genome assembly T2T-CHM13v2.0 reveals the presence of only a 7mer HOR, despite both the orangutan and human genome assemblies being derived from whole genome shotgun sequences.
Affiliation(s)
- Matko Glunčić
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Ines Vlahović
- Department of Interdisciplinary Sciences, Algebra University College, 10000 Zagreb, Croatia
- Marija Rosandić
- University Hospital Centre Zagreb (Ret.), 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Vladimir Paar
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
2
Glunčić M, Vlahović I, Rosandić M, Paar V. Precise identification of cascading alpha satellite higher order repeats in T2T-CHM13 assembly of human chromosome 3. Croat Med J 2024; 65:209-219. [PMID: 38868967 PMCID: PMC11157248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
AIM To precisely identify and analyze alpha-satellite higher-order repeats (HORs) in T2T-CHM13 assembly of human chromosome 3. METHODS From the recently sequenced complete T2T-CHM13 assembly of human chromosome 3, the precise alpha satellite HOR structure was computed by using the novel high-precision GRM2023 algorithm with global repeat map (GRM) and monomer distance (MD) diagrams. RESULTS The major alpha satellite HOR array in chromosome 3 revealed a novel cascading HOR, housing 17mer HOR copies with subfragments of periods 15 and 2. Within each row in the cascading HOR, the monomers were of different types, but different rows within the same cascading 17mer HOR contained more than one monomer of the same type. Each canonical 17mer HOR copy comprised 17 monomers belonging to 16 different monomer types. Another pronounced 10mer HOR array was of the regular Willard's type. CONCLUSION Our findings emphasize the complexity within the chromosome 3 centromere as well as deviations from expected highly regular patterns.
Affiliation(s)
- Matko Glunčić
- Department of Physics, Faculty of Science, University of Zagreb, Bijenička cesta 32, 10000 Zagreb, Croatia
3
Glunčić M, Vlahović I, Rosandić M, Paar V. Novel Concept of Alpha Satellite Cascading Higher-Order Repeats (HORs) and Precise Identification of 15mer and 20mer Cascading HORs in Complete T2T-CHM13 Assembly of Human Chromosome 15. Int J Mol Sci 2024; 25:4395. [PMID: 38673983 PMCID: PMC11050224 DOI: 10.3390/ijms25084395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024]
Abstract
Unraveling the intricate centromere structure of human chromosomes holds profound implications, illuminating fundamental genetic mechanisms and potentially advancing our comprehension of genetic disorders and therapeutic interventions. This study rigorously identified and structurally analyzed alpha satellite higher-order repeats (HORs) within the centromere of human chromosome 15 in the complete T2T-CHM13 assembly using the high-precision GRM2023 algorithm. The most extensive alpha satellite HOR array in chromosome 15 reveals a novel cascading HOR, housing 429 15mer HOR copies, containing 4-, 7- and 11-monomer subfragments. Within each row of cascading HORs, all alpha satellite monomers are of distinct types, as in regular Willard's HORs. However, different HOR copies within the same cascading 15mer HOR contain more than one monomer of the same type. Each canonical 15mer HOR copy comprises 15 monomers belonging to only 9 different monomer types. Notably, 65% of the 429 15mer cascading HOR copies exhibit canonical structures, while 35% display variant configurations. Identified as the second most extensive alpha satellite HOR, another novel cascading HOR within human chromosome 15 encompasses 164 20mer HOR copies, each featuring two subfragments. Moreover, a distinct pattern emerges as interspersed 25mer/26mer structures differing from regular Willard's HORs and giving rise to a 34-monomer subfragment. Only a minor 18mer HOR array of 12 HOR copies is of the regular Willard's type. These revelations highlight the complexity within the chromosome 15 centromeric region, accentuating deviations from anticipated highly regular patterns and hinting at profound information encoding and functional potential within the human centromere.
Affiliation(s)
- Matko Glunčić
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Ines Vlahović
- Algebra LAB, Algebra University College, 10000 Zagreb, Croatia
- Marija Rosandić
- Department of Internal Medicine, University Hospital Centre Zagreb, 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Vladimir Paar
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
4
Glunčić M, Vlahović I, Rosandić M, Paar V. Tandem NBPF 3mer HORs (Olduvai triplets) in Neanderthal and two novel HOR tandem arrays in human chromosome 1 T2T-CHM13 assembly. Sci Rep 2023; 13:14420. [PMID: 37660151 PMCID: PMC10475015 DOI: 10.1038/s41598-023-41517-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 08/28/2023] [Indexed: 09/04/2023]
Abstract
It is known that the ~1.6 kb Neuroblastoma BreakPoint Family (NBPF) repeats are human specific and contribute to cognitive capabilities, with increasing frequency in higher order repeat 3mer HORs (Olduvai triplets). From chimpanzee to modern human there is a discontinuous jump from 0 to ~50 tandemly organized 3mer HORs. Here we investigate the structure of NBPF 3mer HORs in the Neanderthal genome assembly of Pääbo et al., comparing it to the results obtained for human hg38.p14 chromosome 1. Our findings reveal corresponding NBPF 3mer HOR arrays in Neanderthals with slightly different monomer structures and numbers of HOR copies compared to humans. Additionally, we compute the NBPF 3mer HOR pattern for the complete telomere-to-telomere human genome assembly (T2T-CHM13) by Miga et al., identifying two novel tandem arrays of NBPF 3mer HOR repeats with 5 and 9 NBPF 3mer HOR copies. We hypothesize that these arrays correspond to novel NBPF genes (here referred to as NBPFA1 and NBPFA2). Further improving the quality of the Neanderthal genome using T2T-CHM13 as a reference would be of great interest in determining the presence of such distant novel NBPF genes in the Neanderthal genome and enhancing our understanding of human evolution.
Affiliation(s)
- Matko Glunčić
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Marija Rosandić
- University Hospital Centre Zagreb (Ret.), 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Vladimir Paar
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
5
Glunčić M, Vlahović I, Rosandić M, Paar V. Tandemly repeated NBPF HOR copies (Olduvai triplets): Possible impact on human brain evolution. Life Sci Alliance 2022; 6:e202101306. [PMID: 36261226 PMCID: PMC9584774 DOI: 10.26508/lsa.202101306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 09/29/2022] [Accepted: 09/30/2022] [Indexed: 11/24/2022]
Abstract
Previously it was found that the neuroblastoma breakpoint family (NBPF) gene repeat units of ∼1.6 kb have an important role in human brain evolution and function. The higher order organization of these repeat units has been discovered by two methods: the higher order repeat (HOR) searching method and the HLS searching method. Using the HOR searching method with the global repeat map algorithm, we identified here the tandemly organized NBPF HORs in the human and nonhuman primate NCBI reference genomes. In the human genome we identified 50 tandemly organized canonical 3mer NBPF HOR copies (Olduvai triplets), but none in the nonhuman primates chimpanzee, gorilla, orangutan, and rhesus macaque. This discontinuous jump in tandemly organized HOR copy number is in sharp contrast to the known gradual increase in the number of Olduvai domains (NBPF monomers) from nonhuman primates to human, especially from ∼138 in the chimpanzee to ∼300 in the human genome. Using the same global repeat map algorithm, we also determined the 3mer tandems of canonical 3mer HOR copies in 20 randomly chosen human genomes (10 male and 10 female). In all cases, we found the same 3mer HOR copy numbers as in the reference human genome, with no mutation. On the other hand, some point mutations with respect to the reference genome were found for some NBPF monomers that are not tandemly organized in canonical HORs.
Affiliation(s)
- Matko Glunčić
- Faculty of Science, University of Zagreb, Zagreb, Croatia
- Marija Rosandić
- University Hospital Centre Zagreb (ret), Zagreb, Croatia
- Croatian Academy of Sciences and Arts, Zagreb, Croatia
- Vladimir Paar
- Faculty of Science, University of Zagreb, Zagreb, Croatia
- Croatian Academy of Sciences and Arts, Zagreb, Croatia
6
Machado JAT, Rocha-Neves JM, Azevedo F, Andrade JP. Advances in the computational analysis of SARS-COV2 genome. Nonlinear Dynamics 2021; 106:1525-1555. [PMID: 34465942 PMCID: PMC8391012 DOI: 10.1007/s11071-021-06836-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/30/2020] [Accepted: 08/15/2021] [Indexed: 06/13/2023]
Abstract
Given a data set of ribonucleic acid (RNA) sequences, we can infer the phylogenetics of the samples and use the information for scientific purposes. Based on current data and knowledge, SARS-CoV-2 seemingly mutates much more slowly than the influenza virus that causes seasonal flu. However, very recent evolution casts some doubt on this conjecture and overshadows the prospects opened by vaccination. This paper adopts mathematical and computational tools for analyzing the data set of different clades of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). On the one hand, mathematical tools are considered, namely the concepts of distance associated with Kolmogorov complexity and Shannon information theory, as well as with the Hamming scheme. On the other, advanced computational data-processing techniques, such as data compression, clustering and visualization, are applied to the problem. The results of this combined approach reveal the complex time dynamics of the evolutionary process and may help to clarify future directions of SARS-CoV-2 evolution.
Affiliation(s)
- J. A. Tenreiro Machado
- Department of Electrical Engineering, Institute of Engineering, Polytechnic of Porto, Rua Dr. António Bernardino de Almeida, 431, 4249-015 Porto, Portugal
- J. M. Rocha-Neves
- Department of Biomedicine – Unity of Anatomy, and Department of Physiology and Surgery, Faculty of Medicine of University of Porto, Porto, Portugal
- Filipe Azevedo
- Department of Electrical Engineering, Institute of Engineering, Polytechnic of Porto, Rua Dr. António Bernardino de Almeida, 431, 4249-015 Porto, Portugal
- J. P. Andrade
- Department of Biomedicine – Unity of Anatomy, Faculty of Medicine of University of Porto and Center for Health Technology and Services Research (CINTESIS), Porto, Portugal
7
Easterling KA, Pitra NJ, Morcol TB, Aquino JR, Lopes LG, Bussey KC, Matthews PD, Bass HW. Identification of tandem repeat families from long-read sequences of Humulus lupulus. PLoS One 2020; 15:e0233971. [PMID: 32502183 PMCID: PMC7274563 DOI: 10.1371/journal.pone.0233971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/18/2020] [Accepted: 05/16/2020] [Indexed: 11/28/2022]
Abstract
Hop (Humulus lupulus L.) is known for its use as a bittering agent in beer and has a rich history of cultivation, beginning in Europe and now spanning the globe. There are five wild varieties worldwide, which may have been introgressed with cultivated varieties. As a dioecious species, its obligate outcrossing, non-Mendelian inheritance, and genomic structural variability have confounded directed breeding efforts. Consequently, understanding the hop genome represents a considerable challenge, requiring additional resources. In order to facilitate investigations into the transmission genetics of hop, we report here a tandem repeat discovery pipeline developed using k-mer filtering and dot plot analysis of PacBio long-read sequences from the hop cultivar Apollo. From this we identified 17 new and distinct tandem repeat sequence families, which represent candidates for FISH probe development. For two of these candidates, HuluTR120 and HuluTR225, we produced oligonucleotide FISH probes from conserved regions and demonstrated their utility by staining meiotic chromosomes from wild hop, var. neomexicanus, to address, for example, questions about hop transmission genetics. Collectively, these tandem repeat sequence families represent new resources suitable for development of additional cytogenomic tools for hop research.
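The k-mer filtering idea mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the choice of k = 12, and the `min_count` threshold are all hypothetical and chosen only for illustration. The sketch flags a read as repeat-rich when most of its k-mers recur, and guesses the repeat period from the most common spacing between successive occurrences of the same k-mer.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeated_fraction(seq: str, k: int = 12, min_count: int = 3) -> float:
    """Fraction of k-mer occurrences whose k-mer appears at least min_count times.

    Values near 1.0 suggest a tandem-repeat-rich read (illustrative threshold only).
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeated = sum(c for c in counts.values() if c >= min_count)
    return repeated / total


def estimate_period(seq: str, k: int = 12):
    """Most common distance between successive occurrences of the same k-mer.

    Returns None when no k-mer occurs twice.
    """
    last_seen = {}
    gaps = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            gaps[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return gaps.most_common(1)[0][0] if gaps else None


# Toy example: a made-up 20-nt unit repeated in tandem (not a real hop sequence).
unit = "ACGTTGCAAGCTTAGGCTCA"
tandem_read = unit * 10
```

On the toy read, `repeated_fraction(tandem_read)` is close to 1 and `estimate_period(tandem_read)` recovers the 20-nt unit length; a real pipeline would additionally have to tolerate long-read sequencing errors, which this sketch ignores.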
Affiliation(s)
- Katherine A. Easterling
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
- Hopsteiner, S.S. Steiner, Inc., New York, New York, United States of America
- Nicholi J. Pitra
- Hopsteiner, S.S. Steiner, Inc., New York, New York, United States of America
- Taylan B. Morcol
- Hopsteiner, S.S. Steiner, Inc., New York, New York, United States of America
- Department of Biological Sciences, Lehman College, City University of New York, Bronx, New York, United States of America
- The Graduate Center, City University of New York, New York, New York, United States of America
- Jenna R. Aquino
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
- Lauren G. Lopes
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
- Kristin C. Bussey
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
- Paul D. Matthews
- Hopsteiner, S.S. Steiner, Inc., New York, New York, United States of America
- Hank W. Bass
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
8
Ikonomou L, Herriges MJ, Lewandowski SL, Marsland R, Villacorta-Martin C, Caballero IS, Frank DB, Sanghrajka RM, Dame K, Kańduła MM, Hicks-Berthet J, Lawton ML, Christodoulou C, Fabian AJ, Kolaczyk E, Varelas X, Morrisey EE, Shannon JM, Mehta P, Kotton DN. The in vivo genetic program of murine primordial lung epithelial progenitors. Nat Commun 2020; 11:635. [PMID: 32005814 PMCID: PMC6994558 DOI: 10.1038/s41467-020-14348-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 05/20/2019] [Accepted: 12/23/2019] [Indexed: 12/29/2022]
Abstract
Multipotent Nkx2-1-positive lung epithelial primordial progenitors of the foregut endoderm are thought to be the developmental precursors to all adult lung epithelial lineages. However, little is known about the global transcriptomic programs or gene networks that regulate these gateway progenitors in vivo. Here we use bulk RNA-sequencing to describe the unique genetic program of in vivo murine lung primordial progenitors and computationally identify signaling pathways, such as Wnt and Tgf-β superfamily pathways, that are involved in their cell-fate determination from pre-specified embryonic foregut. We integrate this information in computational models to generate in vitro engineered lung primordial progenitors from mouse pluripotent stem cells, improving the fidelity of the resulting cells through unbiased, easy-to-interpret similarity scores and modulation of cell culture conditions, including substratum elastic modulus and extracellular matrix composition. The methodology proposed here can have wide applicability to the in vitro derivation of bona fide tissue progenitors of all germ layers.
Affiliation(s)
- Laertis Ikonomou
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
- Michael J Herriges
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
- Sara L Lewandowski
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
- Robert Marsland
- Department of Physics, Boston University, Boston, MA, 02215, USA
- Carlos Villacorta-Martin
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, 02118, USA
- Ignacio S Caballero
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, 02118, USA
- David B Frank
- Division of Pediatric Cardiology, Department of Pediatrics, The Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Reeti M Sanghrajka
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
- Keri Dame
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
- Maciej M Kańduła
- Department of Mathematics & Statistics, Boston University, Boston, MA, 02215, USA
- Chair of Bioinformatics Research Group, Boku University, 1190, Vienna, Austria
- Julia Hicks-Berthet
- Department of Biochemistry, Boston University School of Medicine, Boston, MA, 02118, USA
- Matthew L Lawton
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
- Constantina Christodoulou
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
- Eric Kolaczyk
- Department of Mathematics & Statistics, Boston University, Boston, MA, 02215, USA
- Xaralabos Varelas
- Department of Biochemistry, Boston University School of Medicine, Boston, MA, 02118, USA
- Edward E Morrisey
- Penn Center for Pulmonary Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- John M Shannon
- Division of Pulmonary Biology, Cincinnati Children's Hospital, Cincinnati, OH, 45229, USA
- Pankaj Mehta
- Department of Physics, Boston University, Boston, MA, 02215, USA
- Darrell N Kotton
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
9
Machado JAT, Rocha-Neves JM, Andrade JP. Computational analysis of the SARS-CoV-2 and other viruses based on the Kolmogorov's complexity and Shannon's information theories. Nonlinear Dynamics 2020; 101:1731-1750. [PMID: 32836811 PMCID: PMC7335223 DOI: 10.1007/s11071-020-05771-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/26/2020] [Accepted: 06/14/2020] [Indexed: 05/06/2023]
Abstract
This paper analyzes the information of 133 RNA viruses available in public databases using several mathematical and computational tools. First, the formal concepts of distance metrics, Kolmogorov complexity and Shannon information are recalled. Second, the computational tools presently available for analyzing and visualizing patterns embedded in datasets, such as hierarchical clustering and multidimensional scaling, are discussed. The combined application of these mathematical and computational resources is then used for exploring the RNA data, cross-evaluating the normalized compression distance, entropy and Jensen-Shannon divergence against representations in two and three dimensions. The results of these different perspectives shed additional light on the relations between the distinct RNA viruses.
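The three measures named in the abstract above have standard textbook definitions, sketched below in Python. This is an illustration under stated assumptions, not the authors' code: zlib is used here as the compressor for the normalized compression distance (the paper's compressor is not stated in the abstract), entropy and divergence are computed over single-symbol frequencies, and `random_rna` is a hypothetical helper that generates toy sequences.

```python
import math
import random
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


def symbol_frequencies(seq: str) -> dict:
    """Relative frequency of each symbol in the sequence."""
    counts = Counter(seq)
    return {s: c / len(seq) for s, c in counts.items()}


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol distribution."""
    return -sum(p * math.log2(p) for p in symbol_frequencies(seq).values())


def jensen_shannon_divergence(seq_a: str, seq_b: str) -> float:
    """Jensen-Shannon divergence (base 2) between the symbol distributions of two sequences."""
    p, q = symbol_frequencies(seq_a), symbol_frequencies(seq_b)
    mix = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in set(p) | set(q)}

    def kl_to_mix(dist: dict) -> float:
        # Kullback-Leibler divergence from dist to the mixture distribution.
        return sum(v * math.log2(v / mix[s]) for s, v in dist.items())

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


def random_rna(length: int, seed: int) -> str:
    """Hypothetical helper: a reproducible random RNA-like string for toy comparisons."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGU") for _ in range(length))
```

Pairwise values of `ncd` or `jensen_shannon_divergence` over a set of genomes form a distance matrix, which is the kind of input that hierarchical clustering and multidimensional scaling operate on.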
Affiliation(s)
- J. A. Tenreiro Machado
- Department of Electrical Engineering, Institute of Engineering, Polytechnic of Porto, Rua Dr. António Bernardino de Almeida, 431, 4249-015 Porto, Portugal
- João M. Rocha-Neves
- Department of Biomedicine – Unity of Anatomy, Faculty of Medicine of University of Porto, Porto, Portugal
- Department of Physiology and Surgery, Faculty of Medicine of University of Porto, Porto, Portugal
- José P. Andrade
- Department of Biomedicine – Unity of Anatomy, Faculty of Medicine of University of Porto, Porto, Portugal
- Center for Health Technology and Services Research (CINTESIS), Porto, Portugal
10
Das A, Nigam D, Junaid A, Tribhuvan KU, Kumar K, Durgesh K, Singh NK, Gaikwad K. Expressivity of the key genes associated with seed and pod development is highly regulated via lncRNAs and miRNAs in Pigeonpea. Sci Rep 2019; 9:18191. [PMID: 31796783 PMCID: PMC6890743 DOI: 10.1038/s41598-019-54340-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/02/2018] [Accepted: 10/21/2019] [Indexed: 12/18/2022]
Abstract
Non-coding RNAs such as miRNAs and lncRNAs have gained immense importance as significant regulatory factors in different physiological and developmental processes in plants. In an effort to understand the molecular role of these regulatory agents, in the present study, 3019 lncRNAs and 227 miRNAs were identified from different seed and pod developmental stages in Pigeonpea, a major grain legume of Southeast Asia and Africa. Target analysis revealed that 3768 mRNAs, including 83 TFs, were targeted by lncRNAs, whereas 3060 mRNAs, including 154 TFs, were targeted by miRNAs. The targeted transcription factors mostly belong to the WRKY, MYB, bHLH, and other families, whereas the targeted genes were associated with embryo, seed, and flower development. In total, 302 lncRNAs interact with miRNAs and form endogenous target mimics (eTMs), which leads to sequestration of the miRNAs present in the cell. Expression analysis showed that, notably, Cc_lncRNA-2830 expression is up-regulated and sequesters miR160h in pod, leading to higher expression of the miR160h target gene, Auxin responsive factor-18. A similar pattern was observed for SPIKE, Auxin signaling F-box-2, Bidirectional sugar transporter, and Starch synthetase-2 eTMs. All the identified target mRNAs code for transcription factors and genes involved in processes such as cell division, plant growth and development, starch synthesis, sugar transportation and accumulation of storage proteins, which are essential for seed and pod development. On a combinatorial basis, our study provides a lncRNA- and miRNA-based regulatory insight into the genes governing seed and pod development in Pigeonpea.
Affiliation(s)
- Antara Das
- ICAR- National Research Centre on Plant Biotechnology, New Delhi, India
- Deepti Nigam
- ICAR- National Research Centre on Plant Biotechnology, New Delhi, India
- Alim Junaid
- ICAR- National Research Centre on Plant Biotechnology, New Delhi, India
- Kuldeep Kumar
- ICAR- National Research Centre on Plant Biotechnology, New Delhi, India
- N K Singh
- ICAR- National Research Centre on Plant Biotechnology, New Delhi, India
- Kishor Gaikwad
- ICAR- National Research Centre on Plant Biotechnology, New Delhi, India
11
Discovery of 33mer in chromosome 21 - the largest alpha satellite higher order repeat unit among all human somatic chromosomes. Sci Rep 2019; 9:12629. [PMID: 31477765 PMCID: PMC6718397 DOI: 10.1038/s41598-019-49022-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/25/2019] [Accepted: 08/13/2019] [Indexed: 11/10/2022]
Abstract
The centromere is important for segregation of chromosomes during cell division in eukaryotes. Its destabilization results in chromosomal missegregation and aneuploidy, hallmarks of cancers and birth defects. In primate genomes centromeres contain tandem repeats of ~171 bp alpha satellite DNA, commonly organized into higher order repeats (HORs). In spite of their crucial importance, satellites have been understudied because of gaps in sequencing, the genomic “black holes”. Bioinformatic studies of genomic sequences open possibilities to revolutionize understanding of repetitive DNA datasets. Here, using the robust Global Repeat Map algorithm, we identified in the hg38 sequence of human chromosome 21 the complete ensemble of alpha satellite HORs with six long repeat units (≥20 mers), five of them novel. The novel 33mer HOR has the longest HOR unit identified so far among all somatic chromosomes, and the novel 23mer reverse HOR lies far from the centromere. Also, we discovered that in the hg38 assembly the 33mer sequences in chromosomes 21, 13, 14, and 22 are 100% identical but nearby gaps are present, which seems to require additional, more precise sequencing. Chromosome 21 is of significant interest for deciphering the molecular basis of Down syndrome and of aneuploidies in general. Since the chromosome identifier probes are largely based on the detection of higher order alpha satellite repeats, the distinctions identified here between alpha satellite HORs in chromosomes 21 and 13 might lead to a unique chromosome 21 probe in molecular cytogenetics, which would find utility in diagnostics. It is expected that its complete sequence analysis will have profound implications for understanding pathogenesis of diseases and development of new therapeutic approaches.
12
Vlahovic I, Gluncic M, Rosandic M, Ugarkovic Ð, Paar V. Regular Higher Order Repeat Structures in Beetle Tribolium castaneum Genome. Genome Biol Evol 2018; 9:2668-2680. [PMID: 27492235 PMCID: PMC5737470 DOI: 10.1093/gbe/evw174] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Accepted: 07/21/2016] [Indexed: 02/07/2023]
Abstract
Higher order repeats (HORs) containing tandems of primary and secondary repeat units (head-to-tail “tandem within tandem pattern”), referred to as regular HORs, are typical for primate alpha satellite DNAs and most pronounced in the human genome. Regular HORs are known to be a result of recent evolutionary processes. In non-primate genomes, mostly so-called complex HORs have been found, without a head-to-tail tandem of primary repeat units. In the beetle Tribolium castaneum, considered a model case for genome studies, large tandem repeats have been identified, but no HORs have been reported. Here, using our novel robust repeat-finding algorithm Global Repeat Map, we discover two regular and six complex HORs in T. castaneum. In organizational pattern, the integrity and homogeneity of regular HORs in T. castaneum resemble human regular HORs (with T. castaneum monomers different from human alpha satellite monomers), involving a wider range of monomer lengths than in human HORs. Similar regular higher order repeat structures have not previously been found in insects. Some of these novel HORs in T. castaneum appear to be the most regular among known HORs in non-primate genomes, although with substantial riddling. This is intriguing, in particular from the point of view of the role of non-coding repeats in modulation of gene expression.
Affiliation(s)
- Ines Vlahovic
- Faculty of Science, University of Zagreb, Zagreb, Croatia
- Matko Gluncic
- Faculty of Science, University of Zagreb, Zagreb, Croatia
- Vladimir Paar
- Faculty of Science, University of Zagreb, Zagreb, Croatia
- Croatian Academy of Sciences and Arts, Zagreb, Croatia
13
Lower SS, McGurk MP, Clark AG, Barbash DA. Satellite DNA evolution: old ideas, new approaches. Curr Opin Genet Dev 2018; 49:70-78. [PMID: 29579574 PMCID: PMC5975084 DOI: 10.1016/j.gde.2018.03.003] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 11/15/2017] [Revised: 02/02/2018] [Accepted: 03/08/2018] [Indexed: 12/22/2022]
Abstract
A substantial portion of the genomes of most multicellular eukaryotes consists of large arrays of tandemly repeated sequence, collectively called satellite DNA. The processes generating and maintaining different satellite DNA abundances across lineages are important to understand as satellites have been linked to chromosome mis-segregation, disease phenotypes, and reproductive isolation between species. While much theory has been developed to describe satellite evolution, empirical tests of these models have fallen short because of the challenges in assessing satellite repeat regions of the genome. Advances in computational tools and sequencing technologies now enable identification and quantification of satellite sequences genome-wide. Here, we describe some of these tools and how their applications are furthering our knowledge of satellite evolution and function.
Affiliation(s)
- Sarah Sander Lower
- Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
- Michael P McGurk
- Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
- Daniel A Barbash
- Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States.
14
Redenšek S, Dolžan V, Kunej T. From Genomics to Omics Landscapes of Parkinson's Disease: Revealing the Molecular Mechanisms. OMICS: A Journal of Integrative Biology 2018; 22:1-16. [PMID: 29356624 PMCID: PMC5784788 DOI: 10.1089/omi.2017.0181] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 12/18/2022]
Abstract
Molecular mechanisms of Parkinson's disease (PD) have already been investigated in various different omics landscapes. We reviewed the literature about different omics approaches between November 2005 and November 2017 to depict the main pathological pathways for PD development. In total, 107 articles exploring different layers of omics data associated with PD were retrieved. The studies were grouped into 13 omics layers: genomics-DNA level, transcriptomics, epigenomics, proteomics, ncRNomics, interactomics, metabolomics, glycomics, lipidomics, phenomics, environmental omics, pharmacogenomics, and integromics. We discussed characteristics of studies from different landscapes, such as main findings, number of participants, sample type, methodology, and outcome. We also performed curation and preliminary synthesis of multiple omics data, and identified overlapping results, which could lead toward selection of biomarkers for further validation of PD risk loci. Biomarkers could support the development of targeted prognostic/diagnostic panels as a tool for early diagnosis and prediction of progression rate and prognosis. This review presents an example of a comprehensive approach to revealing the underlying processes and risk factors of a complex disease. It urges scientists to structure the already known data and integrate it into a meaningful context.
Affiliation(s)
- Sara Redenšek
- Pharmacogenetics Laboratory, Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Vita Dolžan
- Pharmacogenetics Laboratory, Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Tanja Kunej
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
15
Computational Techniques for a Comprehensive Understanding of Different Genotype-Phenotype Factors in Biological Systems and Their Applications. Synth Biol (Oxf) 2018. [DOI: 10.1007/978-981-10-8693-9_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022] Open
16
Novák P, Ávila Robledillo L, Koblížková A, Vrbová I, Neumann P, Macas J. TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acids Res 2017. [PMID: 28402514 DOI: 10.1093/nar/gkx257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
Affiliation(s)
- Petr Novák
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Laura Ávila Robledillo
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Andrea Koblížková
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Iva Vrbová
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Pavel Neumann
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Jiří Macas
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
17
Novák P, Ávila Robledillo L, Koblížková A, Vrbová I, Neumann P, Macas J. TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acids Res 2017; 45:e111. [PMID: 28402514 PMCID: PMC5499541 DOI: 10.1093/nar/gkx257] [Citation(s) in RCA: 174] [Impact Index Per Article: 24.9] [Received: 01/25/2017] [Revised: 03/23/2017] [Accepted: 04/04/2017] [Indexed: 12/21/2022] Open
18
Yin C, Wang J. Periodic power spectrum with applications in detection of latent periodicities in DNA sequences. J Math Biol 2016; 73:1053-1079. [PMID: 26942584 DOI: 10.1007/s00285-016-0982-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/21/2015] [Revised: 02/19/2016] [Indexed: 12/27/2022]
Abstract
Periodic elements play important roles in genomic structures and functions, yet some complex periodic elements in genomes are difficult to detect by conventional methods such as digital signal processing and statistical analysis. We propose a periodic power spectrum (PPS) method for analyzing periodicities of DNA sequences. The PPS method employs periodic nucleotide distributions of DNA sequences and directly calculates power spectra at specific periodicities. The magnitude of a PPS reflects the strength of a signal on periodic positions. In comparison with Fourier transform, the PPS method avoids spectral leakage, and reduces background noise that appears high in Fourier power spectrum. Thus, the PPS method can effectively capture hidden periodicities in DNA sequences. Using a sliding window approach, the PPS method can precisely locate periodic regions in DNA sequences. We apply the PPS method for detection of hidden periodicities in different genome elements, including exons, microsatellite DNA sequences, and whole genomes. The results show that the PPS method can minimize the impact of spectral leakage and thus capture true hidden periodicities in genomes. In addition, performance tests indicate that the PPS method is more effective and efficient than a fast Fourier transform. The computational complexity of the PPS algorithm is [Formula: see text]. Therefore, the PPS method may have a broad range of applications in genomic analysis. The MATLAB programs for implementing the PPS method are available from MATLAB Central ( http://www.mathworks.com/matlabcentral/fileexchange/55298 ).
Affiliation(s)
- Changchuan Yin
- Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Chicago, IL, 60607-7045, USA.
- Jiasong Wang
- Department of Mathematics, Nanjing University, Nanjing, Jiangsu, 210093, China
19
Chaley M, Kutyrkin V. Spectral-Statistical Approach for Revealing Latent Regular Structures in DNA Sequence. Methods Mol Biol 2016; 1415:315-340. [PMID: 27115640 DOI: 10.1007/978-1-4939-3572-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Methods of the spectral-statistical approach (2S-approach) for revealing latent periodicity in DNA sequences are described. Results are presented from analysis of the HeteroGenome database, which collects sequences similar to approximate tandem repeats in the genomes of model organisms. As a further development of the spectral-statistical approach, techniques for recognizing latent profile periodicity are considered. These techniques are based on an extension of the notion of approximate tandem repeat. Examples are given of how latent profile periodicity revealed in CDSs correlates with structural-functional properties of the proteins.
Affiliation(s)
- Maria Chaley
- Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st., 4, 142290, Pushchino, Russia.
- Vladimir Kutyrkin
- Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, 2nd Baumanskaya st., 5, 105005, Moscow, Russia
20
Yin C. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction. J Bioinform Comput Biol 2014; 13:1550004. [PMID: 25491390 DOI: 10.1142/s0219720015500043] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
To apply digital signal processing (DSP) methods to DNA sequences, the sequences must first be mapped into numerical sequences. Effective numerical mappings of DNA sequences therefore play a key role in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC), in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied to exon and intron prediction using a Short-Time Fourier Transform (STFT) approach. The results show that the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein-coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
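As a rough illustration of the 3-periodicity signal mentioned in this abstract, the following Python sketch computes it for the 4D binary (indicator) representation that the abstract names as the common baseline. It is not the GCC method and not the author's code. The function names `power_at` and `three_periodicity_snr` are hypothetical, and the SNR definition used here (power at frequency index N/3 divided by the mean power over all non-zero frequency indices) is an assumption made for the sketch.

```python
import cmath

BASES = "ACGT"

def power_at(seq, k):
    """Total DFT power at frequency index k, summed over the four
    binary indicator sequences (one per base)."""
    n = len(seq)
    total = 0.0
    for base in BASES:
        # DFT coefficient of the indicator sequence for this base
        coeff = sum(cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, s in enumerate(seq) if s == base)
        total += abs(coeff) ** 2
    return total

def three_periodicity_snr(seq):
    """Assumed SNR: power at k = N/3 over the mean power at k = 1..N-1."""
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    peak = power_at(seq, n // 3)
    mean = sum(power_at(seq, k) for k in range(1, n)) / (n - 1)
    # Guard against division by floating-point noise when all power is at k = 0
    if mean < 1e-12:
        return 0.0
    return peak / mean

if __name__ == "__main__":
    print(three_periodicity_snr("ATG" * 10))  # strongly 3-periodic toy sequence
    print(three_periodicity_snr("A" * 30))    # homopolymer, no 3-periodicity
```

The direct O(N^2) sum keeps the sketch dependency-free; for real sequences an FFT over sliding windows, as in the STFT approach the abstract describes, would be the practical choice.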
Affiliation(s)
- Changchuan Yin
- Department of Mathematics, Statistics and Computer Science, The University of Illinois at Chicago, Chicago, IL 60607-7045, USA
21
Rosandić M, Paar V, Glunčić M. Fundamental role of start/stop regulators in whole DNA and new trinucleotide classification. Gene 2013; 531:184-90. [PMID: 24042127 DOI: 10.1016/j.gene.2013.09.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/28/2013] [Revised: 08/31/2013] [Accepted: 09/05/2013] [Indexed: 10/26/2022]
Abstract
The origin and logic of the genetic code are two of the greatest mysteries of the life sciences. By analyzing DNA sequences, we showed that the start/stop trinucleotides have broader importance than just marking the start and stop of exons in coding DNA. On this basis, we introduce a new classification of trinucleotides and show that all A+T rich trinucleotides consisting of three different nucleotides arise from start-ATG, stop-TGA and stop-TAG through their complement, reverse complement and reverse transformations. Through the same transformations, over generations of crossing-over, they can switch from one form to another. By a direct process, start-ATG and stop-TAG can irreversibly transform into stop-TAA. By transforming into A+T rich trinucleotides and 16 of the 32 C+G rich trinucleotides, they can reversibly lose the start/stop function and take on the role of a sense codon. The remaining 16 C+G rich trinucleotides cannot directly transform into start/stop trinucleotides and thus remain a firm skeleton for structuring C+G rich DNA. We showed that start/stops strongly enrich A+T rich noncoding DNA through frequently extended forms. From an evolutionary viewpoint, the start/stops are the chief creators of the prevailing A+T rich noncoding DNA and of the more stable coding DNA. We propose that start/stops have a basic role as "seeds" in the trinucleotide evolution of noncoding and coding sequences and lead to an asymmetry between A+T rich and C+G rich DNA. Through dynamical transformations during evolution, they enabled pronounced phylogenetic broadness while keeping the regulator function.
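The statement that every A+T rich trinucleotide built from three different nucleotides derives from ATG, TGA and TAG can be checked by direct enumeration. The Python sketch below is illustrative only and is not the authors' code. The names `transforms` and `at_rich_distinct` are hypothetical, and "A+T rich with three different nucleotides" is assumed to mean a trinucleotide containing A, T and exactly one of C or G.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def transforms(tri):
    """Return a trinucleotide together with its complement, reverse,
    and reverse complement."""
    comp = "".join(COMPLEMENT[b] for b in tri)
    rev = tri[::-1]
    revcomp = comp[::-1]
    return {tri, comp, rev, revcomp}

def at_rich_distinct():
    """All trinucleotides made of A, T and exactly one of C or G
    (assumed reading of 'A+T rich, three different nucleotides')."""
    result = set()
    for third in "CG":
        for p in permutations("AT" + third):
            result.add("".join(p))
    return result

SEEDS = ("ATG", "TGA", "TAG")
generated = set().union(*(transforms(s) for s in SEEDS))

if __name__ == "__main__":
    print(sorted(generated))
    print(generated == at_rich_distinct())
```

Under this reading, the three seeds and their transforms yield exactly the 12 trinucleotides in question: the six permutations of A, T, G and the six permutations of A, T, C.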
Affiliation(s)
- Marija Rosandić
- Faculty of Science, University of Zagreb, Bijenička 32, 10000 Zagreb, Croatia