1
Parl FF. Analysis of CENP-B Boxes as Anchor of Kinetochores in Centromeres of Human Chromosomes. Bioinform Biol Insights 2024; 18:11779322241248913. [PMID: 38690324 PMCID: PMC11060027 DOI: 10.1177/11779322241248913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 04/04/2024] [Indexed: 05/02/2024]
Abstract
The kinetochore is a multiprotein structure that attaches at one end to DNA in the centromere and at the other end to microtubules in the mitotic spindle. By connecting centromere and spindle, the kinetochore controls the migration of chromosomes during cell division. The exact position where the kinetochore assembles on each centromere was uncertain because large sections of centromeric DNA had not been sequenced, owing to their highly repetitive alpha-satellite arrays. Embedded in the arrays is a 17 bp consensus sequence, the so-called CENP-B box, which binds the CENP-B protein, the only protein that binds directly to centromeric DNA. Recently, the Telomere-to-Telomere Consortium published the complete centromeric DNA sequences of all chromosomes, including their epigenetic modifications, in the T2T-CHM13 map. I used data from the T2T-CHM13 map to locate the CENP-B boxes in the centromeres as anchors of kinetochores. Most of the CENP-B boxes in centromeric DNA are methylated, with the exception of those in the so-called centromere dip region (CDR), where CENP-B protein dimers bind to adjacent unmethylated CENP-B boxes and interact with CENP-A and CENP-C proteins to assemble the kinetochore. The centromeres of all chromosomes combined have a size of 407 Mb, of which the kinetochores account for 5.0 Mb or 1.2%. There is no correlation between centromere and kinetochore size (P = .77). While the number of CENP-B boxes varies 4-fold between chromosomes, their density (number/kb) varies less than 2-fold, with a mean of 2.61 ± 0.33. The narrow range ensures a uniform pull of the spindle on the centromeres. I illustrate the findings in a model of the human kinetochore anchored at unmethylated CENP-B boxes in the CDR and present circos plots of chromosomes to show the location of kinetochores in their respective centromeres.
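The abstract reports CENP-B box counts and densities (boxes per kb). The kind of count behind such figures can be sketched as follows, assuming the commonly cited 17-bp CENP-B box consensus NTTCGNNNNANNCGGGN (N = any base), an uppercase-or-lowercase DNA string as input, and a search on both strands. The function names are hypothetical, the sketch ignores CpG methylation (so it cannot separate CDR boxes from methylated ones), and it is not the author's pipeline.

```python
import re

# Assumed 17-bp CENP-B box consensus (N = any base); illustrative only.
CENPB_BOX = "NTTCGNNNNANNCGGGN"

# Complement table; N maps to itself so motifs can be reverse-complemented too.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet A/C/G/T/N."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> str:
    """Turn a motif containing only A/C/G/T/N into a regular expression."""
    return motif.replace("N", "[ACGT]")


def find_boxes(seq: str, motif: str = CENPB_BOX) -> list[int]:
    """Return sorted 0-based start positions of motif hits on either strand.

    The zero-width lookahead (?=...) lets overlapping hits be reported.
    Hits on the reverse strand are reported by their start on the given strand.
    """
    seq = seq.upper()
    hits = set()
    for pattern in (motif, reverse_complement(motif)):
        regex = re.compile(f"(?={motif_to_regex(pattern)})")
        hits.update(m.start() for m in regex.finditer(seq))
    return sorted(hits)


def density_per_kb(n_boxes: int, region_length_bp: int) -> float:
    """Number of boxes per kilobase of sequence."""
    return n_boxes / (region_length_bp / 1000)
```

For scale, at the reported mean density of 2.61 boxes/kb, a 100 kb stretch of centromeric sequence would carry roughly 261 boxes.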
Affiliation(s)
- Fritz F Parl
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
2
Helal AA, Saad BT, Saad MT, Mosaad GS, Aboshanab KM. Benchmarking long-read aligners and SV callers for structural variation detection in Oxford nanopore sequencing data. Sci Rep 2024; 14:6160. [PMID: 38486064 PMCID: PMC10940726 DOI: 10.1038/s41598-024-56604-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024]
Abstract
Structural variants (SVs) are one of the major types of DNA mutations and are typically defined as genomic alterations larger than 50 bp, including insertions, deletions, duplications, inversions, and translocations. These modifications can profoundly affect phenotypic characteristics and contribute to disorders such as cancer, to treatment response, and to infections. Four long-read aligners and five SV callers were evaluated on three Oxford Nanopore NGS human genome datasets in terms of precision, recall, and F1-score, depth of coverage, and speed of analysis. The best SV caller with respect to recall, precision, and F1-score, when paired with different aligners at different coverage levels, tends to vary depending on the dataset and the specific SV types being analyzed. However, based on our findings, Sniffles and CuteSV tend to perform well across different aligners and coverage levels, followed by SVIM and PBSV, with SVDSS in last place. The CuteSV caller has the highest average F1-score (82.51%) and recall (78.50%), and Sniffles has the highest average precision (94.33%). Minimap2 as an aligner and Sniffles as an SV caller form a strong basis for an SV calling pipeline because of their high speed and reasonable performance. PBSV has a lower average F1-score, precision, and recall and may generate more false positives and miss some true SVs. Our results are valuable for the comprehensive evaluation of popular SV callers and aligners, as they provide insight into the performance of several long-read aligners and SV callers and serve as a reference for researchers selecting the most suitable tools for SV detection.
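Precision, recall and F1-score summarize how well a call set matches a truth set. A minimal sketch of the three metrics follows, assuming true-positive, false-positive and false-negative counts have already been obtained by matching SV calls against a benchmark; that matching step (breakpoint tolerance, size and type agreement) is not shown, and the `CallCounts` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallCounts:
    """Counts from matching an SV call set against a truth set (hypothetical)."""
    tp: int  # calls that match a truth-set SV
    fp: int  # calls with no matching truth-set SV
    fn: int  # truth-set SVs that no call matched


def precision(c: CallCounts) -> float:
    """Fraction of calls that are correct: TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: CallCounts) -> float:
    """Fraction of true SVs that were found: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(c: CallCounts) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    counts = CallCounts(tp=90, fp=10, fn=30)
    print(precision(counts), recall(counts), round(f1_score(counts), 4))
```

Because F1 is a nonlinear (harmonic-mean) function of precision and recall, an average F1 across datasets, as reported above, generally differs from the F1 computed from average precision and average recall.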
Affiliation(s)
- Asmaa A Helal
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Bishoy T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt.
- Mina T Saad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Gamal S Mosaad
- Department of Bioinformatics, HITS Solutions Co., Cairo, 11765, Egypt
- Khaled M Aboshanab
- Department of Microbiology and Immunology, Faculty of Pharmacy, Ain Shams University, Organization of African Unity St., Abassi, Cairo, 11566, Egypt.
3
Lee DH, Bae WH, Ha H, Kim WR, Park EG, Lee YJ, Kim JM, Shin HJ, Kim HS. The human PTGR1 gene expression is controlled by TE-derived Z-DNA forming sequence cooperating with miR-6867-5p. Sci Rep 2024; 14:4723. [PMID: 38413664 PMCID: PMC10899170 DOI: 10.1038/s41598-024-55332-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 02/22/2024] [Indexed: 02/29/2024]
Abstract
Z-DNA, a well-known non-canonical form of DNA involved in gene regulation, is often found in gene promoters. Transposable elements (TEs), which make up 45% of the human genome, can move from one location to another within the genome. TEs play various biological roles in host organisms and, like Z-DNA, can influence transcriptional regulation near promoter regions. MicroRNAs (miRNAs) are a class of small non-coding RNA molecules that play a critical role in the regulation of gene expression. Although TEs can generate Z-DNA and miRNAs can bind to Z-DNA, how these factors affect gene transcription has yet to be elucidated. Here, by data analysis, we identified potential Z-DNA-forming sequences (ZFS), including TE-derived ZFS, in the promoter of prostaglandin reductase 1 (PTGR1). The transcriptional activity of these ZFS in PTGR1 was confirmed using dual-luciferase reporter assays. In addition, we discovered a novel ZFS-binding miRNA (miR-6867-5p) that suppressed PTGR1 expression by targeting ZFS. In conclusion, these findings suggest that ZFS, including TE-derived ZFS, can regulate PTGR1 gene expression and that miR-6867-5p can suppress PTGR1 by interacting with ZFS.
Affiliation(s)
- Du Hyeong Lee
- Department of Integrated Biological Sciences, Pusan National University, Busan, 46241, Republic of Korea
- Institute of Systems Biology, Pusan National University, Busan, 46241, Republic of Korea
- Woo Hyeon Bae
- Department of Integrated Biological Sciences, Pusan National University, Busan, 46241, Republic of Korea
- Institute of Systems Biology, Pusan National University, Busan, 46241, Republic of Korea
- Hongseok Ha
- Institute of Endemic Diseases, College of Medicine, Seoul National University, Seoul, 03080, Republic of Korea
- Woo Ryung Kim
- Department of Integrated Biological Sciences, Pusan National University, Busan, 46241, Republic of Korea
- Institute of Systems Biology, Pusan National University, Busan, 46241, Republic of Korea
- Eun Gyung Park
- Department of Integrated Biological Sciences, Pusan National University, Busan, 46241, Republic of Korea
- Institute of Systems Biology, Pusan National University, Busan, 46241, Republic of Korea
- Yun Ju Lee
- Department of Integrated Biological Sciences, Pusan National University, Busan, 46241, Republic of Korea
- Institute of Systems Biology, Pusan National University, Busan, 46241, Republic of Korea
- Jung-Min Kim
- Department of Integrated Biological Sciences, Pusan National University, Busan, 46241, Republic of Korea
- Institute of Systems Biology, Pusan National University, Busan, 46241, Republic of Korea
- Hae Jin Shin
- Department of Integrated Biological Sciences, Pusan National University, Busan, 46241, Republic of Korea
- Institute of Systems Biology, Pusan National University, Busan, 46241, Republic of Korea
- Heui-Soo Kim
- Institute of Systems Biology, Pusan National University, Busan, 46241, Republic of Korea.
- Department of Biological Sciences, College of Natural Sciences, Pusan National University, Busan, 46241, Republic of Korea.
4
Falker-Gieske C. Transcriptome driven discovery of novel candidate genes for human neurological disorders in the telomer-to-telomer genome assembly era. Hum Genomics 2023; 17:94. [PMID: 37872607 PMCID: PMC10594789 DOI: 10.1186/s40246-023-00543-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 10/17/2023] [Indexed: 10/25/2023]
Abstract
BACKGROUND With the first complete draft of a human genome, the Telomere-to-Telomere Consortium unlocked previously concealed genomic regions for genetic analyses. These regions harbour nearly 2000 potential novel genes with unknown function. In order to uncover candidate genes associated with human neurological pathologies, a comparative transcriptome study using the T2T-CHM13 and the GRCh38 genome assemblies was conducted on previously published datasets for eight distinct human neurological disorders. RESULTS The analysis of differential expression in RNA sequencing data led to the identification of 336 novel candidate genes linked to human neurological disorders. Additionally, it was revealed that, on average, 3.6% of the differentially expressed genes detected with the GRCh38 assembly may represent potential false positives. Among the noteworthy findings, two novel genes were discovered, one encoding a pore-structured protein and the other a highly ordered β-strand-rich protein. These genes exhibited upregulation in multiple epilepsy datasets and hold promise as candidate genes potentially modulating the progression of the disease. Furthermore, an analysis of RNA derived from white matter lesions in multiple sclerosis patients indicated significant upregulation of 26 rRNA encoding genes. Additionally, putative pathology related genes were identified for Alzheimer's disease, amyotrophic lateral sclerosis, glioblastoma, glioma, and conditions resulting from the m.3242 A > G mtDNA mutation. CONCLUSION The results presented here underline the potential of the T2T-CHM13 assembly in facilitating the discovery of candidate genes from transcriptome data in the context of human disorders. Moreover, the results demonstrate the value of remapping sequencing data to a superior genome assembly. Numerous potential pathology related genes, either as causative factors or related elements, have been unveiled, warranting further experimental validation.
Affiliation(s)
- Clemens Falker-Gieske
- Division of Functional Breeding, Department of Animal Sciences, Georg-August-Universität Göttingen, Burckhardtweg 2, 37077, Göttingen, Germany.
5
Carlberg C. Nutrigenomics in the context of evolution. Redox Biol 2023; 62:102656. [PMID: 36933390 PMCID: PMC10036735 DOI: 10.1016/j.redox.2023.102656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 03/03/2023] [Accepted: 03/03/2023] [Indexed: 03/13/2023]
Abstract
Nutrigenomics describes the interaction between nutrients and our genome. Since the origin of our species, most of these nutrient-gene communication pathways have not changed. However, over the past 50,000 years our genome has experienced a number of evolutionary pressures arising from migration to new geographic and climatic environments, from the transition from hunter-gatherers to farmers (including the zoonotic transfer of many pathogenic microbes), and from the rather recent shift of societies toward a predominantly sedentary lifestyle and the dominance of the Western diet. Human populations responded to these challenges not only through specific anthropometric adaptations, such as skin color and body stature, but also through diversity in dietary intake and different resistance to complex diseases such as the metabolic syndrome, cancer and immune disorders. The genetic basis of this adaptation process has been investigated by whole genome genotyping and sequencing, including of DNA extracted from ancient bones. In addition to genomic changes, the programming of epigenomes in pre- and postnatal phases of life also contributes importantly to the response to environmental changes. Thus, insight into the variation of our (epi)genome in the context of an individual's risk of developing complex diseases helps us understand the evolutionary basis of how and why we become ill. This review will discuss the relation between diet, the modern environment and our (epi)genome, including aspects of redox biology. This has numerous implications for the interpretation of disease risks and their prevention.
Affiliation(s)
- Carsten Carlberg
- Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, ul. Juliana Tuwima 10, PL-10748, Olsztyn, Poland; School of Medicine, Institute of Biomedicine, University of Eastern Finland, FI-70211, Kuopio, Finland.
6
Lu DY, Lu TR. HIV/AIDS Curability Study, Different Approaches and Drug Combination. Infect Disord Drug Targets 2023:IDDT-EPUB-128884. [PMID: 36650650 DOI: 10.2174/1871526523666230117115826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 11/04/2022] [Accepted: 11/23/2022] [Indexed: 01/19/2023]
Abstract
AIM HIV infection is currently an incurable disease that requires life-long drug treatment. The causes and mechanisms of its incurability are still unknown. METHODS To overcome this therapeutic setback, breakthroughs must be made using different approaches. How to plan experimental and clinical innovations toward HIV curability is a modern challenge. In this article, new ideas and approaches for global HIV/AIDS therapeutic strategies are proposed and supported by scientific insights. RESULTS Pharmaceutical characteristics, herbal medicine, novel drug targets, cutting-edge biotherapy, drug combination, animal models, and immune stimuli for HIV latency and clearance are highlighted. DISCUSSION To advance our understanding of curative treatment for HIV/AIDS, many new pathological discoveries and their extensions, technical advances, and potential drug targets are presented. With the discovery of novel pathogenesis and therapeutic evolution, a cure for HIV/AIDS may become achievable and a reality. CONCLUSION Translating animal model investigations into widespread therapies for larger human populations is a necessity in modern medicine. In this infectious-disease treatment scenario, major breakthroughs in medicine and drug development are anticipated.
Affiliation(s)
- Da-Yong Lu
- School of Life Sciences, Shanghai University, Shanghai 200444, PRC
- Ting-Ren Lu
- College of Science, Shanghai University, Shanghai 200444, PRC
7
Payami H. The many genomes of Parkinson's disease. Int Rev Neurobiol 2022; 167:59-80. [PMID: 36427959 DOI: 10.1016/bs.irn.2022.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The genetic component of Parkinson's disease (PD), once firmly believed to be non-existent, involves the human genome, the mitochondrial genome, and the microbiome. Understanding the genomics of PD requires identifying PD-relevant genes and learning how they interact within the hologenome and with their environment. This chapter is a geneticist's evidence-based perspective on how far we have come in this endeavor: the scientific community started with a naive and simplistic view of PD, evolved to accept that PD is probably the most complex disease there is, made progress in discovering the genes and elucidating their functions, and is now assembling the parts to create the whole.
Affiliation(s)
- Haydeh Payami
- Professor of Genetics and Neurology, Department of Neurology, University of Alabama at Birmingham, Birmingham, AL, United States; Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, United States.
8
Vojvoda Zeljko T, Ugarković Đ, Pezer Ž. Differential enrichment of H3K9me3 at annotated satellite DNA repeats in human cell lines and during fetal development in mouse. Epigenetics Chromatin 2021; 14:47. [PMID: 34663449 PMCID: PMC8524813 DOI: 10.1186/s13072-021-00423-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Accepted: 10/05/2021] [Indexed: 01/24/2023]
Abstract
BACKGROUND Trimethylation of histone H3 on lysine 9 (H3K9me3) at satellite DNA sequences has been primarily studied at (peri)centromeric regions, where its level shows differences associated with various processes such as development and malignant transformation. However, the dynamics of H3K9me3 at distal satellite DNA repeats have not been thoroughly investigated. RESULTS We exploit the sets of publicly available data derived from chromatin immunoprecipitation combined with massively parallel DNA sequencing (ChIP-Seq), produced by the Encyclopedia of DNA Elements (ENCODE) project, to analyze H3K9me3 at assembled satellite DNA repeats in genomes of human cell lines and during mouse fetal development. We show that annotated satellite elements are generally enriched for H3K9me3, but its level in cancer cell lines is on average lower than in normal cell lines. We find 407 satellite DNA instances with differential H3K9me3 enrichment between cancer and normal cells, including a large 115-kb cluster of GSATII elements on chromosome 12. Differentially enriched regions are not limited to satellite DNA instances, but instead encompass a wider region of flanking sequences. We found no correlation between the levels of H3K9me3 and noncoding RNA at corresponding satellite DNA loci. The analysis of data derived from multiple tissues identified 864 instances of satellite DNA sequences in the mouse reference genome that are differentially enriched between fetal developmental stages. CONCLUSIONS Our study reveals significant differences in H3K9me3 level at a subset of satellite repeats between biological states and as such contributes to understanding of the role of satellite DNA repeats in epigenetic regulation during development and carcinogenesis.
Affiliation(s)
- Željka Pezer
- Ruđer Bošković Institute, Bijenička 54, 10000, Zagreb, Croatia.
9
Zhang Y, Jin X, Wang H, Miao Y, Yang X, Jiang W, Yin B. SARS-CoV-2 competes with host mRNAs for efficient translation by maintaining the mutations favorable for translation initiation. J Appl Genet 2021; 63:159-167. [PMID: 34655422 PMCID: PMC8520108 DOI: 10.1007/s13353-021-00665-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/10/2021] [Revised: 09/24/2021] [Accepted: 10/03/2021] [Indexed: 11/05/2022]
Abstract
During SARS-CoV-2 proliferation, the translation of viral RNAs is usually the rate-limiting step. Understanding the molecular details of this step is beneficial for uncovering the origin and evolution of SARS-CoV-2 and even for controlling the pandemic. To date, it has been unclear how SARS-CoV-2 competes with host mRNAs for ribosome binding and efficient translation. We retrieved the coding sequences (CDSs) of all human genes and SARS-CoV-2 genes and systematically profiled the GC content and folding energy of each CDS. Because some fixed or polymorphic mutations exist in the SARS-CoV-2 and human genomes, all algorithms and analyses were applied to both pre-mutation and post-mutation versions. In SARS-CoV-2 but not in human, the 5′ end of the CDS had lower GC content and less RNA structure than the 3′ part, which favors ribosome binding and efficient translation initiation. Globally, the fixed and polymorphic mutations in SARS-CoV-2 had created an even lower GC content at the 5′ end of the CDS. In contrast, no similar patterns were observed for the fixed and polymorphic mutations in the human genome. Compared with human RNAs, SARS-CoV-2 RNAs have less RNA structure at the 5′ end and are thus more favorable for fast translation initiation. The fixed and polymorphic mutations in SARS-CoV-2 further amplify this advantage. This might serve as a strategy for SARS-CoV-2 to adapt to the human host.
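The 5′-versus-3′ GC comparison described in the abstract can be sketched as below. This is a minimal illustration, assuming a DNA-alphabet CDS string and a hypothetical fixed 5′ window of 60 nt; the study's actual window definition is not given here, and its folding-energy analysis (which requires an RNA secondary-structure tool) is not reproduced.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_5prime_vs_rest(cds: str, window: int = 60) -> tuple[float, float]:
    """Return (GC of the first `window` nt, GC of the remaining nt).

    `window` is a hypothetical choice for illustration, not the study's value.
    """
    return gc_content(cds[:window]), gc_content(cds[window:])


if __name__ == "__main__":
    # Toy CDS: AT-rich first 60 nt followed by a GC-rich remainder.
    toy_cds = "ATG" + "AAA" * 19 + "GCC" * 20
    five_prime, rest = gc_5prime_vs_rest(toy_cds)
    print(f"5' GC = {five_prime:.3f}, rest GC = {rest:.3f}")
```

Applied to a pre-mutation and a post-mutation version of the same CDS, the difference in the first value is what would indicate whether mutations lowered GC content at the 5′ end.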
Affiliation(s)
- Yanping Zhang
- Department of Respiratory Diseases, Qingdao Haici Hospital, Qingdao, China; The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao, China
- Xiaojie Jin
- Department of Respiratory Diseases, Qingdao Haici Hospital, Qingdao, China; The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao, China
- Haiyan Wang
- Department of Respiratory Diseases, Qingdao Haici Hospital, Qingdao, China; The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao, China
- Yaoyao Miao
- Department of Respiratory Diseases, Qingdao Haici Hospital, Qingdao, China; The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao, China
- Xiaoping Yang
- Department of Respiratory Diseases, Qingdao Haici Hospital, Qingdao, China; The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao, China
- Wenqing Jiang
- Department of Respiratory Diseases, Qingdao Haici Hospital, Qingdao, China; The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao, China
- Bin Yin
- Department of Respiratory Diseases, Qingdao Haici Hospital, Qingdao, China; The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao, China.
10
Brucato N, André M, Tsang R, Saag L, Kariwiga J, Sesuki K, Beni T, Pomat W, Muke J, Meyer V, Boland A, Deleuze JF, Sudoyo H, Mondal M, Pagani L, Romero IG, Metspalu M, Cox MP, Leavesley M, Ricaut FX. Papua New Guinean genomes reveal the complex settlement of north Sahul. Mol Biol Evol 2021; 38:5107-5121. [PMID: 34383935 PMCID: PMC8557464 DOI: 10.1093/molbev/msab238] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022]
Abstract
The settlement of Sahul, the lost continent of Oceania, remains one of the most ancient and debated human migrations. Modern New Guineans inherited a unique genetic diversity tracing back 50,000 years, and yet there is currently no model reconstructing their past population dynamics. We generated 58 new whole genome sequences from Papua New Guinea, filling geographical gaps in previous sampling, specifically to address alternative scenarios of the initial migration to Sahul and the settlement of New Guinea. Here, we present the first genomic models for the settlement of northeast Sahul considering one or two migrations from Wallacea. Both models fit our dataset, reinforcing the idea that ancestral groups to New Guinean and Indigenous Australians split early, potentially during their migration in Wallacea where the northern route could have been favored. The earliest period of human presence in Sahul was an era of interactions and gene flow between related but already differentiated groups, from whom all modern New Guineans, Bismarck islanders and Indigenous Australians descend. The settlement of New Guinea was probably initiated from its southeast region, where the oldest archaeological sites have been found. This was followed by two migrations into the south and north lowlands that ultimately reached the west and east highlands. We also identify ancient gene flows between populations in New Guinea, Australia, East Indonesia and the Bismarck Archipelago, emphasizing the fact that the anthropological landscape during the early period of Sahul settlement was highly dynamic rather than the traditional view of extensive isolation.
Affiliation(s)
- Nicolas Brucato
- Laboratoire Évolution and Diversité Biologique (EDB UMR 5174), Université de Toulouse Midi-Pyrénées, CNRS, IRD, UPS. 118 route de Narbonne, Bat 4R1, 31062 Toulouse cedex 9, France
- Mathilde André
- Laboratoire Évolution and Diversité Biologique (EDB UMR 5174), Université de Toulouse Midi-Pyrénées, CNRS, IRD, UPS. 118 route de Narbonne, Bat 4R1, 31062 Toulouse cedex 9, France; Institute of Genomics, University of Tartu, Tartu, Tartumaa 51010, Estonia
- Roxanne Tsang
- School of Humanities, Languages and Social Science and Place, Evolution and Rock Art Heritage Unit, Griffith University Centre for Social and Cultural Research, Griffith University, Australia; Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, PO Box 320, University 134, National Capital District, Papua New Guinea
- Lauri Saag
- Institute of Genomics, University of Tartu, Tartu, Tartumaa 51010, Estonia
- Jason Kariwiga
- Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, PO Box 320, University 134, National Capital District, Papua New Guinea; School of Social Science, University of Queensland, Australia, St Lucia, QLD 4072, Australia
- Kylie Sesuki
- Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, PO Box 320, University 134, National Capital District, Papua New Guinea
- Teppsy Beni
- Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, PO Box 320, University 134, National Capital District, Papua New Guinea
- William Pomat
- Papua New Guinea Institute of Medical Research, Goroka, Papua New Guinea
- John Muke
- Social Research Institute, Papua New Guinea
- Vincent Meyer
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057 Evry, France
- Anne Boland
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057 Evry, France
- Jean-François Deleuze
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057 Evry, France
- Herawati Sudoyo
- Genome Diversity and Diseases Laboratory, Eijkman Institute for Molecular Biology, Jakarta 10430, Indonesia
- Mayukh Mondal
- Institute of Genomics, University of Tartu, Tartu, Tartumaa 51010, Estonia
- Luca Pagani
- Institute of Genomics, University of Tartu, Tartu, Tartumaa 51010, Estonia; Department of Biology, University of Padua, Italy
- Mait Metspalu
- Institute of Genomics, University of Tartu, Tartu, Tartumaa 51010, Estonia
- Murray P Cox
- Statistics and Bioinformatics Group, School of Fundamental Sciences, Massey University, Palmerston North 4442, New Zealand
- Matthew Leavesley
- Strand of Anthropology, Sociology and Archaeology, School of Humanities and Social Sciences, University of Papua New Guinea, PO Box 320, University 134, National Capital District, Papua New Guinea; College of Arts, Society and Education, James Cook University, P.O. Box 6811, Cairns, Queensland, 4870, Australia; ARC Centre of Excellence for Australian Biodiversity and Heritage, University of Wollongong, Wollongong, New South Wales, 2522, Australia
- François-Xavier Ricaut
- Laboratoire Évolution and Diversité Biologique (EDB UMR 5174), Université de Toulouse Midi-Pyrénées, CNRS, IRD, UPS. 118 route de Narbonne, Bat 4R1, 31062 Toulouse cedex 9, France
11
Stielow B, Simon C, Liefke R. Making fundamental scientific discoveries by combining information from literature, databases, and computational tools - An example. Comput Struct Biotechnol J 2021; 19:3027-3033. [PMID: 34136100 PMCID: PMC8175269 DOI: 10.1016/j.csbj.2021.04.052] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/26/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/18/2022]
Abstract
In recent years, the amount of available literature, data and computational tools has increased exponentially, providing opportunities and challenges for making use of this vast amount of material. Here, we describe how we used publicly available information to identify the previously barely characterized protein SAMD1 (SAM domain-containing protein 1) as a novel unmethylated CpG island-binding protein. This discovery is an example of how the wealth of material and tools on the internet can be used to make scientific breakthroughs, and also of the hurdles that may occur. Specifically, we discuss how the misrepresentation of SAMD1 in literature and databases may have prevented an earlier characterization of this protein, and we address what can be learned from this example.
Affiliation(s)
- Bastian Stielow
- Institute of Molecular Biology and Tumor Research (IMT), Philipps University of Marburg, 35043 Marburg, Germany
- Clara Simon
- Institute of Molecular Biology and Tumor Research (IMT), Philipps University of Marburg, 35043 Marburg, Germany
- Robert Liefke
- Institute of Molecular Biology and Tumor Research (IMT), Philipps University of Marburg, 35043 Marburg, Germany
- Department of Hematology, Oncology and Immunology, University Hospital Giessen and Marburg, 35043 Marburg, Germany
- Corresponding author at: Institute of Molecular Biology and Tumor Research (IMT), Philipps University of Marburg, 35043 Marburg, Germany.
12
Swazo NK. "Un-Promethean" science and the future of humanity: Heidegger's warning. Hist Philos Life Sci 2021; 43:33. [PMID: 33666741 DOI: 10.1007/s40656-021-00380-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2020] [Accepted: 02/01/2021] [Indexed: 06/12/2023]
Abstract
The twentieth-century German philosopher Martin Heidegger distinguished "meditative" (besinnlich) and "calculative" (rechnende) modes of thinking as a way of highlighting the problematique of modern technology and the limits of modern science. In doing so, he was also prescient in recognizing, in 1955, that the most significant danger to the future of humanity lies in developments in molecular biology and biotechnology, rather than in the post-World War II global threat of thermonuclear weapons. These insights are examined here in light of recent discussion of the need for international regulation of heritable human genome editing and the announcement in 2018 of the birth of the world's first gene-edited babies in China. Heidegger's call for meditative thinking requires modern medicine and the life sciences to appropriate the phenomenological conception of the human "way to be" (Seinsweise) such that it is not restricted to the "present-at-hand" (vorhanden) physiology and pathology of the human body (Körper).
Affiliation(s)
- Norman K Swazo
- Department of History and Philosophy, North South University, Dhaka, Bangladesh.
13
Pös O, Radvanszky J, Buglyó G, Pös Z, Rusnakova D, Nagy B, Szemes T. Copy number variation: Characteristics, evolutionary and pathological aspects. Biomed J 2021:S2319-4170(21)00009-3. [PMID: 34649833 DOI: 10.1016/j.bj.2021.02.003] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 11/27/2020] [Revised: 02/01/2021] [Accepted: 02/05/2021] [Indexed: 12/12/2022]
Abstract
Copy number variants (CNVs) have been the subject of extensive research in recent years. They are common features of the human genome that play an important role in evolution, contribute to population diversity and to the development of certain diseases, and influence host–microbiome interactions. CNVs have found application in the molecular diagnosis of many diseases and in non-invasive prenatal care, but their full potential is only beginning to emerge. CNVs are expected to have a tremendous impact on the screening, diagnosis, prognosis, and monitoring of several disorders, including cancer and cardiovascular disease. Here, we comprehensively review basic definitions of the term CNV, outline the mechanisms and factors involved in CNV formation, and discuss their evolutionary and pathological aspects. We suggest a need for better-defined distinguishing criteria and boundaries between the known types of CNVs.
14
Touati R, Tajouri A, Mesaoudi I, Oueslati AE, Lachiri Z, Kharrat M. New methodology for repetitive sequences identification in human X and Y chromosomes. Biomed Signal Process Control 2021; 64:102207. [PMID: 33101452 PMCID: PMC7572123 DOI: 10.1016/j.bspc.2020.102207] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/20/2019] [Revised: 07/23/2020] [Accepted: 09/01/2020] [Indexed: 11/24/2022]
Abstract
Repetitive DNA sequences make up the major proportion of the human genome, as they do in the genomes of other species. The importance of each type of repetitive DNA depends on many factors, such as its structural and functional roles, position, length and copy number. Whether such sequences are conserved at different locations in the chromosome remains an open question for biologists, and detecting their locations despite their great variability, as well as finding novel repetitive sequences, remains a challenging task. To side-step this problem, we developed a new method based on signal and image processing tools, which finds repetitive patterns in DNA images regardless of the repeat length. This technique appears to be more efficient than conventional bioinformatics tools at detecting new repetitive sequences: classical tools perform poorly in the presence of mutations (insertions or deletions), whereas modifying one or a few pixels in the image does not affect the global form of the repetitive pattern. Using this method, we generated a new database of repetitive patterns containing tandem and dispersed repeated sequences. The highly repetitive sequences we identified in the X and Y chromosomes were shown to be located in other human chromosomes or in other genomes. These data were then used as input to a convolutional neural network classifier, which achieved an average recognition score of 94.4%.
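The abstract does not spell out its signal-processing pipeline. As a generic illustration only (not the authors' image-based or CNN method), the sketch below maps a sequence to Voss-style binary indicator signals and uses their autocorrelation to estimate the period of a tandem repeat. All function names are hypothetical. Unlike the image approach described above, this simple version tolerates isolated substitutions but not insertions or deletions, which shift the phase of the repeat.

```python
# Hypothetical sketch: generic autocorrelation-based tandem-repeat period
# detection; not the method of Touati et al.


def indicator_signals(seq: str) -> dict:
    """Map a DNA string to four binary indicator signals (Voss mapping), one per base."""
    seq = seq.upper()
    return {base: [1 if c == base else 0 for c in seq] for base in "ACGT"}


def autocorrelation(seq: str, lag: int) -> float:
    """Sum of the four indicator autocorrelations at `lag`, divided by the number of pairs.

    For sequences containing only A, C, G and T this is the fraction of
    positions i where seq[i] == seq[i + lag].
    """
    n_pairs = len(seq) - lag
    if n_pairs <= 0:
        return 0.0
    signals = indicator_signals(seq)
    total = sum(x[i] * x[i + lag] for x in signals.values() for i in range(n_pairs))
    return total / n_pairs


def dominant_period(seq: str, max_lag: int) -> int:
    """Smallest lag with the highest autocorrelation, taken as the repeat period."""
    scores = {lag: autocorrelation(seq, lag) for lag in range(1, max_lag + 1)}
    best = max(scores.values())
    return min(lag for lag, score in scores.items() if score == best)


repeat = "ACGTT" * 20
mutated = repeat[:50] + "G" + repeat[51:]  # one substitution inside the array
print(dominant_period(repeat, 12), dominant_period(mutated, 12))
```

Taking the smallest lag among ties reports the fundamental period rather than one of its multiples (10, 15, ...), which score equally well on a perfect repeat.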
Affiliation(s)
- Rabeb Touati
- University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia
- University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Asma Tajouri
- University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia
- Imen Mesaoudi
- University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Afef Elloumi Oueslati
- University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Zied Lachiri
- University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Maher Kharrat
- University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia
15
Komkov AY, Urazbakhtin SZ, Saliutina MV, Komech EA, Shelygin YA, Nugmanov GA, Shubin VP, Smirnova AO, Bobrov MY, Tsukanov AS, Snezhkina AV, Kudryavtseva AV, Lebedev YB, Mamedov IZ. SeqURE - a new copy-capture based method for sequencing of unknown Retroposition events. Mob DNA 2020; 11:33. [PMID: 33317630 PMCID: PMC7734759 DOI: 10.1186/s13100-020-00228-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2020] [Accepted: 12/01/2020] [Indexed: 11/24/2022]
Abstract
Background Retroelements (REs) occupy a significant part of all eukaryotic genomes, including the human genome. The majority of retroelements in the human genome are inactive and unable to retrotranspose. Dozens of active copies are repressed in most normal tissues by various cellular mechanisms. These copies can become active in normal germline and brain tissues or in cancer, leading to new retroposition events. The consequences of such events and their role in normal cell function and carcinogenesis are not yet fully understood. If new insertions occur in a small portion of cells, they can be found only with specific methods based on RE enrichment and high-throughput sequencing. The downside of the high sensitivity of such methods is the presence of various artifacts that mimic real insertions, which in many cases cannot be validated due to the lack of initial template DNA. For this reason, adequate assessment of rare (< 1%) subclonal cancer-specific RE insertions is complicated. Results Here we describe a new copy-capture technique, implemented in a method called SeqURE (Sequencing of Unknown Retroposition Events), that allows efficient and reliable identification of new genomic RE insertions. The method is based on the capture of copies of target molecules (copy-capture), followed by selective amplification and sequencing of the genomic regions adjacent to active RE insertions on both sides. Importantly, the template genomic DNA remains intact and can be used for validation experiments. In addition, we applied a novel system for testing method sensitivity and showed that the method reliably detects insertions present in 1 out of 100 cells and a substantial portion of insertions present in 1 out of 1000 cells. Using this method, we showed the absence of somatic Alu insertions in colorectal cancer samples bearing tumor-specific L1HS insertions.
Conclusions This study presents the first description and implementation of the copy-capture technique and provides the first methodological basis for the quantitative assessment of RE insertions present in a small portion of cells. Supplementary Information The online version contains supplementary material available at 10.1186/s13100-020-00228-6.
Affiliation(s)
- Alexander Y Komkov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Dmitry Rogachev National Medical and Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia.
- Maria V Saliutina
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Yuri A Shelygin
- Ryzhikh National Medical Research Centre for Coloproctology of the Ministry of Health of Russia, Moscow, Russia
- Gaiaz A Nugmanov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Vitaliy P Shubin
- Ryzhikh National Medical Research Centre for Coloproctology of the Ministry of Health of Russia, Moscow, Russia
- Mikhail Y Bobrov
- V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, Moscow, Russia
- Alexey S Tsukanov
- Ryzhikh National Medical Research Centre for Coloproctology of the Ministry of Health of Russia, Moscow, Russia
- Anastasia V Snezhkina
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Anna V Kudryavtseva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Yuri B Lebedev
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Ilgar Z Mamedov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia; Dmitry Rogachev National Medical and Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia; V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, Moscow, Russia; Central European Institute of Technology, Masaryk University, Brno, Czech Republic.
16
Okawa Y, Kohara S, Uchiyama A, Yamazaki H, Uno Y. Evaluation of domain of unknown function 1220 (DUF1220) for detection of human genome by quantitative polymerase chain reaction: Potential use in assessing the biodistribution of transplanted therapeutic human cells. Drug Metab Pharmacokinet 2020; 38:100366. [PMID: 33714132 DOI: 10.1016/j.dmpk.2020.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2020] [Revised: 10/27/2020] [Accepted: 11/02/2020] [Indexed: 10/23/2022]
Abstract
The biodistribution profile of cell-based therapy products in animal models is important for evaluating their safety and efficacy. Because of its quantitative nature and sensitivity, the quantitative polymerase chain reaction (qPCR) is a useful method for detecting and quantifying xenogeneic cell-derived DNA in animal models, thereby allowing a biodistribution profile to be established. Although Alu repetitive elements in the human genome (named after the restriction endonuclease from Arthrobacter luteus) have been used to assess the biodistribution of human cells, high background signals are detected. In the present study, we evaluate the potential of domain of unknown function 1220 (DUF1220), a human lineage-specific, multiple-copy gene similar in this respect to Alu sequences, for such analysis. Using qPCR analysis for DUF1220, the human genome could be detected against a mouse genome background at a level comparable to that of Alu sequences, with no detectable background signal. Moreover, this approach distinguished the human genome from the cynomolgus monkey genome. Further investigation of the quantitative aspects of this DUF1220-based qPCR assay might establish its usefulness for future biodistribution studies of human cells transplanted into animals.
Affiliation(s)
- Yurie Okawa
- Drug Safety Research Laboratories, Shin Nippon Biomedical Laboratories, Ltd., Kagoshima, Japan.
- Sakae Kohara
- Pharmacokinetics and Bioanalysis Center, Shin Nippon Biomedical Laboratories, Ltd., Kainan, Japan
- Asako Uchiyama
- Drug Safety Research Laboratories, Shin Nippon Biomedical Laboratories, Ltd., Kagoshima, Japan
- Hiroshi Yamazaki
- Laboratory of Drug Metabolism and Pharmacokinetics, Showa Pharmaceutical University, Machida, Japan.
- Yasuhiro Uno
- Pharmacokinetics and Bioanalysis Center, Shin Nippon Biomedical Laboratories, Ltd., Kainan, Japan; Joint Faculty of Veterinary Medicine, Kagoshima University, Kagoshima, Japan.
17
Soifer I, Fong NL, Yi N, Ireland AT, Lam I, Sooknah M, Paw JS, Peluso P, Concepcion GT, Rank D, Hastie AR, Jojic V, Ruby JG, Botstein D, Roy MA. Fully Phased Sequence of a Diploid Human Genome Determined de Novo from the DNA of a Single Individual. G3 (Bethesda) 2020; 10:2911-2925. [PMID: 32631951 PMCID: PMC7466960 DOI: 10.1534/g3.119.400995] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/12/2019] [Accepted: 06/26/2020] [Indexed: 12/17/2022]
Abstract
In recent years, improved sequencing technology and computational tools have made de novo genome assembly more accessible. Many approaches, however, generate either an unphased or only partially resolved representation of a diploid genome, in which polymorphisms are detected but not assigned to one or the other of the homologous chromosomes. Yet chromosomal phase information is invaluable for understanding phenotypic trait inheritance in cases of compound heterozygosity, allele-specific expression or cis-acting variants. Here we use a combination of tools and sequencing technologies to generate a de novo diploid assembly of the human primary cell line WI-38. First, data from PacBio single molecule sequencing and Bionano Genomics optical mapping were combined to generate an unphased assembly. Next, 10x Genomics linked reads were combined with the hybrid assembly to generate a partially phased assembly. Lastly, we developed and optimized methods to use short-read (Illumina) sequencing of flow cytometry-sorted metaphase chromosomes to provide phase information. The final genome assembly was almost fully (94%) phased after the addition of approximately 2.5-fold coverage of Illumina data from the sequenced metaphase chromosomes. The diploid nature of the final de novo genome assembly improved the resolution of structural variants between the WI-38 genome and the human reference genome. The phased WI-38 sequence data are available for browsing and download at wi38.research.calicolabs.com. Our work shows that assembling a completely phased diploid genome de novo from the DNA of a single individual is now readily achievable.
Affiliation(s)
- Ilya Soifer
- Calico Life Sciences LLC, South San Francisco, CA 94080
- Nicole L Fong
- Calico Life Sciences LLC, South San Francisco, CA 94080
- Nelda Yi
- Calico Life Sciences LLC, South San Francisco, CA 94080
- Irene Lam
- Calico Life Sciences LLC, South San Francisco, CA 94080
- David Rank
- Pacific Biosciences, Menlo Park, CA 94025
- J Graham Ruby
- Calico Life Sciences LLC, South San Francisco, CA 94080
18
Shin DM, Hwang MY, Kim BJ, Ryu KH, Kim YJ. GEN2VCF: a converter for human genome imputation output format to VCF format. Genes Genomics 2020; 42:1163-8. [PMID: 32803703 DOI: 10.1007/s13258-020-00982-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2020] [Accepted: 07/30/2020] [Indexed: 11/03/2022]
Abstract
BACKGROUND For genome-wide association studies in humans, genotype imputation is an essential analysis tool for improving association mapping power. When IMPUTE software is used for imputation analysis, the imputation output (GEN format) must be converted to variant call format (VCF) with imputed genotype dosages for association analysis. However, the conversion requires multiple software packages in a pipeline and a large amount of processing time. OBJECTIVE We developed GEN2VCF, a fast and convenient tool for converting GEN format to VCF with dosage support. METHODS The performance of GEN2VCF was compared with that of BCFtools, QCTOOL, and Oncofunco. The test data set was a 1 Mb GEN-formatted file of 5000 samples. To assess performance across sample sizes, tests were performed from 1000 to 5000 samples in steps of 1000. Runtime and memory usage were used as performance measures. RESULTS GEN2VCF performed substantially better than the other methods: its runtime and memory usage were at least 1.4-fold and 7.4-fold lower, respectively. CONCLUSIONS GEN2VCF provides efficient conversion from GEN format to VCF with the best-guessed genotype, genotype posterior probabilities, and genotype dosage, as well as great flexibility for integration with other software packages in a pipeline.
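The three per-sample quantities named in the conclusions (best-guess genotype, posterior probabilities, dosage) can all be derived from one GEN probability triplet. The sketch below is a minimal illustration, not GEN2VCF's code: it assumes a GEN line with five leading columns (SNP ID, rsID, position, allele A, allele B) followed by P(AA), P(AB), P(BB) per sample, treats allele A as REF, and writes the standard VCF FORMAT keys GT, GP and DS. The chromosome is passed in explicitly because the first GEN column does not always carry it. All helper names are hypothetical, and the VCF header lines are omitted.

```python
# Hypothetical helpers illustrating GEN -> VCF conversion; not GEN2VCF itself.
from dataclasses import dataclass
from typing import List, Tuple

Probs = Tuple[float, float, float]  # P(AA), P(AB), P(BB) for one sample


@dataclass
class GenRecord:
    snp_id: str
    rsid: str
    pos: int
    allele_a: str
    allele_b: str
    probs: List[Probs]


def parse_gen_line(line: str) -> GenRecord:
    """Parse one GEN line: 5 leading columns, then 3 probabilities per sample."""
    fields = line.split()
    snp_id, rsid, pos, allele_a, allele_b = fields[:5]
    values = [float(x) for x in fields[5:]]
    if len(values) % 3 != 0:
        raise ValueError("expected three genotype probabilities per sample")
    probs = [(values[i], values[i + 1], values[i + 2]) for i in range(0, len(values), 3)]
    return GenRecord(snp_id, rsid, int(pos), allele_a, allele_b, probs)


def dosage(p: Probs) -> float:
    """Expected count of allele B: 0*P(AA) + 1*P(AB) + 2*P(BB)."""
    return p[1] + 2.0 * p[2]


def best_guess_gt(p: Probs) -> str:
    """Most probable genotype, with allele A coded as REF (0) and allele B as ALT (1)."""
    idx = max(range(3), key=lambda i: p[i])
    return ("0/0", "0/1", "1/1")[idx]


def to_vcf_line(rec: GenRecord, chrom: str) -> str:
    """One VCF data line with GT, GP and DS for every sample."""
    cols = [chrom, str(rec.pos), rec.rsid, rec.allele_a, rec.allele_b,
            ".", "PASS", ".", "GT:GP:DS"]
    for p in rec.probs:
        gp = ",".join(f"{x:.3f}" for x in p)
        cols.append(f"{best_guess_gt(p)}:{gp}:{dosage(p):.3f}")
    return "\t".join(cols)


rec = parse_gen_line("snp1 rs123 1000 A G 0.9 0.1 0 0.1 0.2 0.7")
print(to_vcf_line(rec, "1"))
```

A streaming converter would apply `parse_gen_line` and `to_vcf_line` line by line rather than loading the whole file. Avoiding a full load is one plausible way to keep memory low, though the abstract does not say how GEN2VCF achieves its reported figures.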
19
Hatipoğlu Ö, Saydam F. Association between rs11362 polymorphism in the beta-defensin 1 (DEFB1) gene and dental caries: A meta-analysis. J Oral Biosci 2020; 62:272-279. [PMID: 32603779 DOI: 10.1016/j.job.2020.06.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/09/2020] [Revised: 06/16/2020] [Accepted: 06/19/2020] [Indexed: 10/24/2022]
Abstract
OBJECTIVES Beta-defensin 1, encoded by the DEFB1 gene, is an important molecule that confers protection from dental caries. Numerous studies have examined the rs11362 polymorphism in the DEFB1 gene. Through a meta-analysis, we evaluated the results of studies that investigated the association between the rs11362 polymorphism and dental caries. METHODS This meta-analysis was designed according to the PRISMA statement guideline. Electronic databases (PubMed, Web of Science, Scopus, and Cochrane Library) were searched by two independent researchers. Publication bias was assessed using a funnel plot, Egger's regression test, and the Begg and Mazumdar rank correlation test. Heterogeneity was evaluated using the chi-square test, tau-squared, and the Higgins I² statistic. The odds ratio (OR) was used to measure effect size. RESULTS The rank correlation and regression procedures showed no evidence of publication bias (p > 0.05). In the permanent dentition subgroup, the DEFB1 rs11362 polymorphism showed significant associations under the heterozygous (CC vs. CT: OR = 2.20, 95% confidence interval (CI): 1.17, 4.10; p = 0.014) and dominant (CC vs. CT + TT: OR = 3.11, 95% CI: 1.18, 8.21; p = 0.022) models. However, no model showed a significant difference in either the deciduous dentition (p > 0.05) or the mixed dentition subgroup (p > 0.05). CONCLUSIONS This meta-analysis suggests that the DEFB1 rs11362 polymorphism is associated with dental caries in permanent dentition. Moreover, individuals with the TT genotype were found to have a seven-fold higher risk of dental caries than individuals with the CC genotype. No such association was observed for deciduous and mixed dentitions.
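The effect sizes above are odds ratios with 95% confidence intervals. For a single 2×2 table, a common way to obtain them is the Woolf (log-scale) interval, sketched below. In the dominant model, CT and TT carriers are pooled into one "risk" group and compared with CC. This is an illustration only. The counts are hypothetical, the Haldane-Anscombe zero-cell correction is an assumed convention, and the study-level pooling a meta-analysis performs (fixed- or random-effects weighting) is not shown.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: float, b: float, c: float, d: float,
                  z: float = 1.959964) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval for one 2x2 table.

    a = cases carrying the risk genotype(s), b = controls carrying them,
    c = cases with the reference genotype,   d = controls with the reference genotype.
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell when any cell is zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical counts for a dominant model (CT + TT carriers vs CC) in one study.
print(odds_ratio_ci(a=20, b=10, c=10, d=20))
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself. The same pattern appears in the reported interval (OR = 3.11, 95% CI 1.18 to 8.21).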
Affiliation(s)
- Ömer Hatipoğlu
- Department of Restorative Dentistry, Sutcu Imam University, Kahramanmaras, Turkey.
- Faruk Saydam
- Department of Medical Biology and Genetics, Recep Tayyip Erdogan University, Rize, Turkey.
20
Abstract
The ENCODE project has produced important new estimates of human genome functionality, revising the percentage considered functional to more than 80%. This stands in stark contrast to the received view, which estimated that less than 10% of the conserved parts of the human genome are functional. ENCODE's unorthodox use of the notion of biological function has sparked the so-called ENCODE controversy, involving conflicting views about the correct notion of function in postgenomics. The debate hinges on the traditional philosophical contrast between the causal role (CR) and selected effects (SE) approaches. In this paper, we examine the ENCODE controversy in terms of the distinction between function monism and pluralism. We propose to apply a weak etiological account to genomic function ascriptions. On this approach, we can ascribe a function to a genomic structure of an organism if and only if performing the function persists in causally contributing to the fitness of the organism and its ancestors. In contrast to the strong etiological (i.e., selected effects) approach, the present account does not require there to have been selection for the structure in question. This monistic approach avoids the main difficulties of CR, as well as SE's overdependence on natural selection, while still preserving an evolutionarily constrained notion of biological function. Our proposal accommodates the estimates of the functionality of the human genome far more moderately than both ENCODE's own proposal and the views of critics who rely on a version of the SE account of functions.
Affiliation(s)
- Zdenka Brzović
- Department of Philosophy, Faculty of Humanities and Social Sciences, University of Rijeka, Sveučilišna avenija 4, 51000, Rijeka, Croatia.
- Predrag Šustar
- Department of Philosophy, Faculty of Humanities and Social Sciences, University of Rijeka, Sveučilišna avenija 4, 51000, Rijeka, Croatia.
21
Yao RA, Akinrinade O, Chaix M, Mital S. Quality of whole genome sequencing from blood versus saliva derived DNA in cardiac patients. BMC Med Genomics 2020; 13:11. [PMID: 31996208 DOI: 10.1186/s12920-020-0664-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/07/2019] [Accepted: 01/20/2020] [Indexed: 01/03/2023]
Abstract
Background Whole-genome sequencing (WGS) is becoming an increasingly important tool for detecting genomic variation. Blood-derived DNA is the current standard for WGS for research or clinical purposes, but it may not always be feasible to acquire. The usability of saliva-derived DNA for WGS is not known. We compared the quality of WGS between blood- and saliva-derived DNA. Methods WGS was performed on DNA from 531 blood and 502 saliva samples (including 5 paired samples) from participants enrolled in a heart disease biorepository. We compared the proportion of sequencing reads that mapped to non-human sources (microbiome), the sequencing coverage, and the yield and concordance of single nucleotide variant (SNV) and copy number variant (CNV) calls between blood and saliva genomes. Results Of the 531 blood and 502 saliva samples, 46% of saliva DNA samples failed quality control (QC) requirements for WGS, compared with 6% of blood DNA samples. On average, 10.7% of WGS reads in the saliva samples mapped to the human oral microbiome, compared with 0.09% of WGS reads in blood samples. However, these reads were readily removed by discarding reads that did not map to the human reference genome. Sequencing coverage met or exceeded the target sequencing depth of 30x in all blood samples and in 4 of the 5 saliva samples; the fifth saliva sample had an average sequencing depth of 22.6x. Over 95% of SNVs identified in saliva were concordant with those identified in blood across the genome, within all gene coding regions, and within cardiovascular disease-related gene coding regions. Rare SNVs, defined as those with a minor allele frequency of less than 1% in the Genome Aggregation Database, had a lower concordance of 90% between blood and saliva genomes. CNVs had only 76% concordance between blood and saliva samples. Conclusions High-quality saliva samples that meet stringent QC criteria can be used for WGS when blood-derived DNA is not available or is not suitable. Saliva DNA provides an acceptable yield of SNV calls but a lower yield of CNV calls compared with blood DNA.
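The abstract reports concordance as the share of saliva SNVs that are also found in blood, overall and for rare variants (minor allele frequency below 1%). A minimal sketch of that calculation follows. It assumes variants are keyed by (chromosome, position, REF, ALT), uses exact set matching without genotype comparison, and lets an alternate-allele frequency lookup stand in for minor allele frequency. Variants absent from the lookup are treated as rare. The variant sets and frequencies are hypothetical, and the study's exact concordance definition may differ.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt); hypothetical key


def concordance(query: Set[Variant], reference: Set[Variant]) -> float:
    """Fraction of variants in `query` that are also called in `reference`."""
    if not query:
        return float("nan")
    return len(query & reference) / len(query)


def rare_subset(calls: Set[Variant], allele_freq: Dict[Variant, float],
                threshold: float = 0.01) -> Set[Variant]:
    """Variants with population allele frequency below `threshold`.

    Variants missing from `allele_freq` are treated as rare (frequency 0).
    """
    return {v for v in calls if allele_freq.get(v, 0.0) < threshold}


# Hypothetical paired call sets and population frequencies.
blood = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
saliva = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 10, "T", "C")}
freqs = {("chr1", 100, "A", "G"): 0.30, ("chr1", 200, "C", "T"): 0.004}

print(concordance(saliva, blood))
print(concordance(rare_subset(saliva, freqs), blood))
```

Rare variants have few supporting reads and no population prior, so they are more exposed to sample-specific calling errors. This may partly explain why the rare-SNV concordance reported above is lower.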
22
Li R, Tian X, Yang P, Fan Y, Li M, Zheng H, Wang X, Jiang Y. Recovery of non-reference sequences missing from the human reference genome. BMC Genomics 2019; 20:746. [PMID: 31619167 PMCID: PMC6796347 DOI: 10.1186/s12864-019-6107-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/02/2019] [Accepted: 09/20/2019] [Indexed: 01/12/2023]
Abstract
Background Non-reference sequences (NRS) represent structural variations in the human genome with potential functional significance. However, apart from the known insertions, it is currently unknown whether other types of structural variation involving NRS exist. Results Here, we compared 31 human de novo assemblies with the current reference genome to identify NRS and their locations. We resolved the precise locations of 6113 NRS adding up to 12.8 Mb. Besides 1571 insertions, we detected 3041 alternate alleles, defined as sharing less than 90% identity (or none) with the reference alleles. These alternate alleles overlapped with 1143 protein-coding genes, including a putative novel MHC haplotype. Further, we demonstrated that the alternate alleles and their flanking regions had a high content of tandem repeats, indicating that their origin was associated with tandem repeats. Conclusions Our study detected a large number of NRS, including many previously uncharacterized alternate alleles. We suggest that the origin of alternate alleles is associated with tandem repeats. Our results enrich the known spectrum of genetic variation in the human genome.
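The alternate-allele definition above hinges on a 90% identity cut-off. The sketch below shows one simple way to apply such a threshold. It assumes the NRS and the reference allele are already aligned (equal length, `-` for gaps), computes identity as matching columns over aligned columns, and treats an NRS with no reference allele at its locus as an insertion. The alignment step, the exact identity definition the authors used, and the function names are all assumptions.

```python
# Hypothetical sketch of the "< 90% identity" rule; assumes a pre-computed alignment.
from typing import Optional


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Matching columns / aligned columns for two gapped, pre-aligned sequences.

    Columns where both sequences have a gap are ignored; a gap never counts as a match.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return matches / len(columns)


def classify_nrs(ref_aligned: Optional[str], nrs_aligned: str,
                 threshold: float = 0.90) -> str:
    """Label an NRS as an insertion, an alternate allele, or close to the reference."""
    if not ref_aligned:
        return "insertion"  # no reference allele at this locus
    if percent_identity(ref_aligned, nrs_aligned) < threshold:
        return "alternate allele"
    return "similar to reference"


print(classify_nrs("ACGTACGTAC", "ACGTTTTTAA"))  # 6 of 10 columns match
print(classify_nrs(None, "ACGT"))
```

Identity definitions differ between tools, for example in whether gap columns count in the denominator. Sequences near the 90% boundary can therefore change class depending on the choice.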
Affiliation(s)
- Ran Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Xiaomeng Tian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Peng Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Yingzhi Fan
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Ming Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Hongxiang Zheng
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Xihong Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China.
23
Abstract
Humans are infected with many viruses, and the immune system mostly removes viruses and infected cells. However, certain viruses have entered the human genome. About 45% of the human genome is composed of transposable elements (long interspersed nuclear elements [LINEs], short interspersed nuclear elements [SINEs] and transposons), and 5-8% is derived from viral sequences with similarity to infectious retroviruses. If a retrovirus integrates into the germline, the integrated viral sequences are heritable. The accumulation of such viral sequences has shaped the current human genome. This article summarizes recent studies of retroviruses in humans and bridges clinical fields and evolutionary genetics. First, we report the repertoire of human-infective retroviruses. Second, we review endogenous retroviruses in the human genome and the diseases associated with them. Third, we discuss the biological functions of endogenous retroviruses and propose the concept of accelerated human evolution via viruses. Finally, we present perspectives on virology in the field of evolutionary medicine.
Affiliation(s)
- Yukako Katsura
- Division of Pharmacology, Department of Biomedical Sciences, Nihon University School of Medicine, Tokyo, Japan.
- Satoshi Asai
- Division of Pharmacology, Department of Biomedical Sciences, Nihon University School of Medicine, Tokyo, Japan
24
Zwick M, Kraemer O, Carter AJ. Dataset of the frequency patterns of publications annotated to human protein-coding genes, their protein products and genetic relevance. Data Brief 2019; 25:104284. [PMID: 31453287 PMCID: PMC6702404 DOI: 10.1016/j.dib.2019.104284] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/06/2019] [Accepted: 07/12/2019] [Indexed: 01/15/2023]
Abstract
We present data concerning the distribution of scientific publications for human protein-coding genes, together with their protein products and genetic relevance. We annotated the gene2pubmed dataset (Maglott et al., 2007) provided by the NCBI (National Center for Biotechnology Information) with publication years, genetic metadata corresponding to Online Mendelian Inheritance in Man (OMIM) entries (Hamosh et al., 2005), and the frequency of each gene's appearance in Genome-Wide Association Studies (GWAS) (Buniello et al., 2019) provided by the European Bioinformatics Institute (EBI), using the KNIME® Analytics Platform (Berthold et al., 2008). This data integration process produced two datasets: 1) a dataset containing information on all human protein-coding genes, which can be used to analyse the number of scientific publications in the context of the potential disease relevance of the individual genes; and 2) a table with the annual and cumulative number of PubMed entries. For further interpretation of the data presented in this article, please see the research article 'Target 2035 - probing the human proteome' by Carter et al. (2019), https://doi.org/10.1016/j.drudis.2019.06.020.
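The two outputs described above (publication counts per gene, and annual plus cumulative PubMed counts) can be sketched outside KNIME as follows. This is not the authors' workflow. It assumes gene2pubmed-style tab-separated text whose first three columns are tax_id, GeneID and PubMed_ID, filters on the NCBI Taxonomy ID for human (9606), and takes publication years from a separate PMID-to-year lookup, because gene2pubmed itself carries no dates. The sample rows and years are hypothetical.

```python
import csv
import io
from collections import Counter
from itertools import accumulate
from typing import Dict, List, Set, Tuple

HUMAN_TAX_ID = "9606"  # NCBI Taxonomy ID for Homo sapiens


def load_human_links(tsv_text: str) -> Set[Tuple[str, str]]:
    """Distinct (GeneID, PubMed_ID) pairs for human genes from gene2pubmed-style text."""
    links = set()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        tax_id, gene_id, pmid = row[:3]
        if tax_id == HUMAN_TAX_ID:
            links.add((gene_id, pmid))
    return links


def publications_per_gene(links: Set[Tuple[str, str]]) -> Counter:
    """Number of distinct PMIDs annotated to each gene."""
    return Counter(gene for gene, _ in links)


def annual_and_cumulative(links: Set[Tuple[str, str]],
                          pmid_year: Dict[str, int]) -> List[Tuple[int, int, int]]:
    """(year, distinct gene-annotated PMIDs published that year, running total).

    PMIDs without a year in `pmid_year` are skipped.
    """
    pmids = {pmid for _, pmid in links if pmid in pmid_year}
    per_year = Counter(pmid_year[p] for p in pmids)
    years = sorted(per_year)
    running = accumulate(per_year[y] for y in years)
    return [(y, per_year[y], total) for y, total in zip(years, running)]


# Hypothetical gene2pubmed excerpt and PMID -> year lookup.
sample = (
    "#tax_id\tGeneID\tPubMed_ID\n"
    "9606\t1\t100\n"
    "9606\t1\t101\n"
    "9606\t2\t100\n"
    "10090\t3\t102\n"
)
years = {"100": 2001, "101": 2003, "102": 2003}
links = load_human_links(sample)
print(publications_per_gene(links))
print(annual_and_cumulative(links, years))
```

Counting distinct PMIDs, rather than gene-PMID rows, keeps a paper annotated to several genes from being counted more than once in the annual table. Whether the published table counts it that way is not stated in the abstract.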
Collapse
Affiliation(s)
- Matthias Zwick
- Computational Biology, Boehringer Ingelheim, 88400 Biberach an der Riß, Germany
- Oliver Kraemer
- Discovery Research Coordination, Boehringer Ingelheim, 55216 Ingelheim Am Rhein, Germany
- Adrian J Carter
- Discovery Research Coordination, Boehringer Ingelheim, 55216 Ingelheim Am Rhein, Germany
25
Sivaprakasam B, Sadagopan P. Development of an Interactive Web Application "Shiny App for Frequency Analysis on Homo sapiens Genome (SAFA-HsG)". Interdiscip Sci 2019; 11:723-729. [PMID: 31264054 DOI: 10.1007/s12539-019-00340-z] [Received: 01/08/2019] [Revised: 06/08/2019] [Accepted: 06/19/2019]
Abstract
The web application "Shiny App for Frequency Analysis on Homo sapiens Genome (SAFA-HsG)" was developed using R-based Bioconductor packages and the Shiny framework. The app supports preliminary descriptive analysis of nucleotide frequency, CpG islands, CpG non-island regions, and CpG island shores and shelves (downstream and upstream) in the human reference genome, which should help biologists working on human epigenomics. Tables of these analyses for all chromosomes can be viewed and downloaded by end users, and the corresponding comparative plots can be used to compare CpG sites. In addition, to introduce the personal genome project, the study carried out preliminary work on a small amount of raw data, which is included in the app to encourage interest in personal genome information. The app is hosted at https://SAFA-HsG.shinyapps.io/home/. It is a multi-platform application that can be launched locally on any computer, whether or not R is installed. Its user-friendly interface allows biologists, even those with little computing experience, to access the data and analyze it further.
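The descriptive statistics listed in this abstract can be sketched in a few lines. The following is a minimal Python sketch, not the app's R/Bioconductor code: the observed/expected CpG formula, the 2 kb shore and 2-4 kb shelf boundaries, and the "open sea" label are commonly used conventions assumed here, and all function names are hypothetical.

```python
from collections import Counter


def nucleotide_frequencies(seq: str) -> dict:
    """Relative frequency of A, C, G and T; other symbols (e.g. N) are ignored."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: (counts[base] / total if total else 0.0) for base in "ACGT"}


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    return cg * len(seq) / (c * g) if c and g else 0.0


def classify_position(pos: int, island_start: int, island_end: int) -> str:
    """Label a coordinate relative to one CpG island (assumed: shore <= 2 kb, shelf 2-4 kb)."""
    if island_start <= pos <= island_end:
        return "island"
    distance = island_start - pos if pos < island_start else pos - island_end
    if distance <= 2000:
        return "shore"
    if distance <= 4000:
        return "shelf"
    return "open sea"
```

Upstream and downstream shores or shelves can be distinguished by checking whether `pos` lies before `island_start` or after `island_end`; the sketch merges them for brevity.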
Affiliation(s)
- Balamurugan Sivaprakasam
- Department of Computer Science, Vels Institute of Science, Technology and Advanced Studies (VISTAS), Pallavaram, Chennai, 600 117, Tamil Nadu, India.
- Prasanna Sadagopan
- Department of Computer Science, Vels Institute of Science, Technology and Advanced Studies (VISTAS), Pallavaram, Chennai, 600 117, Tamil Nadu, India
26
Dahiya R, Naqvi AAT, Mohammad T, Alajmi MF, Rehman MT, Hussain A, Hassan MI. Investigating the structural features of chromodomain proteins in the human genome and predictive impacts of their mutations in cancers. Int J Biol Macromol 2019; 131:1101-16. [PMID: 30917913 DOI: 10.1016/j.ijbiomac.2019.03.162] [Received: 01/15/2019] [Revised: 03/20/2019] [Accepted: 03/22/2019]
Abstract
Epigenetic readers are specific proteins that recognize histone marks and represent the underlying mechanism of chromatin regulation. Histone H3 lysine methylation is a potential epigenetic code for chromatin organization and transcriptional control. Histone methylation is recognized by evolutionarily conserved reader modules known as chromodomains, which have been identified in several proteins and are involved in transcriptional silencing and chromatin remodelling. Genetic perturbations within the structurally conserved chromodomain could mistarget the reader protein and impair its regulatory pathways, ultimately leading to cellular chaos and setting the stage for tumor development and progression. Here, we report the structural conservation associated with diverse functions, the prognostic significance, and the functional consequences of mutations within the chromodomains of human proteins in distinct cancers. We have extensively analysed chromodomain-containing human proteins in terms of their structural and functional ability to act as molecular switches in methyl-lysine recognition. We further investigated the combinatorial potential, target promiscuity and binding specificity associated with their underlying mechanisms. Indeed, the molecular mechanism of epigenetic silencing underlies a newer approach to cancer therapy. We hope that a critical understanding of chromodomains will open novel paths of research and provide new insights into the design of effective anti-cancer therapies.
27
Piovesan A, Pelleri MC, Antonaros F, Strippoli P, Caracausi M, Vitale L. On the length, weight and GC content of the human genome. BMC Res Notes 2019; 12:106. [PMID: 30813969 PMCID: PMC6391780 DOI: 10.1186/s13104-019-4137-z] [Received: 12/07/2018] [Accepted: 02/15/2019]
Abstract
Objective Basic parameters commonly used to describe genomes, including length, weight and relative guanine-cytosine (GC) content, are widely cited in the absence of a primary source. Using updated data and original software, we determined these values, to the best of our knowledge, as a standard reference for the whole human nuclear genome, for each chromosome and for mitochondrial DNA. We also devised a method to calculate the relative GC content of the whole messenger RNA sequence set and of transcriptomes by multiplying the GC content of each gene by its mean expression level. Results The male nuclear diploid genome extends for 6.27 gigabase pairs (Gbp), is 205.00 centimetres (cm) long and weighs 6.41 picograms (pg). The female values are 6.37 Gbp, 208.23 cm and 6.51 pg. The individual variability and the implications for DNA informational density in terms of bits per volume are discussed. The genomic GC content is 40.9%. Following analysis of different transcriptomes and species, we showed that the greatest deviation was observed in the pathological condition analysed (trisomy 21 leukaemic cells) and in Caenorhabditis elegans. Our results may represent a solid basis for further investigation of human structural and functional genomics while also providing a framework for other comparative genome analyses. Electronic supplementary material The online version of this article (10.1186/s13104-019-4137-z) contains supplementary material, which is available to authorized users.
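The expression-weighted GC calculation described in this abstract amounts to a weighted mean. This Python sketch assumes that each transcript's GC fraction is weighted by its mean expression level and that the sum is divided by the total expression; the abstract does not state the normalization or whether transcript length enters the weighting, so both functions (hypothetical names) are an illustration only, not the authors' software.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def expression_weighted_gc(transcripts: dict, expression: dict) -> float:
    """GC content of a transcriptome: each transcript's GC fraction weighted by
    its mean expression level, normalized by total expression (an assumption)."""
    weights = {name: expression.get(name, 0.0) for name in transcripts}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(gc_fraction(seq) * weights[name] for name, seq in transcripts.items()) / total
```

For two transcripts with GC fractions 1.0 and 0.0 expressed at levels 3 and 1, the sketch returns 0.75, whereas the unweighted mean would be 0.5.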
Affiliation(s)
- Allison Piovesan
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Maria Chiara Pelleri
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Francesca Antonaros
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Pierluigi Strippoli
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Maria Caracausi
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Lorenza Vitale
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
28
Ma ZS, Li L, Ye C, Peng M, Zhang YP. Hybrid assembly of ultra-long Nanopore reads augmented with 10x-Genomics contigs: Demonstrated with a human genome. Genomics 2018; 111:1896-1901. [PMID: 30594583 DOI: 10.1016/j.ygeno.2018.12.013] [Received: 10/02/2018] [Revised: 11/17/2018] [Accepted: 12/24/2018]
Abstract
Third-generation sequencing (3GS) technologies generate ultra-long reads (up to 1 Mb), which make it possible to eliminate gaps and effectively resolve repeats in genome assembly. However, 3GS technologies suffer from high base-level error rates (15%-40%) and high sequencing costs. To address these issues, the hybrid assembly strategy, which uses both 3GS reads and inexpensive NGS (next-generation sequencing) short reads, was developed. Here, we use 10×-Genomics® technology, which combines a novel bar-coding strategy with Illumina® NGS and has the advantage of revealing long-range sequence information, in place of common NGS short reads for hybrid assembly of long, error-prone 3GS reads. We demonstrate the feasibility of integrating 3GS with 10×-Genomics technologies as a new strategy for hybrid de novo genome assembly, using the DBG2OLC and Sparc software packages previously developed by the authors for regular hybrid assembly. Using a human genome as an example, we show that with only 7× coverage of ultra-long Nanopore® reads, augmented with 10× reads, our approach achieved nearly the same level of quality as non-hybrid assembly with 35× coverage of Nanopore reads. Compared with assembly from 10×-Genomics reads alone, our assembly is gapless at a slightly higher cost. These results suggest that our new hybrid assembly of ultra-long 3GS reads augmented with 10×-Genomics reads offers a low-cost (less than ¼ the cost of the non-hybrid assembly) and computationally lightweight (only 109 calendar hours with a peak memory usage of 61 GB on a dual-CPU office workstation) solution for broadening the applications of 3GS technologies.
Affiliation(s)
- Zhanshan Sam Ma
- Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China.
- Lianwei Li
- Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China
- Chengxi Ye
- Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Department of Computer Science, University of Maryland, College Park, MD, USA
- Minsheng Peng
- Molecular Evolution and Genome Diversity Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China; KIZ/CUHK Joint Laboratory of Bio-resources and Molecular Research in Common Diseases, Kunming 650223, China
- Ya-Ping Zhang
- Molecular Evolution and Genome Diversity Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China; KIZ/CUHK Joint Laboratory of Bio-resources and Molecular Research in Common Diseases, Kunming 650223, China.
29
Maranda V, Sunstrum FG, Drouin G. Both male and female gamete generating cells produce processed pseudogenes in the human genome. Gene 2018; 684:70-75. [PMID: 30359744 DOI: 10.1016/j.gene.2018.10.061] [Received: 08/14/2018] [Revised: 09/24/2018] [Accepted: 10/21/2018]
Abstract
The human genome contains an unusually large number of processed pseudogenes. The fact that processed pseudogenes are roughly 33% more abundant on our X chromosome than on our autosomes suggests that this overabundance results from human oogenesis being much longer than that of non-mammalian species. Here, we analyze the origins of the processed pseudogenes found on the human Y chromosome to determine whether human spermatogenesis also contributes to this overabundance. Our results show that human processed pseudogenes not only retrotranspose to the Y chromosome but are also produced by genes on the Y chromosome. Furthermore, the fact that X chromosomes are three times more abundant than Y chromosomes (a population with equal numbers of males and females carries three X chromosomes for every Y chromosome) likely explains why the euchromatic density of processed pseudogenes is three times higher on the X chromosome than on the Y chromosome. The large number of processed pseudogenes in our genome is therefore due to the low substrate specificity of the L1 reverse transcriptase, which reverse-transcribes germline mRNA molecules into processed pseudogenes, as well as to the life-long production of both male and female gametes.
Affiliation(s)
- Vincent Maranda
- Département de biologie et Centre de recherche avancée en génomique environnementale, Université d'Ottawa, Ottawa, Ontario K1N 6N5, Canada
- Frédérick G Sunstrum
- Département de biologie et Centre de recherche avancée en génomique environnementale, Université d'Ottawa, Ottawa, Ontario K1N 6N5, Canada
- Guy Drouin
- Département de biologie et Centre de recherche avancée en génomique environnementale, Université d'Ottawa, Ottawa, Ontario K1N 6N5, Canada.
30
Li W, Thanos D, Provata A. Quantifying local randomness in human DNA and RNA sequences using Erdös motifs. J Theor Biol 2018; 461:41-50. [PMID: 30336158 DOI: 10.1016/j.jtbi.2018.09.031] [Received: 06/05/2018] [Revised: 08/14/2018] [Accepted: 09/25/2018]
Abstract
In 1932, Paul Erdös asked whether a random walk constructed from a binary sequence can achieve the lowest possible deviation (lowest discrepancy), both for the sequence itself and for all its subsequences formed by homogeneous arithmetic progressions. Although Terence Tao recently proved that bounded discrepancy cannot be achieved for infinite sequences, attempts have been made to construct such low-discrepancy sequences of finite length. We recognize that such constructed sequences (which we call "Erdös sequences") exhibit certain hallmarks of randomness at the local level: they show roughly equal frequencies of short subsequences and at the same time exclude trivial periodic patterns. For human DNA, we examine the frequency of a set of length-10 Erdös motifs using three nucleotide-to-binary mappings. This length-10 Erdös sequence is derived from the length-11 Mathias sequence and is identical to the first 10 digits of the Thue-Morse sequence, underscoring the fact that both are deficient in periodicities. Our calculations indicate that: (1) the purine (A and G)/pyrimidine (C and T) based Erdös motifs are greatly underrepresented in the human genome, (2) the strong (G and C)/weak (A and T) based Erdös motifs are slightly overrepresented, (3) the densities of the two are negatively correlated, (4) the Erdös motifs based on all three mappings combined are slightly underrepresented, and (5) the strong/weak based Erdös motifs are greatly overrepresented in human messenger RNA sequences.
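Counting such motifs reduces to mapping DNA to a binary string and scanning it with a length-10 window. This Python sketch is illustrative only. The third mapping (amino A/C versus keto G/T) is an assumption, since only two mappings are named in the abstract. Treating the motif "set" as the Thue-Morse prefix plus its bitwise complement is also an assumption, as are all names used.

```python
def thue_morse(n: int) -> str:
    """First n digits of the Thue-Morse sequence: t(k) = parity of the 1 bits of k."""
    return "".join(str(bin(k).count("1") % 2) for k in range(n))


ERDOS_10 = thue_morse(10)  # "0110100110"

# Nucleotide-to-binary mappings; the amino/keto mapping is an assumption.
MAPPINGS = {
    "purine/pyrimidine": {"A": "1", "G": "1", "C": "0", "T": "0"},
    "strong/weak": {"G": "1", "C": "1", "A": "0", "T": "0"},
    "amino/keto": {"A": "1", "C": "1", "G": "0", "T": "0"},
}


def to_binary(seq: str, mapping: dict) -> str:
    # N and other symbols become "." so that no motif window can span them
    return "".join(mapping.get(base, ".") for base in seq.upper())


def count_erdos_motifs(seq: str, mapping: dict, motif: str = ERDOS_10) -> int:
    """Count overlapping windows equal to the motif or to its bitwise complement."""
    complement = motif.translate(str.maketrans("01", "10"))
    targets = {motif, complement}
    bits = to_binary(seq, mapping)
    k = len(motif)
    return sum(1 for i in range(len(bits) - k + 1) if bits[i:i + k] in targets)
```

For example, `AGCAGAAGCA` maps to `0110100110` under the strong/weak mapping (one hit) but to `1101111101` under the purine/pyrimidine mapping (no hit). Comparing such counts with a null expectation is left out of the sketch.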
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, USA.
- Dimitrios Thanos
- Department of Mathematics, National and Kapodistrian University of Athens, Athens GR-15784, Greece; Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", Athens GR-15341, Greece
- Astero Provata
- Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", Athens GR-15341, Greece
31
Gutbier S, May P, Berthelot S, Krishna A, Trefzer T, Behbehani M, Efremova L, Delp J, Gstraunthaler G, Waldmann T, Leist M. Major changes of cell function and toxicant sensitivity in cultured cells undergoing mild, quasi-natural genetic drift. Arch Toxicol 2018; 92:3487-503. [PMID: 30298209 DOI: 10.1007/s00204-018-2326-5] [Received: 04/17/2018] [Accepted: 06/19/2018]
Abstract
Genomic drift affects the functional properties of cell lines and the reproducibility of data from in vitro studies. While chromosomal aberrations and mutations in single pivotal genes are well explored, little is known about the effects of minor, possibly pleiotropic, genome changes. We addressed this question for the human dopaminergic neuronal precursor cell line LUHMES by comparing two subpopulations (SP) maintained either at the American Type Culture Collection (ATCC) or by the original provider (UKN). Drastic differences in susceptibility to the specific dopaminergic toxicant 1-methyl-4-phenylpyridinium (MPP+) were observed. Whole-genome sequencing was performed to identify the underlying genetic differences. While both SP had normal chromosome structures, they displayed about 70 differences at the level of amino-acid-changing events. Some of these differences were confirmed biochemically, but none offered a direct explanation for the altered toxicant sensitivity pattern. As a second approach, markers known to be relevant for the intended use of the cells were specifically tested. The “ATCC” cells rapidly down-regulated the dopamine transporter and tyrosine hydroxylase after differentiation, while “UKN” cells maintained functional levels. As the respective genes were not themselves altered, we conclude that complex polygenic upstream changes can have drastic effects on the biochemical features and toxicological responses of relatively similar SP of cells.
32
Ausmees K, John A, Toor SZ, Hellander A, Nettelblad C. BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data. BMC Bioinformatics 2018; 19:240. [PMID: 29940842 DOI: 10.1186/s12859-018-2241-z] [Received: 01/25/2018] [Accepted: 06/12/2018]
Abstract
Background The advent of next-generation sequencing (NGS) has made whole-genome sequencing of cohorts of individuals a reality. Primary datasets of raw or aligned reads of this sort can get very large. For scientific questions where curated called variants are not sufficient, the sheer size of the datasets makes analysis prohibitively expensive. To make re-analysis of such data feasible without access to a large-scale computing facility, we have developed a highly scalable, storage-agnostic framework, an associated API and an easy-to-use web user interface to execute custom filters on large genomic datasets. Results We present BAMSI, a Software-as-a-Service (SaaS) solution for filtering the 1000 Genomes phase 3 set of aligned reads, with the possibility of extension and customization to other sets of files. Unique to our solution is the capability of simultaneously using many different mirrors of the data to increase the speed of the analysis. In particular, if the data is available in private or public clouds (an increasingly common scenario for both academic and commercial cloud providers), our framework allows seamless deployment of filtering workers close to the data. We show results indicating that such a setup improves the horizontal scalability of the system, and present a possible use case of the framework by performing an analysis of structural variation in the 1000 Genomes data set. Conclusions BAMSI constitutes a framework for efficient filtering of large genomic data sets that is flexible in its use of compute as well as storage resources. The data resulting from the filter is assumed to be greatly reduced in size and can easily be downloaded or routed into, e.g., a Hadoop cluster for subsequent interactive analysis using Hive, Spark or similar tools.
In this respect, our framework also suggests a general model for making very large datasets of high scientific value more accessible by offering the possibility for organizations to share the cost of hosting data on hot storage, without compromising the scalability of downstream analysis. Electronic supplementary material The online version of this article (10.1186/s12859-018-2241-z) contains supplementary material, which is available to authorized users.
33
Abstract
The spectra of k-mer frequencies can reveal the structure and evolution of genome sequences. We confirmed that the trimodal spectrum of 8-mers in human genome sequences is distinguished only by the CG2, CG1 and CG0 8-mer sets, which contain 2, 1 or 0 CpGs, respectively. This phenomenon is called the independent selection law. The three types of CG 8-mers were considered to be different functional elements. We conjectured that (1) nucleosome binding motifs are mainly characterized by CG1 8-mers and (2) the core structural units of CpG island (CGI) sequences are predominantly characterized by CG2 8-mers. To validate these conjectures, nucleosome-occupied sequences and CGI sequences were extracted, and sequence parameters were constructed from the information in each of the three CG 8-mer sets. ROC analysis showed that CG1 8-mers are preferentially enriched in nucleosome-occupied segments (AUC > 0.7) and CG2 8-mers in CGI sequences (AUC > 0.99), which validates our conjectures in principle.
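Splitting the 8-mer spectrum into CG sets only requires counting k-mers and grouping them by their number of CpG dinucleotides. The Python sketch below illustrates that grouping under stated assumptions; it is not the authors' code, and the function names are hypothetical. Because an 8-mer can in principle contain up to four CpGs (e.g. CGCGCGCG), the sketch does not cap the class at 2.

```python
from collections import Counter


def cpg_class(kmer: str) -> int:
    """Number of CpG dinucleotides in a k-mer (0 -> CG0, 1 -> CG1, 2 -> CG2).
    "CG" cannot overlap itself, so str.count gives the exact number."""
    return kmer.upper().count("CG")


def kmer_counts(seq: str, k: int = 8) -> Counter:
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= set("ACGT"))


def split_by_cpg(counts: Counter) -> dict:
    """Group k-mer frequencies by CpG count, giving one spectrum per CG set."""
    groups = {}
    for kmer, n in counts.items():
        groups.setdefault(cpg_class(kmer), []).append(n)
    return groups
```

Plotting a histogram of each group's frequencies separately would show whether each CG set contributes one mode of the combined spectrum.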
Affiliation(s)
- Yun Jia
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China; College of Science, Inner Mongolia University of Technology, Hohhot 010051, China
- Hong Li
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
- Jingfeng Wang
- College of Science, Inner Mongolia University of Technology, Hohhot 010051, China
- Hu Meng
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Zhenhua Yang
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
34
Tavares AH, Raymaekers J, Rousseeuw PJ, Silva RM, Bastos CAC, Pinho A, Brito P, Afreixo V. Comparing Reverse Complementary Genomic Words Based on Their Distance Distributions and Frequencies. Interdiscip Sci 2018; 10:1-11. [PMID: 29214497 DOI: 10.1007/s12539-017-0273-0] [Received: 08/24/2017] [Revised: 10/04/2017] [Accepted: 11/08/2017]
Abstract
In this work, we study reverse complementary genomic word pairs in human DNA by comparing both the distance distribution and the frequency of a word to those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and the peak dissimilarity is found to work best in this setting. We report the existence of reverse complementary word pairs with very dissimilar distance distributions, as well as word pairs with very similar distance distributions even when both distributions are irregular and contain strong peaks. The association between distribution dissimilarity and frequency discrepancy is also explored, and we speculate that symmetric pairs combining low and high values of each measure may uncover features of interest. Taken together, our results suggest that some asymmetries in the human genome go far beyond Chargaff's rules. This study uses both the complete human genome and its repeat-masked version.
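The quantities compared in this study can be computed directly from occurrence positions. The Python sketch below computes a word's reverse complement, the histogram of distances between consecutive occurrences, and the frequencies of both words. The abstract does not define the peak dissimilarity, so the sketch substitutes total variation distance as a plainly different, generic stand-in. All function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    return word.translate(COMPLEMENT)[::-1]


def occurrence_positions(seq: str, word: str) -> list:
    """Start positions of all (possibly overlapping) occurrences of word."""
    positions, start = [], seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def distance_distribution(seq: str, word: str) -> Counter:
    """Histogram of distances between consecutive occurrences of word."""
    pos = occurrence_positions(seq, word)
    return Counter(b - a for a, b in zip(pos, pos[1:]))


def total_variation(d1: Counter, d2: Counter) -> float:
    """Total variation distance between two normalized histograms (stand-in measure)."""
    n1, n2 = sum(d1.values()), sum(d2.values())
    if not n1 or not n2:
        return 1.0 if n1 != n2 else 0.0
    return 0.5 * sum(abs(d1[k] / n1 - d2[k] / n2) for k in set(d1) | set(d2))


def compare_pair(seq: str, word: str) -> dict:
    """Frequencies of a word and its reverse complement, plus a distribution distance."""
    rc = reverse_complement(word)
    return {
        "word_count": len(occurrence_positions(seq, word)),
        "rc_count": len(occurrence_positions(seq, rc)),
        "tv_distance": total_variation(distance_distribution(seq, word),
                                       distance_distribution(seq, rc)),
    }
```

Words such as `ACGT` are their own reverse complement, so they form trivially symmetric pairs. A real analysis would typically exclude them before ranking pairs by dissimilarity.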
Affiliation(s)
- Ana Helena Tavares
- Department of Mathematics and CIDMA and iBiMED, University of Aveiro, Aveiro, Portugal.
- Raquel M Silva
- Department of Medical Sciences and iBiMED and IEETA, University of Aveiro, Aveiro, Portugal
- Carlos A C Bastos
- Department of Electronics Telecommunications and Informatics and IEETA, University of Aveiro, Aveiro, Portugal
- Armando Pinho
- Department of Electronics Telecommunications and Informatics and IEETA, University of Aveiro, Aveiro, Portugal
- Paula Brito
- Faculty of Economics and LIAAD-INESC TEC, University of Porto, Porto, Portugal
- Vera Afreixo
- Department of Mathematics and CIDMA and iBiMED and IEETA, University of Aveiro, Aveiro, Portugal
35
Flygare S, Hernandez EJ, Phan L, Moore B, Li M, Fejes A, Hu H, Eilbeck K, Huff C, Jorde L, G Reese M, Yandell M. The VAAST Variant Prioritizer (VVP): ultrafast, easy to use whole genome variant prioritization tool. BMC Bioinformatics 2018; 19:57. [PMID: 29463208 PMCID: PMC5819680 DOI: 10.1186/s12859-018-2056-y] [Received: 08/16/2017] [Accepted: 02/13/2018]
Abstract
Background Prioritization of sequence variants for diagnosis and discovery of Mendelian diseases is challenging, especially in large collections of whole genome sequences (WGS). Fast, scalable solutions are needed for discovery research, for clinical applications, and for curation of massive public variant repositories such as dbSNP and gnomAD. In response, we have developed VVP, the VAAST Variant Prioritizer. VVP is ultrafast, scales to even the largest variant repositories and genome collections, and its outputs are designed to simplify clinical interpretation of variants of uncertain significance. Results We show that scoring the entire contents of dbSNP (> 155 million variants) requires only 95 min on a machine with 4 CPUs and 16 GB of RAM, and that a 60X WGS can be processed in less than 5 min. We also demonstrate that VVP can score variants anywhere in the genome, regardless of type, effect, or location. It does so by integrating sequence conservation, the type of sequence change, allele frequencies, variant burden, and zygosity. Finally, we show that VVP scores are consistently accurate and easily interpreted, traits not shared by many commonly used tools such as SIFT and CADD. Conclusions VVP provides a rapid and scalable means to prioritize any sequence variant, anywhere in the genome, and its scores are designed to facilitate variant interpretation using ACMG and NHS guidelines. These traits make it well suited for operation on very large WGS collections. Electronic supplementary material The online version of this article (10.1186/s12859-018-2056-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Steven Flygare
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; Present address: IDbyDNA Inc., San Francisco, CA, USA
- Edgar Javier Hernandez
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, Salt Lake City, UT, USA
- Lon Phan
- National Center for Biotechnology Information, Bethesda, MD, USA
- Barry Moore
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, Salt Lake City, UT, USA
- Man Li
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Hao Hu
- Department of Epidemiology, M.D. Anderson Cancer Center, Houston, TX, USA
- Karen Eilbeck
- USTAR Center for Genetic Discovery, Salt Lake City, UT, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Chad Huff
- Department of Epidemiology, M.D. Anderson Cancer Center, Houston, TX, USA
- Lynn Jorde
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, Salt Lake City, UT, USA
- Mark Yandell
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, Salt Lake City, UT, USA
36
Abstract
BACKGROUND Chromosomal deletions represent an important class of human genetic variation. Various methods have been developed to mine "next-generation" sequencing (NGS) data to detect deletions and quantify their clonal abundances. These methods have focused almost exclusively on the nuclear genome, ignoring the mitochondrial chromosome (mtDNA). Detecting mtDNA deletions requires special care. First, the chromosome's relatively small size (16,569 bp) necessitates the ability to detect extremely focal events. Second, the chromosome can be present at thousands of copies in a single cell (in contrast to two copies of nuclear chromosomes), and mtDNA deletions may be present on only a very small percentage of chromosomes. Here we present a method, termed MitoDel, to detect mtDNA deletions from NGS data. RESULTS We validate the method on simulated and real data, and show that MitoDel can detect novel and previously-reported mtDNA deletions. We establish that MitoDel can find deletions such as the "common deletion" at heteroplasmy levels well below 1%. CONCLUSIONS MitoDel is a tool for detecting large mitochondrial deletions at low heteroplasmy levels. The tool can be downloaded at http://mendel.gene.cwru.edu/laframboiselab/ .
Affiliation(s)
- Colleen M. Bosworth
- Department of Genetics and Genome Sciences, Case Western Reserve University School of Medicine, Cleveland, OH 44106 USA
- Sneha Grandhi
- Department of Genetics and Genome Sciences, Case Western Reserve University School of Medicine, Cleveland, OH 44106 USA
- Meetha P. Gould
- Department of Genetics and Genome Sciences, Case Western Reserve University School of Medicine, Cleveland, OH 44106 USA
- Thomas LaFramboise
- Department of Genetics and Genome Sciences, Case Western Reserve University School of Medicine, Cleveland, OH 44106 USA
37
Paik ES, Choi HJ, Kim TJ, Lee JW, Kim BG, Bae DS, Choi CH. Molecular Signature for Lymphatic Invasion Associated with Survival of Epithelial Ovarian Cancer. Cancer Res Treat 2017; 50:461-473. [PMID: 28546526 PMCID: PMC5912145 DOI: 10.4143/crt.2017.104] [Received: 02/28/2017] [Accepted: 05/09/2017]
Abstract
Purpose We aimed to develop a molecular classifier that can predict lymphatic invasion and to assess its clinical significance in epithelial ovarian cancer (EOC) patients. Materials and Methods We analyzed molecular data (mRNA expression and DNA methylation) from The Cancer Genome Atlas. To identify molecular signatures of lymphatic invasion, we selected differentially expressed genes. The performance of the classifier was validated by receiver operating characteristic analysis, logistic regression, linear discriminant analysis (LDA), and support vector machine (SVM). We assessed the prognostic role of the classifier using a random survival forest (RSF) model and the pathway deregulation score (PDS). For external validation, we analyzed microarray data from 26 EOC samples from Samsung Medical Center and from the curatedOvarianData database. Results We identified 21 mRNAs and seven methylated DNAs from primary EOC tissues that predicted lymphatic invasion, and used them to create prognostic models. The classifier predicted lymphatic invasion well, as validated by logistic regression, LDA, and SVM (C-index of 0.90, 0.71, and 0.74 for mRNA and 0.64, 0.68, and 0.69 for DNA methylation). In the RSF model, incorporating molecular data with clinical variables improved the prediction of progression-free survival compared with clinical variables alone (p < 0.001 and p=0.008). Similarly, PDS classified patients into high-risk and low-risk groups that differed in survival based on mRNA profiles (log-rank p-value=0.011). In external validation, the gene signature correlated well with lymphatic invasion and patients’ survival. Conclusion The molecular signature model predicted lymphatic invasion well and was also associated with survival of EOC patients.
Affiliation(s)
- E Sun Paik
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Hyun Jin Choi
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Tae-Joong Kim
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jeong-Won Lee
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Byoung-Gie Kim
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Duk-Soo Bae
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Chel Hun Choi
- Department of Obstetrics and Gynecology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
38
Boulos RE, Tremblay N, Arneodo A, Borgnat P, Audit B. Multi-scale structural community organisation of the human genome. BMC Bioinformatics 2017; 18:209. [PMID: 28399820 PMCID: PMC5387268 DOI: 10.1186/s12859-017-1616-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/23/2016] [Accepted: 03/28/2017] [Indexed: 12/11/2022]
Abstract
BACKGROUND Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This raises a new methodological challenge for computational biology: objectively extracting from these data the structural motifs characteristic of genome organisation. RESULTS We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. CONCLUSIONS Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction, the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.
Affiliation(s)
- Rasha E Boulos
- Univ Lyon, Ens de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342, Lyon, France. Present address: Montpellier Cancer Institute (ICM), Montpellier Cancer Research Institute (IRCM) Inserm U1194, University of Montpellier, Montpellier, France
- Nicolas Tremblay
- Univ Lyon, Ens de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342, Lyon, France. Present address: CNRS, GIPSA-lab, Grenoble, France
- Alain Arneodo
- Univ Lyon, Ens de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342, Lyon, France. Present address: LOMA, Université de Bordeaux, CNRS, UMR 5798, 51 Cours de la Libération, Talence, 33405, France
- Pierre Borgnat
- Univ Lyon, Ens de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342, Lyon, France
- Benjamin Audit
- Univ Lyon, Ens de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, F-69342, Lyon, France.
39
Christiansen L, Amini S, Zhang F, Ronaghi M, Gunderson KL, Steemers FJ. Contiguity-Preserving Transposition Sequencing (CPT-Seq) for Genome-Wide Haplotyping, Assembly, and Single-Cell ATAC-Seq. Methods Mol Biol 2017; 1551:207-221. [PMID: 28138849 DOI: 10.1007/978-1-4939-6750-6_12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022]
Abstract
Most genomes to date have been sequenced without taking into account the diploid nature of the genome. However, the distribution of variants on each individual chromosome can (1) significantly impact gene regulation and protein function, (2) have important implications for analyses of population history and medical genetics, and (3) be of great value for accurate interpretation of medically relevant genetic variation. Here, we describe a comprehensive and detailed protocol for an ultra-fast (<3 h library preparation), cost-effective, and scalable haplotyping method, named Contiguity Preserving Transposition sequencing or CPT-seq (Amini et al., Nat Genet 46(12):1343-1349, 2014). CPT-seq accurately phases >95 % of the whole human genome in Mb-scale phasing blocks. Additionally, the same workflow can be used to aid de novo assembly (Adey et al., Genome Res 24(12):2041-2049, 2014), detect structural variants, and perform single-cell ATAC-seq analysis (Cusanovich et al., Science 348(6237):910-914, 2015).
Affiliation(s)
- Lena Christiansen
- Advanced Research Group, Illumina, Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
- Fan Zhang
- Advanced Research Group, Illumina, Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
- Mostafa Ronaghi
- Advanced Research Group, Illumina, Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
- Kevin L Gunderson
- Advanced Research Group, Illumina, Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
- Frank J Steemers
- Advanced Research Group, Illumina, Inc., 5200 Illumina Way, San Diego, CA, 92122, USA.
40
Pope BJ, Mahmood K, Jung CH, Georgeson P, Park DJ. Single nucleotide-level mapping of DNA double-strand breaks in human HEK293T cells. Genom Data 2016; 11:43-45. [PMID: 27942458 PMCID: PMC5133665 DOI: 10.1016/j.gdata.2016.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2016] [Revised: 11/07/2016] [Accepted: 11/09/2016] [Indexed: 11/29/2022]
Abstract
Constitutional biological processes involve the generation of DNA double-strand breaks (DSBs). The production of such breaks and their subsequent resolution are also highly relevant to neurodegenerative diseases and cancer, in which extensive DNA fragmentation has been described (Stephens et al., 2011; Blondet et al., 2001). Tchurikov et al. (2011, 2013) have reported previously that frequent sites of DSBs occur in chromosomal domains involved in the co-ordinated expression of genes. This group reported that hot spots of DSBs in human HEK293T cells often coincide with H3K4me3 marks, associated with active transcription (Kravatsky et al., 2015), and that frequent sites of DNA double-strand breakage are likely to be relevant to cancer genomics (Tchurikov et al., 2013, 2016). Recently, they applied a RAFT (rapid amplification of forum termini) protocol that selects for blunt-ended DSB sites and mapped these to the human genome within defined co-ordinate ‘windows’. In this paper, we re-analyse public RAFT data to derive sites of DSBs at the single-nucleotide level across the built genome for human HEK293T cells (https://figshare.com/s/35220b2b79eaaaf64ed8). This refined mapping, combined with accessory ENCODE data tracks and ribosomal DNA-related sequence annotations, will likely be of value for the design of clinically relevant targeted assays such as those for cancer susceptibility, diagnosis, treatment-matching and prognostication.
Affiliation(s)
- Bernard J Pope
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia
- Khalid Mahmood
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia
- Chol-Hee Jung
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia
- Peter Georgeson
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia
- Daniel J Park
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Australia; Genomic Technologies Group, Genetic Epidemiology Laboratory, Department of Pathology, The University of Melbourne, Australia
41
Abstract
Pediatrics is a dynamic discipline, and there is hope of outstanding achievements in child health in the 21st century and beyond. An improved lifestyle and quality of children's health are likely to reduce the burden of adult diseases and enhance longevity, because the seeds of most adult diseases are sown in childhood. The identification and decoding of the human genome is expected to revolutionize the practice of pediatrics. The day is not far off when a patient will walk into a doctor's chamber with an electronic or digital health history on a CD or palmtop and a decoded genomic constitution. There will be a reduced burden of genetic diseases because of selective abortions of "defective" fetuses and replacement of "bad" genes with "good" ones by genetic engineering. The availability of totipotent stem cells and developments in transplant technology are likely to revolutionize the management of a variety of hematologic cancers and life-threatening genetic disorders. The possibility of producing flawless designer babies through advances in assisted reproductive technologies (ARTs) is likely to be mired in several ethical and legal issues. The availability of newer recombinant vaccines for emerging infectious diseases and for non-infectious lifestyle diseases is likely to improve survival and quality of life. There is going to be a greater focus on the "patient" having the disease rather than on the "disease" per se, through holistic pediatrics that makes effective use of alternative or complementary strategies for health care. Due to advances in technology, pediatrics may become further dehumanized. A true healer cannot simply rely on technology; there must be a spiritual bond between the patient and the physician, exploiting the concept of psycho-neuro-immunology and body-mind interactions.
In the years to come, physicians are likely to play "god," but medicine can't achieve immortality because anything born must die in accordance with nature's recycling blueprint. Medical science is likely to improve longevity, but our goal should be to improve the quality of life.
Affiliation(s)
- Meharban Singh
- Child Care Center, 625, Sector 37, Arun Vihar, Noida, UP, India.
42
Abstract
BACKGROUND Genotype networks are representations of genetic variation data that are complementary to phylogenetic trees. A genotype network is a graph whose nodes are genotypes (DNA sequences) with the same broadly defined phenotype. Two nodes are connected if they differ in some minimal way, e.g., in a single nucleotide. RESULTS We analyze human genome variation data from the 1000 Genomes Project, and construct haploid genotype (haplotype) networks for 12,235 protein-coding genes. The structure of these networks varies widely among genes, indicating different patterns of variation despite a shared evolutionary history. We focus on those genes whose genotype networks show many cycles, which can indicate homoplasy, i.e., parallel or convergent evolution, on the sequence level. CONCLUSION For 42 genes, the observed number of cycles is so large that it cannot be explained by either chance homoplasy or recombination. When analyzing possible explanations, we discovered evidence for positive selection in 21 of these genes and, in addition, a potential role for constrained variation and purifying selection. Balancing selection plays at most a small role. The 42 genes with excess cycles are enriched in functions related to immunity and response to pathogens. Genotype networks are representations of genetic variation data that can help us understand unusual patterns of genomic variation.
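The network construction described in this abstract (nodes are haplotypes, edges join sequences that differ at a single nucleotide) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, haplotypes are assumed to be aligned strings of equal length, and cycles are counted as the cycle rank (|E| - |V| + number of connected components), which may differ from the cycle statistic used in the study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def genotype_network(haplotypes):
    """Nodes are distinct haplotypes; an edge joins two haplotypes differing at one site."""
    nodes = sorted(set(haplotypes))
    edges = [(a, b) for a, b in combinations(nodes, 2) if hamming(a, b) == 1]
    return nodes, edges


def cycle_rank(nodes, edges) -> int:
    """Number of independent cycles: |E| - |V| + number of connected components."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    components = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + components


# Two biallelic sites with all four haplotypes present form a square: one cycle.
nodes, edges = genotype_network(["AA", "AT", "TA", "TT"])
print(len(nodes), len(edges), cycle_rank(nodes, edges))  # 4 4 1
```

The four-haplotype square in the usage line is the smallest possible cycle; it is the pattern that either recurrent mutation (homoplasy) or recombination between the two sites would produce, which is why the study has to rule both out before invoking selection.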
Affiliation(s)
- Ali R. Vahdati
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- The Swiss Institute of Bioinformatics, Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, USA
43
Abstract
BACKGROUND Recently, a physical model of nucleosome formation based on sequence-dependent bending properties of the DNA double helix has been used to reveal some enrichment of nucleosome-inhibiting energy barriers (NIEBs) near ubiquitous human "master" replication origins. Here we use this model to predict the existence of about 1.6 million NIEBs over the 22 human autosomes. RESULTS We show that these high energy barriers of mean size 153 bp correspond to nucleosome-depleted regions (NDRs) in vitro, as expected, but also in vivo. On either side of these NIEBs, we observe, in vivo and in vitro, a similar compacted nucleosome ordering, suggesting an absence of chromatin remodeling. This nucleosomal ordering strongly correlates with oscillations of the GC content as well as with the interspecies and intraspecies mutation profiles along these regions. Comparison of these divergence rates reveals both positive and negative selection linked to nucleosome positioning around these intrinsic NDRs. Overall, these NIEBs and neighboring nucleosomes cover 37.5 % of the human genome, where nucleosome occupancy is stably encoded in the DNA sequence. These 1 kb-sized regions of intrinsic nucleosome positioning are equally found in GC-rich and GC-poor isochores, in early and late replicating regions, in intergenic and genic regions but not at gene promoters. CONCLUSION The source of selection pressure on the NIEBs has yet to be resolved in future work. One possible scenario is that these widely distributed chromatin patterns have been selected in humans to impair the condensation of the nucleosomal array into the 30 nm chromatin fiber, so as to facilitate the epigenetic regulation of nuclear functions in a cell-type-specific manner.
Affiliation(s)
- Guénola Drillon
- Univ Lyon, Ens de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, Lyon, F-69342 France
- Benjamin Audit
- Univ Lyon, Ens de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, Lyon, F-69342 France
- Françoise Argoul
- Univ Lyon, Ens de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, Lyon, F-69342 France
- LOMA, Université de Bordeaux, CNRS, UMR 5798, 51 Cours de la Libération, Talence, F-33405 France
- Alain Arneodo
- Univ Lyon, Ens de Lyon, Univ Claude Bernard Lyon 1, CNRS, Laboratoire de Physique, Lyon, F-69342 France
- LOMA, Université de Bordeaux, CNRS, UMR 5798, 51 Cours de la Libération, Talence, F-33405 France
44
Abstract
Motivated by the non-random, clustered distribution of SNPs, we introduce a phenomenological model to account for the clustering properties of SNPs in the human genome. The model is based on preferential mutation in close proximity to existing SNPs. Using HapMap SNP data, we empirically demonstrate that the preferential model describes the clustered distribution of SNPs better than the random model. Moreover, the model is applicable not only to autosomes but also to the X chromosome, although the X chromosome has characteristics different from those of the autosomes. Analysis of the estimated model parameters can explain the pronounced population structure and the low genetic diversity of the X chromosome. In addition, correlation between the parameters reveals population-specific differences in mutation probability. These results support the mutational non-independence hypothesis over random mutation.
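To make the contrast between a random model and a preferential model concrete, the Python sketch below simulates SNP positions under a hypothetical preferential-placement rule and compares clustering with a simple statistic. Everything here is an assumption for illustration and is not the parameterisation used in the study: the mixing probability `alpha`, the exponential offset distribution with mean `mean_offset`, and the coefficient of variation of inter-SNP gaps as a clustering measure are all chosen for simplicity.

```python
import random


def simulate_snps(genome_length, n_snps, alpha, mean_offset, seed=None):
    """Hypothetical preferential-placement model of SNP positions.

    With probability `alpha`, a new SNP is placed near a randomly chosen existing
    SNP (offset ~ exponential with mean `mean_offset`, random direction);
    otherwise it is placed uniformly at random. Returns sorted distinct positions.
    """
    if not 1 <= n_snps <= genome_length:
        raise ValueError("need 1 <= n_snps <= genome_length")
    rng = random.Random(seed)
    positions = {rng.randrange(genome_length)}
    while len(positions) < n_snps:
        if rng.random() < alpha:
            anchor = rng.choice(sorted(positions))
            offset = max(1, round(rng.expovariate(1 / mean_offset)))
            pos = anchor + rng.choice((-1, 1)) * offset
            if not 0 <= pos < genome_length:
                continue  # fell off the chromosome end; draw again
        else:
            pos = rng.randrange(genome_length)
        positions.add(pos)  # duplicates are absorbed by the set
    return sorted(positions)


def gap_cv(positions):
    """Coefficient of variation of gaps between adjacent SNPs.

    Close to 1 for uniform random placement; larger when SNPs are clustered.
    Requires at least two positions.
    """
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return var ** 0.5 / mean


clustered = simulate_snps(1_000_000, 2000, alpha=0.9, mean_offset=5, seed=1)
uniform = simulate_snps(1_000_000, 2000, alpha=0.0, mean_offset=5, seed=1)
print(round(gap_cv(clustered), 2), round(gap_cv(uniform), 2))
```

Setting `alpha=0` recovers the random model, so the same code gives both hypotheses; fitting such parameters to real HapMap positions would require a likelihood or distance criterion that this sketch does not attempt.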
Affiliation(s)
- Chang-Yong Lee
- The Department of Industrial and Systems Engineering, Kongju National University, Cheonan 330-717, South Korea.
45
Trotta E. Selective forces and mutational biases drive stop codon usage in the human genome: a comparison with sense codon usage. BMC Genomics 2016; 17:366. [PMID: 27188984 PMCID: PMC4869280 DOI: 10.1186/s12864-016-2692-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 02/03/2016] [Accepted: 05/05/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND The three stop codons UAA, UAG, and UGA signal the termination of mRNA translation. Through a mechanism that is not adequately understood, they are normally used with unequal frequencies. RESULTS In this work, we showed that selective forces and mutational biases drive stop codon usage in the human genome. We found that, relative to sense codons, stop codon usage was affected by stronger selective forces but was less influenced by neutral mutational biases. UGA is the most frequent termination codon in the human genome. However, UAA was the preferred stop codon in genes with high breadth of expression, high level of expression, AT-rich coding sequences, housekeeping functions, and in gene ontology categories with the largest deviation from expected stop codon usage. Selective forces associated with the breadth and the level of expression favoured AT-rich sequences in the mRNA region including the stop site and its proximal 3'-UTR, but had scarce effects on sense codons, generating two regions, upstream and downstream of the stop codon, with strongly different base composition. By favouring low GC content, selection promoted labile local secondary structures at the stop site and its proximal 3'-UTR. The compositional and structural context favoured by selection was surprisingly emphasized in the class of ribosomal proteins and was consistent with sequence elements that increase the efficiency of translational termination. Stop codons were also heterogeneously distributed among chromosomes by a mechanism that was strongly correlated with the GC content of coding sequences. CONCLUSIONS In the human genome, the nucleotide composition and the thermodynamic stability of the stop codon site and its proximal 3'-UTR are correlated with the GC content of coding sequences and with the breadth and the level of gene expression.
In highly expressed genes stop codon usage is compositionally and structurally consistent with highly efficient translation termination signals.
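The two basic quantities this abstract relates, the relative usage of UAA, UAG, and UGA at the ends of coding sequences and the GC content of those sequences, can be computed as in the Python sketch below. The function names are hypothetical, and the input is assumed to be complete coding sequences written 5' to 3' in either the DNA or RNA alphabet; records whose length is not a multiple of three, or that do not end in a stop codon, are skipped rather than repaired.

```python
from collections import Counter

STOP_CODONS = ("UAA", "UAG", "UGA")


def stop_codon_usage(cds_sequences):
    """Fraction of complete coding sequences terminated by each stop codon."""
    counts = Counter()
    for seq in cds_sequences:
        rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
        if len(rna) < 3 or len(rna) % 3:
            continue  # incomplete or out-of-frame record
        stop = rna[-3:]
        if stop in STOP_CODONS:
            counts[stop] += 1
    total = sum(counts.values())
    return {codon: counts[codon] / total if total else 0.0 for codon in STOP_CODONS}


def gc_content(seq):
    """Fraction of G and C in a nucleotide sequence (0.0 for an empty sequence)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


cds = ["ATGAAATAA", "AUGUGA", "ATGTGA"]
print(stop_codon_usage(cds))
print([gc_content(s) for s in cds])
```

Grouping the input by expression breadth, expression level, or coding-sequence GC content and calling `stop_codon_usage` on each group gives the kind of per-class comparison the abstract reports; the grouping itself depends on annotation data not shown here.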
Affiliation(s)
- Edoardo Trotta
- Institute of Translational Pharmacology, Consiglio Nazionale delle Ricerche (CNR), Rome, 00133, Italy.
46
Abstract
Editing human germline genes may prove a boon in some genetic and other disorders. The recent editing of the genome of a human embryo with the CRISPR/Cas9 editing tool generated a debate among leading scientists worldwide about its ethical implications for future generations. It remains to be seen what transformation human gene editing will bring to humankind in the times to come.
Affiliation(s)
- Kewal Krishan
- Department of Anthropology, Panjab University, Sector-14, Chandigarh, India.
- Tanuj Kanchan
- Department of Forensic Medicine, Kasturba Medical College (Manipal University), Mangalore, India
- Bahadur Singh
- Department of Anthropology, Panjab University, Sector-14, Chandigarh, India
47
Pang E, Wu X, Lin K. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences. Mol Genet Genomics 2016; 291:1127-36. [PMID: 26833483 PMCID: PMC4875946 DOI: 10.1007/s00438-016-1170-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/14/2015] [Accepted: 01/18/2016] [Indexed: 11/30/2022]
Abstract
Protein evolution plays an important role in the evolution of each genome. Because of their functional roles, different parts or sites of proteins are generally subject to different selective constraints, particularly purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on the PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains and unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were classified into two groups: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found that the density of synonymous SNPs within domains is significantly greater than that within unassigned regions, whereas the density of non-synonymous SNPs shows the opposite pattern. We also found signatures of purifying selection on both domains and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
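The domain-versus-unassigned comparison described above rests on a simple density (SNPs per unit length) computed separately for the two partitions of each coding sequence. The Python sketch below shows one way to compute it for a single CDS; the function names are hypothetical, and it assumes domain boundaries are given as 1-based inclusive residue ranges (the convention of Pfam domain tables) that must be converted to 0-based half-open nucleotide intervals.

```python
def domain_aa_to_nt(aa_start, aa_end):
    """Convert a 1-based inclusive residue range to a 0-based half-open
    nucleotide range on the coding sequence."""
    return 3 * (aa_start - 1), 3 * aa_end


def snp_density_per_kb(cds_length, domains, snp_positions):
    """SNPs per kb inside domains and in unassigned regions of one CDS.

    cds_length: CDS length in nucleotides; domains: 0-based half-open nucleotide
    intervals (overlaps allowed); snp_positions: 0-based CDS positions.
    """
    in_domain = [False] * cds_length
    for start, end in domains:
        for i in range(start, min(end, cds_length)):
            in_domain[i] = True
    domain_len = sum(in_domain)
    unassigned_len = cds_length - domain_len
    domain_snps = sum(1 for p in snp_positions if in_domain[p])
    unassigned_snps = len(snp_positions) - domain_snps

    def per_kb(count, length):
        return 1000 * count / length if length else float("nan")

    return per_kb(domain_snps, domain_len), per_kb(unassigned_snps, unassigned_len)


# A 1,000-nt CDS whose first 500 nt are a domain, with three SNPs.
print(snp_density_per_kb(1000, [(0, 500)], [10, 20, 600]))  # (4.0, 2.0)
```

Calling the function once with synonymous SNP positions and once with non-synonymous positions gives the two densities the abstract compares. How per-gene values are aggregated genome-wide (pooling counts and lengths, or averaging per-gene densities) is a design choice this sketch leaves open.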
Affiliation(s)
- Erli Pang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
- Xiaomei Wu
- College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, 310036, China
- Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
48
Medina-Rivas MA, Norris ET, Rishishwar L, Conley AB, Medrano-Trochez C, Valderrama-Aguirre A, Vannberg FO, Mariño-Ramírez L, Jordan IK. Chocó, Colombia: a hotspot of human biodiversity. ACTA ACUST UNITED AC 2016; 6:45-54. [PMID: 27668076 DOI: 10.18636/bioneotropical.v6i1.341] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 01/20/2023]
Abstract
OBJECTIVE Chocó is a state located on the Pacific coast of Colombia that has a majority Afro-Colombian population. The objective of this study was to characterize the genetic ancestry, admixture and diversity of the population of Chocó, Colombia. METHODOLOGY Genetic variation was characterized for a sample of 101 donors (61 female and 40 male) from the state of Chocó. Genotypes were determined for each individual via the characterization of 610,545 single nucleotide polymorphisms genome-wide. Haplotypes for the uniparental mitochondrial DNA (female) and Y-DNA (male) chromosomes were also determined. These data were used for comparative analyses with a number of worldwide populations, including putative ancestral populations from Africa, the Americas and Europe, along with several admixed American populations. RESULTS The population of Chocó has predominantly African genetic ancestry (75.8%) with approximately equal parts European (13.4%) and Native American (11.1%) ancestry. Chocó shows relatively high levels of three-way genetic admixture, and far higher levels of Native American ancestry, compared to other New World African populations from the Caribbean and the United States. There is a striking pattern of sex-specific ancestry in Chocó, with Native American admixture along the female lineage and European admixture along the male lineage. The population of Chocó is also characterized by relatively high levels of overall genetic diversity compared to both putative ancestral populations and other admixed American populations. CONCLUSION These results suggest a unique genetic heritage for the population of Chocó and underscore the profound human genetic diversity that can be found in the region.
Affiliation(s)
- Miguel A Medina-Rivas
- Centro de Investigación en Biotecnología y Recursos Fitogenéticos. Centro de Investigaciones en Biodiversidad y Hábitat, Universidad Tecnológica del Chocó, Quibdó, Chocó, Colombia; PanAmerican Bioinformatics Institute, Cali, Valle del Cauca, Colombia
- Emily T Norris
- IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, Georgia, USA
- Lavanya Rishishwar
- IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, Georgia, USA; School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA
- Andrew B Conley
- IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, Georgia, USA
- Augusto Valderrama-Aguirre
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca, Colombia; Biomedical Research Institute, Universidad Libre, Cali, Valle del Cauca, Colombia
- Fredrik O Vannberg
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA
- Leonardo Mariño-Ramírez
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca, Colombia; National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA; BIOS Centro de Bioinformática y Biología Computacional, Manizales, Caldas, Colombia
- I King Jordan
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca, Colombia; School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA; BIOS Centro de Bioinformática y Biología Computacional, Manizales, Caldas, Colombia
49
Kupriyanova NS, Netchvolodov KK, Sadova AA, Cherepanova MD, Ryskov AP. Non-canonical ribosomal DNA segments in the human genome, and nucleoli functioning. Gene 2015; 572:237-42. [PMID: 26164756 DOI: 10.1016/j.gene.2015.07.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/17/2015] [Revised: 06/16/2015] [Accepted: 07/07/2015] [Indexed: 10/23/2022]
Abstract
Ribosomal DNA (rDNA) in the human genome is represented by tandem repeats of 43 kb nucleotide sequences that form nucleolus organizers (NORs) on each of five pairs of acrocentric chromosomes. rDNA-similar segments of different lengths are also present on (NOR)(-) chromosomes. Many of these segments contain nucleotide substitutions, supplementary microsatellite clusters, and extended deletions. It has recently been shown that, in addition to ribosome biogenesis, nucleoli have other functions, such as cell-cycle regulation and response to stress. In particular, several stress-inducible loci located in the ribosomal intergenic spacer (rIGS) produce stimuli-specific noncoding nucleolar RNAs. By mapping the 5'/3' ends of the rIGS segments scattered throughout (NOR)(-) chromosomes, we discovered that the bonds in the rIGS most often susceptible to disruption were adjacent to, or overlapped with, stimuli-specific inducible loci. This suggests an interconnection between two phenomena: nucleolar functioning and the scattering of rDNA-like sequences on (NOR)(-) chromosomes.
Affiliation(s)
- Anastasia A Sadova
- The Institute of Gene Biology, RAS, 34/5, Vavilov St., Moscow, Russian Federation.
- Marina D Cherepanova
- The Institute of Gene Biology, RAS, 34/5, Vavilov St., Moscow, Russian Federation.
- Alexei P Ryskov
- The Institute of Gene Biology, RAS, 34/5, Vavilov St., Moscow, Russian Federation.
50
Shih J, Hodge R, Andrade-Navarro MA. Comparison of inter- and intraspecies variation in humans and fruit flies. Genom Data 2015; 3:49-54. [PMID: 26484147 DOI: 10.1016/j.gdata.2014.11.010] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 09/19/2014] [Revised: 11/12/2014] [Accepted: 11/12/2014] [Indexed: 12/17/2022]
Abstract
Variation is essential to species survival and adaptation during evolution. This variation is conferred by the imperfection of biochemical processes, such as mutations and alterations in DNA sequences, and can also be seen within genomes through processes such as the generation of antibodies. Recent sequencing projects have produced multiple versions of the genomes of humans and fruit flies (Drosophila melanogaster). These give us a chance to study how individual gene sequences vary within and between species. Here we arranged human and fly genes in orthologous pairs and compared such within-species variability with their degree of conservation between flies and humans. We observed that a significant number of proteins associated with mRNA translation are highly conserved between species and yet are highly variable within each species. The fact that we observe this in two species whose lineages separated more than 700 million years ago suggests that this is the result of a very ancient process. We hypothesize that this effect might be attributed to a positive selection for variability of virus-interacting proteins that confers a general resistance to viral hijacking of the mRNA translation machinery within populations. Our analysis points to this and to other processes resulting in positive selection for gene variation.