1. Chittoor SS, Giunta S. Comparative analysis of predicted DNA secondary structures infers complex human centromere topology. Am J Hum Genet 2024; 111:2707-2719. [PMID: 39561771 PMCID: PMC11639080 DOI: 10.1016/j.ajhg.2024.10.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 10/24/2024] [Accepted: 10/25/2024] [Indexed: 11/21/2024] Open
Abstract
Secondary structures are non-canonical arrangements of nucleic acids due to intra-strand interactions, including base pairing, stacking, or other higher-order features that deviate from the standard double-helical conformation. While these structures are extensively studied in RNA, they can also form when DNA becomes single stranded, creating topological roadblocks that can impact essential DNA-based processes such as replication, transcription, and repair, ultimately affecting genome stability. The availability of a complete linear sequence of human genomes, including repetitive loci, enables the prediction of DNA secondary structures comparing across various regions. Here, we evaluate the intrinsic properties of linear single-stranded DNA sequences derived from sampling specialized human loci such as centromeres, pericentromeres, ribosomal DNA (rDNA), and coding regions from the CHM13 genome. Our comparative analysis of predicted secondary structures across human chromosomes revealed the heightened presence, complexity, and instability of secondary structures within the centromere, which gradually decreased toward the pericentromere onto chromosomes' arms, on average lowest in coding regions. Notably, centromeric repeats exhibited the highest level of topological complexity within both the active and divergent domains, even when compared to other repetitive tandem satellites, such as rDNA in acrocentric chromosomes. Our findings provide evidence of the intrinsic self-hybridizing properties of centromere repeats, which are capable of generating complex topological structures that may functionally correlate with chromosome missegregation, especially when centromeric chromatin is disrupted. Processes such as long non-coding RNA transcription, recombination, and other mechanisms that dechromatinize and unwind stretches of linear DNA in these regions create in vivo opportunities for the DNA acrobatics hereby predicted.

2. Zee TW, Abdul Aziz MFB, Wei PC. Ethical challenges of conducting and reviewing human genomics research in Malaysia: An exploratory study. Dev World Bioeth 2024; 24:331-341. [PMID: 37997006 PMCID: PMC11111594 DOI: 10.1111/dewb.12435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 10/17/2023] [Accepted: 10/24/2023] [Indexed: 11/25/2023]
Abstract
Even though there is a significant amount of scholarly work examining the ethical issues surrounding human genomics research, little is known about its footing in Malaysia. This study aims to explore the experience of local researchers and research ethics committee (REC) members in developing it in Malaysia. In-depth interviews were conducted from April to May 2021, and the data were thematically analysed. In advancing this technology, both genomics researchers and REC members have concerns over how this research is being developed in the country especially the absence of a clear ethical and regulatory framework at the national level as a guidance. However, this study argues that it is not a salient issue as there are international guidelines in existence and both researchers and RECs will benefit from a training on the guidelines to ensure genomics research can be developed in an ethical manner.

3. Hou T, Shen X, Zhang S, Liang M, Chen L, Lu Q. AIGen: an artificial intelligence software for complex genetic data analysis. Brief Bioinform 2024; 25:bbae566. [PMID: 39550221 PMCID: PMC11568876 DOI: 10.1093/bib/bbae566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2024] [Revised: 09/12/2024] [Accepted: 11/11/2024] [Indexed: 11/18/2024] Open
Abstract
The recent development of artificial intelligence (AI) technology, especially the advance of deep neural network (DNN) technology, has revolutionized many fields. While DNN plays a central role in modern AI technology, it has rarely been used in genetic data analysis due to analytical and computational challenges brought by high-dimensional genetic data and an increasing number of samples. To facilitate the use of AI in genetic data analysis, we developed a C++ package, AIGen, based on two newly developed neural networks (i.e. kernel neural networks and functional neural networks) that are capable of modeling complex genotype-phenotype relationships (e.g. interactions) while providing robust performance against high-dimensional genetic data. Moreover, computationally efficient algorithms (e.g. a minimum norm quadratic unbiased estimation approach and batch training) are implemented in the package to accelerate the computation, making them computationally efficient for analyzing large-scale datasets with thousands or even millions of samples. By applying AIGen to the UK Biobank dataset, we demonstrate that it can efficiently analyze large-scale genetic data, attain improved accuracy, and maintain robust performance. Availability: AIGen is developed in C++ and its source code, along with reference libraries, is publicly accessible on GitHub at https://github.com/TingtHou/AIGen.

4. Walter NG. Are non-protein coding RNAs junk or treasure?: An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non-functional or functional RNAs. Bioessays 2024; 46:e2300201. [PMID: 38351661 DOI: 10.1002/bies.202300201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/18/2024] [Accepted: 01/19/2024] [Indexed: 03/28/2024]
Abstract
The human genome project's lasting legacies are the emerging insights into human physiology and disease, and the ascendance of biology as the dominant science of the 21st century. Sequencing revealed that >90% of the human genome is not coding for proteins, as originally thought, but rather is overwhelmingly transcribed into non-protein coding, or non-coding, RNAs (ncRNAs). This discovery initially led to the hypothesis that most genomic DNA is "junk", a term still championed by some geneticists and evolutionary biologists. In contrast, molecular biologists and biochemists studying the vast number of transcripts produced from most of this genome "junk" often surmise that these ncRNAs have biological significance. What gives? This essay contrasts the two opposing, extant viewpoints, aiming to explain their bases, which arise from distinct reference frames of the underlying scientific disciplines. Finally, it aims to reconcile these divergent mindsets in hopes of stimulating synergy between scientific fields.

5. Ferreira RC, Rodrigues CR, Broach JR, Briones MRS. Convergent Mutations and Single Nucleotide Variants in Mitochondrial Genomes of Modern Humans and Neanderthals. Int J Mol Sci 2024; 25:3785. [PMID: 38612593 PMCID: PMC11012180 DOI: 10.3390/ijms25073785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 03/12/2024] [Accepted: 03/13/2024] [Indexed: 04/14/2024] Open
Abstract
The genetic contributions of Neanderthals to the modern human genome have been evidenced by the comparison of present-day human genomes with paleogenomes. Neanderthal signatures in extant human genomes are attributed to intercrosses between Neanderthals and archaic anatomically modern humans (AMHs). Although Neanderthal signatures are well documented in the nuclear genome, it has been proposed that there is no contribution of Neanderthal mitochondrial DNA to contemporary human genomes. Here we show that modern human mitochondrial genomes contain 66 potential Neanderthal signatures, or Neanderthal single nucleotide variants (N-SNVs), of which 36 lie in coding regions and 7 result in nonsynonymous changes. Seven N-SNVs are associated with traits such as cyclic vomiting syndrome, Alzheimer's disease and Parkinson's disease, and two N-SNVs are associated with intelligence quotient. Based on recombination tests, principal component analysis (PCA) and the complete absence of these N-SNVs in 41 archaic AMH mitogenomes, we conclude that convergent evolution, and not recombination, explains the presence of N-SNVs in present-day human mitogenomes.

6. Yang C, Zhang Z, Huang Y, Xie X, Liao H, Xiao J, Veldsman WP, Yin K, Fang X, Zhang L. LRTK: a platform agnostic toolkit for linked-read analysis of both human genome and metagenome. Gigascience 2024; 13:giae028. [PMID: 38869148 PMCID: PMC11170215 DOI: 10.1093/gigascience/giae028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 03/15/2024] [Accepted: 05/09/2024] [Indexed: 06/14/2024] Open
Abstract
BACKGROUND Linked-read sequencing technologies generate high-base quality short reads that contain extrapolative information on long-range DNA connectedness. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to 1 specific sequencing platform. FINDINGS To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform agnostic processing of linked-read sequencing data from both human genome and metagenome. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK has the ability to process multiple samples automatically and provides users with the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK on linked reads from simulation, mock community, and real datasets for both human genome and metagenome. We showcased LRTK's ability to generate comparative performance results from preceding benchmark studies and to report these results in publication-ready HTML document plots. CONCLUSIONS LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the current gap in the field caused by platform-centric genome-specific linked-read data analysis tools.

7. Hannan AJ. Expanding horizons of tandem repeats in biology and medicine: Why 'genomic dark matter' matters. Emerg Top Life Sci 2023; 7:ETLS20230075. [PMID: 38088823 PMCID: PMC10754335 DOI: 10.1042/etls20230075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Approximately half of the human genome includes repetitive sequences, and these DNA sequences (as well as their transcribed repetitive RNA and translated amino-acid repeat sequences) are known as the repeatome. Within this repeatome there are a couple of million tandem repeats, dispersed throughout the genome. These tandem repeats have been estimated to constitute ∼8% of the entire human genome. These tandem repeats can be located throughout exons, introns and intergenic regions, thus potentially affecting the structure and function of tandemly repetitive DNA, RNA and protein sequences. Over more than three decades, more than 60 monogenic human disorders have been found to be caused by tandem-repeat mutations. These monogenic tandem-repeat disorders include Huntington's disease, a variety of ataxias, amyotrophic lateral sclerosis and frontotemporal dementia, as well as many other neurodegenerative diseases. Furthermore, tandem-repeat disorders can include fragile X syndrome, related fragile X disorders, as well as other neurological and psychiatric disorders. However, these monogenic tandem-repeat disorders, which were discovered via their dominant or recessive modes of inheritance, may represent the 'tip of the iceberg' with respect to tandem-repeat contributions to human disorders. A previous proposal that tandem repeats may contribute to the 'missing heritability' of various common polygenic human disorders has recently been supported by a variety of new evidence. This includes genome-wide studies that associate tandem-repeat mutations with autism, schizophrenia, Parkinson's disease and various types of cancers. In this article, I will discuss how tandem-repeat mutations and polymorphisms could contribute to a wide range of common disorders, along with some of the many major challenges of tandem-repeat biology and medicine. Finally, I will discuss the potential of tandem repeats to be therapeutically targeted, so as to prevent and treat an expanding range of human disorders.

8. Lee JY. Personalising Social Ills: An Analysis of Race-based Genomics and Personalised Medicine. J Law Med 2023; 30:884-898. [PMID: 38459879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2024]
Abstract
The mapping and sequencing of the human genome at the turn of the new millennium marks a pivotal reassessment of genomic science in its potential to replace traditional "one-size-fits-all" medicine with a personalised approach. The use of racial proxies in the development of pharmacogenomic products risks conflating genetics with race under the guise of alleviating health disparities. This article argues that the current genomic approaches to realising personalised medicine do not deliver on the promise for optimised health for all and may result in irreversible harm, including psychological, social and medical harm, to racial minority groups. In light of recent epigenetic findings, the article provides a reconceptualisation of the genome and race, which is necessary to understand enduring racial disparities and the cumulative effects of racial discrimination. It then addresses the need for regulatory oversight of the approval of race-based pharmacogenomic products.

9. Tomofuji Y, Kishikawa T, Sonehara K, Maeda Y, Ogawa K, Kawabata S, Oguro-Igashira E, Okuno T, Nii T, Kinoshita M, Takagaki M, Yamamoto K, Arase N, Yagita-Sakamaki M, Hosokawa A, Motooka D, Matsumoto Y, Matsuoka H, Yoshimura M, Ohshima S, Nakamura S, Fujimoto M, Inohara H, Kishima H, Mochizuki H, Takeda K, Kumanogoh A, Okada Y. Analysis of gut microbiome, host genetics, and plasma metabolites reveals gut microbiome-host interactions in the Japanese population. Cell Rep 2023; 42:113324. [PMID: 37935197 DOI: 10.1016/j.celrep.2023.113324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 09/11/2023] [Accepted: 10/06/2023] [Indexed: 11/09/2023] Open
Abstract
Interaction between the gut microbiome and host plays a key role in human health. Here, we perform a metagenome shotgun-sequencing-based analysis of Japanese participants to reveal associations between the gut microbiome, host genetics, and plasma metabolome. A genome-wide association study (GWAS) for microbial species (n = 524) identifies associations between the PDE1C gene locus and Bacteroides intestinalis and between TGIF2 and TGIF2-RAB5IF gene loci and Bacteroides acidifaciens. In a microbial gene ortholog GWAS, agaE and agaS, which are related to the metabolism of carbohydrates forming the blood group A antigen, are associated with blood group A in a manner depending on the secretor status determined by the East Asian-specific FUT2 variant. A microbiome-metabolome association analysis (n = 261) identifies associations between bile acids and microbial features such as bile acid metabolism gene orthologs including bai and 7β-hydroxysteroid dehydrogenase. Our publicly available data will be a useful resource for understanding gut microbiome-host interactions in an underrepresented population.

10. Yin ZN, Lai FL, Gao F. Unveiling human origins of replication using deep learning: accurate prediction and comprehensive analysis. Brief Bioinform 2023; 25:bbad432. [PMID: 38008420 PMCID: PMC10676776 DOI: 10.1093/bib/bbad432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/11/2023] [Accepted: 11/06/2023] [Indexed: 11/28/2023] Open
Abstract
Accurate identification of replication origins (ORIs) is crucial for a comprehensive investigation into the progression of human cell growth and cancer therapy. Here, we proposed a computational approach Ori-FinderH, which can efficiently and precisely predict the human ORIs of various lengths by combining the Z-curve method with deep learning approach. Compared with existing methods, Ori-FinderH exhibits superior performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.9616 for K562 cell line in 10-fold cross-validation. In addition, we also established a cross-cell-line predictive model, which yielded a further improved AUC of 0.9706. The model was subsequently employed as a fitness function to support genetic algorithm for generating artificial ORIs. Sequence analysis through iORI-Euk revealed that a vast majority of the created sequences, specifically 98% or more, incorporate at least one ORI for three cell lines (Hela, MCF7 and K562). This innovative approach could provide more efficient, accurate and comprehensive information for experimental investigation, thereby further advancing the development of this field.
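To illustrate the Z-curve representation named in the abstract above, here is a minimal Python sketch that computes the cumulative three-component Z-curve of a DNA string. It follows the standard Z-curve definition (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding). The function name `z_curve` and the choice to skip ambiguous bases are assumptions made for illustration; the abstract does not say how Ori-FinderH encodes these features or passes them to its network, so this should not be read as the authors' implementation.

```python
# Per-base increments (dx, dy, dz) of the standard Z-curve:
#   x: purines (A, G) minus pyrimidines (C, T)
#   y: amino bases (A, C) minus keto bases (G, T)
#   z: weak H-bond bases (A, T) minus strong H-bond bases (C, G)
Z_STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) for each base.

    Assumption for this sketch: characters other than A, C, G, T
    (for example N) are skipped and produce no point.
    """
    x = y = z = 0
    points = []
    for base in seq.upper():
        step = Z_STEPS.get(base)
        if step is None:
            continue  # ambiguous base: ignored in this sketch
        x += step[0]
        y += step[1]
        z += step[2]
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    for point in z_curve("ACGT"):
        print(point)
```

Each point depends only on the base counts seen so far, so the curve can be computed in a single pass over sequences of any length.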

11. Kabata F, Thaldar D. The human genome as the common heritage of humanity. Front Genet 2023; 14:1282515. [PMID: 38028596 PMCID: PMC10662319 DOI: 10.3389/fgene.2023.1282515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/24/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open
Abstract
While debate on the international regulation of human genomic research remains unsettled, the Universal Declaration on the Human Genome and Human Rights, 1997 qualifies the human genome as "heritage of humankind" in a symbolic sense. Using document analysis this article assesses whether, how and to what extent the common heritage framework is relevant in regulation of human genomic research. The article traces the history of the Human Genome Project to reveal the international community's race against privatization of the human genome and its resulting qualification as the common heritage of humanity. Further, it reviews the archival records of UNESCO's International Bioethics Committee to discover the rationale for qualifying the human genome as common heritage of humankind. The article finds that the common heritage of mankind framework remains relevant to the application of the human genome at the collective level. However, the framework is at odds with the individual dimension of the human genome based on individual personality rights. The article thus argues that the right to benefit from scientific progress and its applications offers an alternative international regulatory framework for human genomic research.

12. Zaytsev K, Fedorov A, Korotkov E. Classification of Promoter Sequences from Human Genome. Int J Mol Sci 2023; 24:12561. [PMID: 37628742 PMCID: PMC10454140 DOI: 10.3390/ijms241612561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 07/28/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023] Open
Abstract
We have developed a new method for promoter sequence classification based on a genetic algorithm and the MAHDS sequence alignment method. We have created four classes of human promoters, combining 17,310 sequences out of the 29,598 present in the EPD database. We searched the human genome for potential promoter sequences (PPSs) using dynamic programming and position weight matrices representing each of the promoter sequence classes. A total of 3,065,317 potential promoter sequences were found. Only 1,241,206 of them were located in unannotated parts of the human genome. Every other PPS found intersected with either true promoters, transposable elements, or interspersed repeats. We found a strong intersection between PPSs and Alu elements as well as transcript start sites. The number of false positive PPSs is estimated to be 3 × 10⁻⁸ per nucleotide, which is several orders of magnitude lower than for any other promoter prediction method. The developed method can be used to search for PPSs in various eukaryotic genomes.

13. Sharma S, Mariño-Ramírez L, Jordan IK. Race, Ethnicity, and Pharmacogenomic Variation in the United States and the United Kingdom. Pharmaceutics 2023; 15:1923. [PMID: 37514109 PMCID: PMC10383154 DOI: 10.3390/pharmaceutics15071923] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/12/2023] [Revised: 06/30/2023] [Accepted: 07/05/2023] [Indexed: 07/30/2023] Open
Abstract
The relevance of race and ethnicity to genetics and medicine has long been a matter of debate. An emerging consensus holds that race and ethnicity are social constructs and thus poor proxies for genetic diversity. The goal of this study was to evaluate the relationship between race, ethnicity, and clinically relevant pharmacogenomic variation in cosmopolitan populations. We studied racially and ethnically diverse cohorts of 65,120 participants from the United States All of Us Research Program (All of Us) and 31,396 participants from the United Kingdom Biobank (UKB). Genome-wide patterns of pharmacogenomic variation (6311 drug response-associated variants for All of Us and 5966 variants for UKB) were analyzed with machine learning classifiers to predict participants' self-identified race and ethnicity. Pharmacogenomic variation predicts race/ethnicity with averages of 92.1% accuracy for All of Us and 94.3% accuracy for UKB. Group-specific prediction accuracies range from 99.0% for the White group in UKB to 92.9% for the Hispanic group in All of Us. Prediction accuracies are substantially lower for individuals who identified with more than one group in All of Us (16.7%) or as Mixed in UKB (70.7%). There are numerous individual pharmacogenomic variants with large allele frequency differences between race/ethnicity groups in both cohorts. Frequency differences for toxicity-associated variants predict hundreds of adverse drug reactions per 1000 treated participants for minority groups in All of Us. Our results indicate that race and ethnicity can be used to stratify pharmacogenomic risk in the US and UK populations and should not be discounted when making treatment decisions. We resolve the contradiction between the results reported here and the orthodoxy of race and ethnicity as non-genetic, social constructs by emphasizing the distinction between global and local patterns of human genetic diversity, and we stress the current and future limitations of race and ethnicity as proxies for pharmacogenomic variation.

14. Neuhausser WM, Fouks Y, Lee SW, Macharia A, Hyun I, Adashi EY, Penzias AS, Hacker MR, Sakkas D, Vaughan D. Acceptance of genetic editing and of whole genome sequencing of human embryos by patients with infertility before and after the onset of the COVID-19 pandemic. Reprod Biomed Online 2023; 47:157-163. [PMID: 37127437 PMCID: PMC10330010 DOI: 10.1016/j.rbmo.2023.03.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/04/2022] [Revised: 02/23/2023] [Accepted: 03/17/2023] [Indexed: 05/03/2023]
Abstract
RESEARCH QUESTION Has acceptance of heritable genome editing (HGE) and whole genome sequencing for preimplantation genetic testing (PGT-WGS) of human embryos changed after the onset of COVID-19 among infertility patients? DESIGN A written survey conducted between April and June 2018 and July and December 2021 among patients at a university-affiliated infertility practice. The questionnaire ascertained the acceptance of HGE for specific therapeutic or genetic 'enhancement' indications and of PGT-WGS to prevent adult disease. RESULTS In 2021 and 2018, 172 patients and 469 patients (response rates: 90% and 91%, respectively) completed the questionnaire. In 2021, significantly more participants reported a positive attitude towards HGE, for therapeutic and enhancement indications. In 2021 compared with 2018, respondents were more likely to use HGE to have healthy children with their own gametes (85% versus 77%), to reduce disease risk for adult-onset polygenic disorders (78% versus 67%), to increase life expectancy (55% versus 40%), intelligence (34% versus 26%) and creativity (33% versus 24%). Fifteen per cent of the 2021 group reported a more positive attitude towards HGE because of COVID-19 and less than 1% a more negative attitude. In contrast, support for PGT-WGS was similar in 2021 and 2018. CONCLUSIONS A significantly increased acceptance of HGE was observed, but not of PGT-WGS, after the onset of COVID-19. Although the pandemic may have contributed to this change, the exact reasons remain unknown and warrant further investigation. Whether increased acceptability of HGE may indicate an increase in acceptability of emerging biomedical technologies in general needs further investigation.

15. Bastos CAC, Afreixo V, Rodrigues JMOS, Pinho AJ. Concentration of inverted repeats along human DNA. J Integr Bioinform 2023; 20:jib-2022-0052. [PMID: 37486620 PMCID: PMC10561070 DOI: 10.1515/jib-2022-0052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 02/27/2023] [Indexed: 07/25/2023] Open
Abstract
This work aims to describe the observed enrichment of inverted repeats in the human genome; and to identify and describe, with detailed length profiles, the regions with significant and relevant enriched occurrence of inverted repeats. The enrichment is assessed and tested with a recently proposed measure (z-scores based measure). We simulate a genome using an order 7 Markov model trained with the data from the real genome. The simulated genome is used to establish the critical values which are used as decision thresholds to identify the regions with significant enriched concentrations. Several human genome regions are highly enriched in the occurrence of inverted repeats. This is observed in all the human chromosomes. The distribution of inverted repeat lengths varies along the genome. The majority of the regions with severely exaggerated enrichment contain mainly short length inverted repeats. There are also regions with regular peaks along the inverted repeats lengths distribution (periodic regularities) and other regions with exaggerated enrichment for long lengths (less frequent). However, adjacent regions tend to have similar distributions.
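For readers unfamiliar with the term, an inverted repeat is a stretch of DNA followed downstream, on the same strand, by its reverse complement. The Python sketch below shows one simple way to locate such pairs by exact matching. The function names, the fixed arm length, and the maximum-gap parameter are all hypothetical choices made for illustration; the abstract does not specify the authors' detection parameters, and this sketch does not implement their z-score enrichment measure or the order-7 Markov simulation.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm_len, max_gap):
    """Return (left_start, right_start) index pairs of exact inverted repeats.

    A pair is reported when the arm seq[left_start:left_start + arm_len]
    is followed, after a spacer of 0..max_gap bases, by its reverse
    complement. Exact matching only; a brute-force sketch, not tuned
    for genome-scale input.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * arm_len + 1):
        target = reverse_complement(seq[i:i + arm_len])
        first = i + arm_len                       # spacer of length 0
        last = min(first + max_gap, n - arm_len)  # longest allowed spacer
        for j in range(first, last + 1):
            if seq[j:j + arm_len] == target:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # "AAGGC" + 4-base spacer + "GCCTT" (reverse complement of "AAGGC")
    print(find_inverted_repeats("AAGGCTTTTGCCTT", arm_len=5, max_gap=4))
```

Counting such hits in sliding windows along a chromosome would give a raw concentration profile; judging whether a window is significantly enriched requires a null model, which is the role the simulated genome plays in the study.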

16. Barry CJS, Walker VM, Cheesman R, Davey Smith G, Morris TT, Davies NM. How to estimate heritability: a guide for genetic epidemiologists. Int J Epidemiol 2023; 52:624-632. [PMID: 36427280 PMCID: PMC10114051 DOI: 10.1093/ije/dyac224] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/03/2022] [Accepted: 11/14/2022] [Indexed: 11/27/2022] Open
Abstract
Traditionally, heritability has been estimated using family-based methods such as twin studies. Advancements in molecular genomics have facilitated the development of methods that use large samples of (unrelated or related) genotyped individuals. Here, we provide an overview of common methods applied in genetic epidemiology to estimate heritability, i.e. the proportion of phenotypic variation explained by genetic variation. We provide a guide to key genetic concepts required to understand heritability estimation methods from family-based designs (twin and family studies), genomic designs based on unrelated individuals [linkage disequilibrium score regression, genomic relatedness restricted maximum-likelihood (GREML) estimation] and family-based genomic designs (sibling regression, GREML-kinship, trio-genome-wide complex trait analysis, maternal-genome-wide complex trait analysis, relatedness disequilibrium regression). We describe how heritability is estimated for each method and the assumptions underlying its estimation, and discuss the implications when these assumptions are not met. We further discuss the benefits and limitations of estimating heritability within samples of unrelated individuals compared with samples of related individuals. Overall, this article is intended to help the reader determine the circumstances when each method would be appropriate and why.
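The definition quoted in the abstract above can be written compactly. The symbols below are the conventional ones and are not taken from the abstract: \sigma_P^2 is the total phenotypic variance, \sigma_G^2 the total genetic variance, and \sigma_A^2 its additive component. As one familiar example of a family-based estimator, Falconer's formula for the classical twin design is included, with r_{MZ} and r_{DZ} denoting the phenotypic correlations in monozygotic and dizygotic twin pairs; it relies on assumptions such as purely additive genetic effects and equal shared environments across twin types.

```latex
% Broad-sense and narrow-sense heritability
H^2 = \frac{\sigma_G^2}{\sigma_P^2},
\qquad
h^2 = \frac{\sigma_A^2}{\sigma_P^2}

% Falconer's twin-based estimate of narrow-sense heritability
\hat{h}^2 = 2\,\left( r_{MZ} - r_{DZ} \right)
```

For instance, twin correlations of r_{MZ} = 0.8 and r_{DZ} = 0.5 would give an estimate of 2(0.8 - 0.5) = 0.6 under those assumptions.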

17. Kouros CE, Makri V, Ouzounis CA, Chasapi A. Disease association and comparative genomics of compositional bias in human proteins. F1000Res 2023; 12:198. [PMID: 37082000 PMCID: PMC10111144 DOI: 10.12688/f1000research.129929.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/02/2023] [Indexed: 02/22/2023] Open
Abstract
Background: The evolutionary rate of disordered proteins varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of intrinsically disordered regions (IDRs) across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The Uniprot Reference Proteome dataset, containing 11297 proteomes was used as target dataset for the comparative genomics of a well-defined subset of the Human Genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards low complexity regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated, low complexity proteins across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of low complexity, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.
|
18
|
Kouros CE, Makri V, Ouzounis CA, Chasapi A. Disease association and comparative genomics of compositional bias in human proteins. F1000Res 2023; 12:198. [PMID: 37082000 PMCID: PMC10111144 DOI: 10.12688/f1000research.129929.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/12/2023] [Indexed: 04/25/2023] Open
Abstract
Background: The evolutionary rate of disordered protein regions varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of compositional bias, indicative of disorder, across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The Uniprot Reference Proteome dataset, containing 11297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the Human Genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards biased regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated proteins with compositional bias across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of compositional bias, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.
|
19
|
Hanany M, Yang RR, Lam CM, Beryozkin A, Sundaresan Y, Sharon D. An In-Depth Single-Gene Worldwide Carrier Frequency and Genetic Prevalence Analysis of CYP4V2 as the Cause of Bietti Crystalline Dystrophy. Transl Vis Sci Technol 2023; 12:27. [PMID: 36795063 PMCID: PMC9940774 DOI: 10.1167/tvst.12.2.27] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/17/2023] Open
Abstract
Purpose: Bietti crystalline dystrophy (BCD) is a rare monogenic autosomal recessive (AR) chorioretinal degenerative disease caused by biallelic mutations in CYP4V2. The aim of the current study was to perform an in-depth calculation of worldwide carrier frequency and genetic prevalence of BCD using gnomAD data and comprehensive literature CYP4V2 analysis. Methods: CYP4V2 gnomAD data and reported mutations were used to calculate carrier frequency of each variant. An evolutionary-based sliding window analysis was used to detect conserved protein regions. Potential exonic splicing enhancers (ESEs) were identified using ESEfinder. Results: We identified 1171 CYP4V2 variants, 156 of which were considered pathogenic, including 108 reported in patients with BCD. Carrier frequency and genetic prevalence calculations confirmed that BCD is more common in the East Asian population, with ∼19 million healthy carriers and 52,000 individuals who carry biallelic CYP4V2 mutations and are expected to be affected. Additionally, we generated BCD prevalence estimates of other populations, including African, European, Finnish, Latino, and South Asian. Worldwide, the estimated overall carrier frequency of CYP4V2 mutation is 1:210, and therefore, ∼37 million individuals are expected to be healthy carriers of a CYP4V2 mutation. The estimated genetic prevalence of BCD is about 1:116,000, and we predict that ∼67,000 individuals are affected with BCD worldwide. Conclusions: Our analysis estimates BCD prevalence and revealed large differences among various populations. Moreover, it highlights advantages and limitations of the gnomAD database. Translational Relevance: This analysis is likely to have important implications for genetic counseling in each studied population and for developing clinical trials for potential BCD treatments.
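The worldwide figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the rounded numbers in the abstract, not the authors' per-variant gnomAD calculation. The function and constant names are hypothetical, and the population size is derived from the quoted carrier count rather than taken from the paper.

```python
# Rounded worldwide figures quoted in the abstract above.
CARRIER_ONE_IN = 210          # overall carrier frequency of 1:210
CARRIERS = 37_000_000         # ~37 million healthy carriers
PREVALENCE_ONE_IN = 116_000   # genetic prevalence of about 1:116,000


def implied_population(carriers: int, carrier_one_in: int) -> int:
    """Population size implied by a carrier count and a 1:N carrier frequency."""
    return carriers * carrier_one_in


def expected_affected(population: int, prevalence_one_in: int) -> float:
    """Expected number of affected individuals at a 1:N prevalence."""
    return population / prevalence_one_in


# Assumption: the carrier count and the prevalence refer to the same population.
population = implied_population(CARRIERS, CARRIER_ONE_IN)
affected = expected_affected(population, PREVALENCE_ONE_IN)

print(f"implied population: {population:,}")
print(f"expected affected:  {affected:,.0f}")
```

Under these assumptions the implied population is about 7.8 billion and the expected number of affected individuals rounds to roughly 67,000, in line with the abstract's estimate.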
|
20
|
Kravitz SN, Ferris E, Love MI, Thomas A, Quinlan AR, Gregg C. Random allelic expression in the adult human body. Cell Rep 2023; 42:111945. [PMID: 36640362 PMCID: PMC10484211 DOI: 10.1016/j.celrep.2022.111945] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 04/14/2022] [Revised: 10/17/2022] [Accepted: 12/15/2022] [Indexed: 01/07/2023] Open
Abstract
Genes are typically assumed to express both parental alleles similarly, yet cell lines show random allelic expression (RAE) for many autosomal genes that could shape genetic effects. Thus, understanding RAE in human tissues could improve our understanding of phenotypic variation. Here, we develop a methodology to perform genome-wide profiling of RAE and biallelic expression in GTEx datasets for 832 people and 54 tissues. We report 2,762 autosomal genes with some RAE properties similar to randomly inactivated X-linked genes. We found that RAE is associated with rapidly evolving regions in the human genome, adaptive signaling processes, and genes linked to age-related diseases such as neurodegeneration and cancer. We define putative mechanistic subtypes of RAE distinguished by gene overlaps on sense and antisense DNA strands, aggregation in clusters near telomeres, and increased regulatory complexity and inputs compared with biallelic genes. We provide foundations to study RAE in human phenotypes, evolution, and disease.
|
21
|
Dvorak P, Hanicinec V, Soucek P. The position of the longest intron is related to biological functions in some human genes. Front Genet 2023; 13:1085139. [PMID: 36712854 PMCID: PMC9875286 DOI: 10.3389/fgene.2022.1085139] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/31/2022] [Accepted: 12/27/2022] [Indexed: 01/12/2023] Open
Abstract
The evidence that introns can influence different levels of transfer of genetic information between DNA and the final product is increasing. Longer first introns were found to be a general property of eukaryotic gene structure and shown to contain a higher fraction of conserved sequence and different functional elements. Our work brings more precise information about the position of the longest introns in human protein-coding genes and possible connection with biological function and gene expression. According to our results, the position of the longest intron can be localized to the first third of introns in 64%, the second third in 19%, and the final third in 17%, with notable peaks at the middle and last introns of approximately 5% and 6%, respectively. The median lengths of the longest introns decrease with increasing distance from the start of the gene from approximately 15,000 to 5,000 bp. We have shown that the position of the longest intron is in some cases linked to the biological function of the given gene. For example, DNA repair genes have the longest intron more often in the second or final third. In the distribution of gene expression according to the position of the longest intron, tissue-specific profiles can be traced with the highest expression usually at the absolute positions of intron 1 and 2. In this work, we present arguments supporting the hypothesis that the position of the longest intron in a gene is another biological factor modulating the transmission of genetic information.
|
22
|
Ilgisonis EV, Ponomarenko EA, Tarbeeva SN, Lisitsa AV, Zgoda VG, Radko SP, Archakov AI. Gene-centric coverage of the human liver transcriptome: QPCR, Illumina, and Oxford Nanopore RNA-Seq. Front Mol Biosci 2022; 9:944639. [PMID: 36545510 PMCID: PMC9760921 DOI: 10.3389/fmolb.2022.944639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Accepted: 11/10/2022] [Indexed: 12/07/2022] Open
Abstract
It has been shown that the best coverage of the HepG2 cell line transcriptome encoded by genes of a single chromosome, chromosome 18, is achieved by a combination of two sequencing platforms, Illumina RNA-Seq and Oxford Nanopore Technologies (ONT), using cut-off levels of FPKM > 0 and TPM > 0, respectively. In this study, we investigated the extent to which the combination of these transcriptomic analysis methods makes it possible to achieve a high coverage of the transcriptome encoded by the genes of other human chromosomes. A comparative analysis of transcriptome coverage for various types of biological material was carried out, and the HepG2 cell line transcriptome was compared with the transcriptome of liver tissue cells. In addition, the contribution of variability in the coverage of expressed genes in human transcriptomes to the creation of a draft human transcriptome was evaluated. For human liver tissues, ONT makes an extremely insignificant contribution to the overall coverage of the transcriptome. Thus, to ensure maximum coverage of the liver tissue transcriptome, it is sufficient to apply only one technology: Illumina RNA-Seq (FPKM > 0).
|
23
|
Thaldar DW, Townsend BA, Donnelly DL, Botes M, Gooden A, van Harmelen J, Shozi B. The multidimensional legal nature of personal genomic sequence data: A South African perspective. Front Genet 2022; 13:997595. [PMID: 36437942 PMCID: PMC9681828 DOI: 10.3389/fgene.2022.997595] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/19/2022] [Accepted: 09/28/2022] [Indexed: 10/19/2023] Open
Abstract
This article provides a comprehensive analysis of the various dimensions in South African law applicable to personal genomic sequence data. This analysis includes property rights, personality rights, and intellectual property rights. Importantly, the under-investigated question of whether personal genomic sequence data are capable of being owned is investigated and answered affirmatively. In addition to being susceptible of ownership, personal genomic sequence data are also the object of data subjects' personality rights, and can also be the object of intellectual property rights: whether on their own qua trade secret or as part of a patented invention or copyrighted dataset. It is shown that personality rights constrain ownership rights, while the exploitation of intellectual property rights is constrained by both personality rights and ownership rights. All of these rights applicable to personal genomic sequence data should be acknowledged and harmonized for such data to be used effectively.
|
24
|
Yangyanqiu W, Shuwen H. Bacterial DNA involvement in carcinogenesis. Front Cell Infect Microbiol 2022; 12:996778. [PMID: 36310856 PMCID: PMC9600336 DOI: 10.3389/fcimb.2022.996778] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Accepted: 09/27/2022] [Indexed: 10/29/2023] Open
Abstract
The incidence of cancer is high worldwide, and biological factors such as viruses and bacteria play an important role in the occurrence of cancer. Helicobacter pylori, human papillomavirus, hepatitis B viruses and other organisms have been identified as carcinogens. Cancer is a disease driven by the accumulation of genome changes. Viruses can directly cause cancer by changing the genetic composition of the human body, such as cervical cancer caused by human papillomavirus DNA integration and liver cancer caused by hepatitis B virus DNA integration. Recently, bacterial DNA has been found around cancers such as pancreatic cancer, breast cancer and colorectal cancer, and the idea that bacterial genes can also be integrated into the human genome has become a hot topic. In the present paper, we reviewed the latest phenomenon and specific integration mechanism of bacterial DNA into the human genome. Based on these findings, we also suggest three sources of bacterial DNA in cancers: bacterial DNA around human tissues, free bacterial DNA in bacteremia or sepsis, and endogenous bacterial DNA in the human genome. Clarifying the theory that bacterial DNA integrates into the human genome can provide a new perspective for cancer prevention and treatment.
|
25
|
Silva C, Machado M, Ferrão J, Sebastião Rodrigues A, Vieira L. Whole human genome 5'-mC methylation analysis using long read nanopore sequencing. Epigenetics 2022; 17:1961-1975. [PMID: 35856633 DOI: 10.1080/15592294.2022.2097473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022] Open
Abstract
Methylation microarray and bisulphite sequencing are often used to study 5'-methylcytosine (5'-mC) modification of CpG dinucleotides in the human genome. Although both technologies produce trustworthy results, the evaluation of the methylation status of CpG sites suffers from the potential side effects of DNA modification by bisulphite and/or the ambiguity of mapping short reads in repetitive and highly homologous genomic regions, respectively. Nanopore sequencing is an attractive alternative for the study of 5'-mC since it allows sequencing of native DNA molecules, whereas the long reads produced by this technology help to increase the resolution of those genomic regions. In this work, we show that nanopore sequencing with 10X coverage depth, using DNA from a human cell line, produces 5'-mC methylation frequencies consistent with those obtained by 450k microarray, digital restriction enzyme analysis of methylation, and reduced representation bisulphite sequencing. High correlation between methylation frequencies obtained by nanopore sequencing and the other methodologies was also noticeable in either low or high GC content regions, including CpG islands and transcription start sites. We also showed that a minimum of five reads per CpG yields strong correlations (>0.89) in replicate nanopore sequencing runs and an almost uniform linearity of the methylation frequency variation between zero and one. Furthermore, nanopore sequencing was able to correctly display methylation frequency patterns based on genomic annotations of CpG regions. These results demonstrate that nanopore sequencing is a fast, robust, and reliable approach to the study of 5'-mC in the human genome with low coverage depth.
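The abstract refers to per-CpG methylation frequencies between zero and one and to a minimum of five reads per CpG. A minimal sketch of how such a frequency might be computed follows. It assumes the common convention of methylated calls divided by total calls at a site, and a hypothetical input of `(position, is_methylated)` pairs, one per read. It is not the authors' pipeline and does not reflect the output format of any particular nanopore tool.

```python
from collections import defaultdict

MIN_READS = 5  # minimum reads per CpG quoted in the abstract


def methylation_frequencies(calls, min_reads=MIN_READS):
    """Return {cpg_position: frequency} for sites covered by at least min_reads reads.

    calls: iterable of (cpg_position, is_methylated) tuples, one per read
    (a hypothetical input format chosen for illustration).
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated, total]
    for position, is_methylated in calls:
        counts[position][0] += int(is_methylated)
        counts[position][1] += 1
    return {
        position: methylated / total
        for position, (methylated, total) in counts.items()
        if total >= min_reads
    }


# Toy data: site 100 has five reads (four methylated); site 200 has only three.
example_calls = [(100, True)] * 4 + [(100, False)] + [(200, True)] * 3
frequencies = methylation_frequencies(example_calls)
print(frequencies)
```

In this toy input the site with three reads is dropped by the coverage filter, and the remaining site has a frequency of 4/5.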
|