1
Zeng A, Rong H, Pan D, Jia L, Zhang Y, Zhao F, Peng S. Discovery of Genetic Biomarkers for Alzheimer's Disease Using Adaptive Convolutional Neural Networks Ensemble and Genome-Wide Association Studies. Interdiscip Sci 2021; 13:787-800. [PMID: 34410590 DOI: 10.1007/s12539-021-00470-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/21/2021] [Revised: 07/01/2021] [Accepted: 08/01/2021] [Indexed: 06/13/2023]
Abstract
OBJECTIVE To identify candidate neuroimaging and genetic biomarkers for Alzheimer's disease (AD) and other brain disorders, especially little-investigated ones, we propose a data-driven approach that incorporates into Genome-Wide Association Studies (GWAS) an adaptive classifier ensemble model built by integrating a Convolutional Neural Network (CNN) and Ensemble Learning (EL) with a Genetic Algorithm (GA), i.e., the CNN-EL-GA method. METHODS First, a large number of CNN models were trained as base classifiers on coronal, sagittal, or transverse magnetic resonance imaging slices, respectively; the CNN models with strong discriminability were then selected with the GA to build a single classifier ensemble for classifying AD, using the CNN-EL-GA method. When the resulting classifier ensemble reached its highest generalization capability, the points of intersection of the most discriminative coronal, sagittal, and transverse slices were determined. Finally, we conducted GWAS on the genotype data and on phenotypes defined as the gray matter volumes of the ten most discriminative brain regions, i.e., the regions containing the top ten points of intersection. RESULTS Six genes (PCDH11X/Y, TPTE2, LOC107985902, MUC16, and LINC01621) and single-nucleotide polymorphisms (SNPs), e.g., rs36088804, rs34640393, rs2451078, rs10496214, rs17016520, rs2591597, rs9352767, and rs5941380, were identified. CONCLUSION This approach overcomes limitations associated with subjective factors and dependence on prior knowledge, while adaptively yielding more robust and effective candidate biomarkers in a data-driven way. SIGNIFICANCE The approach is promising for discovering effective candidate genetic biomarkers for brain disorders and may help improve the effectiveness of identified candidate neuroimaging biomarkers for brain diseases.
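The ensemble-selection step in the abstract (choosing a subset of CNN base classifiers with a genetic algorithm) can be illustrated with a minimal sketch. It assumes each base classifier's 0/1 predictions on a validation set are already available, encodes a candidate ensemble as a binary inclusion mask, and scores it by majority-vote accuracy. The fitness function, tournament selection, one-point crossover, bit-flip mutation, parameter values, and the toy predictions are all illustrative assumptions, not the authors' CNN-EL-GA implementation.

```python
import random
from typing import List, Sequence, Tuple


def vote_accuracy(mask: Sequence[int], preds: List[List[int]], labels: List[int]) -> float:
    """Majority-vote accuracy of the base classifiers switched on in `mask` (labels are 0/1)."""
    chosen = [p for p, m in zip(preds, mask) if m]
    if not chosen:
        return 0.0
    correct = 0
    for i, y in enumerate(labels):
        votes = sum(p[i] for p in chosen)
        # Ties (possible with an even number of voters) are broken toward class 1.
        y_hat = 1 if 2 * votes >= len(chosen) else 0
        correct += int(y_hat == y)
    return correct / len(labels)


def ga_select(preds: List[List[int]], labels: List[int], pop_size: int = 20,
              generations: int = 30, p_mut: float = 0.05,
              seed: int = 0) -> Tuple[List[int], float]:
    """Evolve binary inclusion masks over base classifiers; return the best mask and its accuracy."""
    rng = random.Random(seed)
    n = len(preds)

    def fitness(mask: List[int]) -> float:
        return vote_accuracy(mask, preds, labels)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: the best mask so far always survives
        while len(next_pop) < pop_size:
            # Tournament selection (size 3) for each parent.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
    return best, fitness(best)


labels = [0, 1, 1, 0, 1, 0, 1, 1]
preds = [  # hypothetical validation-set predictions from four base classifiers
    [0, 1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 0, 0],
]
mask, acc = ga_select(preds, labels)
print(mask, acc)
```

Elitism makes the best fitness non-decreasing across generations; in a real setting the fitness would be measured on held-out data so that the selected ensemble is not simply overfitted to the validation set.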
Affiliation(s)
- An Zeng
- Faculty of Computer, Guangdong University of Technology, Guangzhou, 510006, People's Republic of China
- Huabin Rong
- Faculty of Computer, Guangdong University of Technology, Guangzhou, 510006, People's Republic of China
- Dan Pan
- School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, 510665, People's Republic of China
- Longfei Jia
- Faculty of Computer, Guangdong University of Technology, Guangzhou, 510006, People's Republic of China
- Yiqun Zhang
- Faculty of Computer, Guangdong University of Technology, Guangzhou, 510006, People's Republic of China
- Fengyi Zhao
- Faculty of Computer, Guangdong University of Technology, Guangzhou, 510006, People's Republic of China
- Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, School of Computer Science, National University of Defense Technology, Peng Cheng Lab, Shenzhen, 518000, People's Republic of China
2
Käfer J, Lartillot N, Marais GAB, Picard F. Detecting sex-linked genes using genotyped individuals sampled in natural populations. Genetics 2021; 218:iyab053. [PMID: 33764439 PMCID: PMC8225351 DOI: 10.1093/genetics/iyab053] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/04/2021] [Accepted: 03/21/2021] [Indexed: 12/20/2022]
Abstract
We propose a method, SDpop, that can infer sex-linkage caused by the recombination suppression typical of sex chromosomes. The method is based on modeling the allele and genotype frequencies of individuals of known sex in natural populations. It is implemented in a hierarchical probabilistic framework that accounts for different sources of error. It allows statistical testing for the presence or absence of sex chromosomes, and detection of sex-linked genes based on the posterior probabilities in the model. Furthermore, for gametologous sequences, the haplotype and level of nucleotide polymorphism of each copy can be inferred, as well as the divergence between them. We test the method using simulated data, as well as data from both a relatively recent and an old sex chromosome system (the plant Silene latifolia and humans, respectively), and show that, in most cases, robust predictions are obtained with 5 to 10 individuals per sex.
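The genotype signal that SDpop models probabilistically can be illustrated with a deliberately naive heuristic, which is not SDpop itself: in an XY system, a site carrying a fixed X/Y difference is heterozygous in every male and homozygous in every female. The sketch below assumes genotypes coded as 0/1/2 alternate-allele counts; the function names and the 0.9 cutoff are hypothetical choices for illustration only.

```python
from typing import Sequence


def heterozygosity(genotypes: Sequence[int]) -> float:
    """Fraction of individuals that are heterozygous (code 1 in 0/1/2 coding); expects a non-empty list."""
    return sum(g == 1 for g in genotypes) / len(genotypes)


def looks_xy_linked(males: Sequence[int], females: Sequence[int], cutoff: float = 0.9) -> bool:
    """Naive flag for an XY-type fixed difference: males nearly all heterozygous,
    females nearly all homozygous. Genotyping error and X polymorphism are ignored,
    unlike in a probabilistic model such as SDpop."""
    return heterozygosity(males) >= cutoff and (1 - heterozygosity(females)) >= cutoff


# Six hypothetical individuals per sex.
print(looks_xy_linked([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0]))  # fixed X/Y difference pattern
print(looks_xy_linked([0, 1, 2, 1, 0, 1], [1, 0, 1, 2, 1, 0]))  # autosome-like pattern
```

A hard cutoff like this breaks down quickly under genotyping error or partial X-Y divergence, which is the motivation for modeling error sources and posterior probabilities explicitly.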
Affiliation(s)
- Jos Käfer
- Laboratoire de Biométrie et Biologie Evolutive, CNRS, UMR 5558, Université Lyon 1, Université de Lyon, Villeurbanne F-69622, France
- Nicolas Lartillot
- Laboratoire de Biométrie et Biologie Evolutive, CNRS, UMR 5558, Université Lyon 1, Université de Lyon, Villeurbanne F-69622, France
- Gabriel A B Marais
- Laboratoire de Biométrie et Biologie Evolutive, CNRS, UMR 5558, Université Lyon 1, Université de Lyon, Villeurbanne F-69622, France
- Franck Picard
- Laboratoire de Biométrie et Biologie Evolutive, CNRS, UMR 5558, Université Lyon 1, Université de Lyon, Villeurbanne F-69622, France
3
Nazarian A, Kulminski AM. Genome-Wide Analysis of Sex Disparities in the Genetic Architecture of Lung and Colorectal Cancers. Genes (Basel) 2021; 12:genes12050686. [PMID: 34062886 PMCID: PMC8147355 DOI: 10.3390/genes12050686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Revised: 04/28/2021] [Accepted: 04/29/2021] [Indexed: 12/24/2022]
Abstract
Almost all complex disorders show epidemiological and clinical sex disparities, which might partially arise from sex-specific genetic mechanisms. Addressing such differences can be important from the perspective of precision medicine, which aims to make medical interventions more personalized and effective. We investigated sex-specific genetic associations with colorectal (CRCa) and lung (LCa) cancers using genome-wide single-nucleotide polymorphism (SNP) data from three independent datasets. The genome-wide association analyses revealed 33 SNPs associated with CRCa/LCa at P < 5.0 × 10⁻⁶ in either males or females. Of these, 26 SNPs had sex-specific effects, as their effect sizes differed statistically between the two sexes at a Bonferroni-adjusted significance level of 0.0015. None had proxy SNPs within their ±1 Mb regions, and the closest genes to 32 SNPs had not previously been associated with the corresponding cancers. The pathway enrichment analyses demonstrated associations of 35 pathways with CRCa or LCa, mostly implicated in immune system responses, cell cycle, and chromosome stability. The significant pathways were mostly enriched in either males or females. Our findings provide novel insights into the potential sex-specific genetic heterogeneity of CRCa and LCa at the SNP and pathway levels.
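The Bonferroni-adjusted level of 0.0015 is consistent with dividing a family-wise α of 0.05 by the 33 SNPs carried forward (0.05 / 33 ≈ 0.0015); this reading is an inference, since the abstract states neither the α nor the test. A common way to compare one SNP's effect between sexes is a z-test on the difference of the sex-specific estimates, sketched below under that assumption. The function names and the effect sizes and standard errors in the example are hypothetical.

```python
import math
from typing import Tuple


def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at `alpha`."""
    return alpha / n_tests


def sex_difference_z(beta_m: float, se_m: float, beta_f: float, se_f: float) -> Tuple[float, float]:
    """Two-sided z-test for a difference between independent male and female effect estimates."""
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


level = bonferroni_level(0.05, 33)
print(round(level, 4))  # 0.0015
z, p = sex_difference_z(0.30, 0.06, 0.02, 0.05)  # hypothetical log-odds ratios and SEs
print(round(z, 2), p < level)  # 3.59 True
```

The z-test treats the male and female estimates as independent, which holds when they come from disjoint sets of individuals.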
4
Abstract
Effects of stresses associated with extremely preterm birth may be biologically "recorded" in the genomes of individuals born preterm via changes in DNA methylation (DNAm) patterns. Genome-wide DNAm profiles were examined in buccal epithelial cells from 45 adults born at extremely low birth weight (ELBW; ≤1000 g) in the oldest known cohort of prospectively followed ELBW survivors (mean age = 32.35 years, 17 male) and 47 normal birth weight (NBW; ≥2500 g) control adults (mean age = 32.43 years, 20 male). Sex differences in DNAm profiles were found in both birth weight groups, but they were greatly enhanced in the ELBW group (77,895 loci) relative to the NBW group (3,424 loci), suggesting synergistic effects of extreme prenatal adversity and sex on adult DNAm profiles. In men, DNAm profiles differed by birth weight group at 1,354 loci on 694 unique genes. Only two loci on two genes distinguished between ELBW and NBW women. Gene ontology (GO) and network analyses indicated that loci differentiating between ELBW and NBW men were abundant in genes within biological pathways related to neuronal development, synaptic transportation, metabolic regulation, and cellular regulation. These findings suggest increased sensitivity of males to long-term epigenetic effects of extremely preterm birth. Group differences are discussed in relation to particular gene functions.
5
Cheng C, Kirkpatrick M. The signal of sex-specific selection in humans is not an artefact: Reply to Mank et al. Mol Ecol 2020; 29:1406-1407. [PMID: 32338415 DOI: 10.1111/mec.15420] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/22/2020] [Revised: 03/16/2020] [Accepted: 03/19/2020] [Indexed: 02/06/2023]
Affiliation(s)
- Changde Cheng
- Department of Computational Biology, St. Jude's Children's Hospital, Memphis, TN, USA
- Mark Kirkpatrick
- Department of Integrative Biology, University of Texas, Austin, TX, USA
6
Genetic analysis of hsCRP in American Indians: The Strong Heart Family Study. PLoS One 2019; 14:e0223574. [PMID: 31622379 PMCID: PMC6797125 DOI: 10.1371/journal.pone.0223574] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/08/2019] [Accepted: 09/24/2019] [Indexed: 02/07/2023]
Abstract
Background Increased serum levels of C-reactive protein (CRP), an important component of the innate immune response, are associated with increased risk of cardiovascular disease (CVD). Multiple single-nucleotide polymorphisms (SNPs) associated with CRP levels have been identified, and Mendelian randomization studies have shown a positive association between SNPs that increase CRP expression and risk of colon cancer (but thus far not CVD). Because the effects of individual genetic variants often interact with the genetic background of a population, we sought to resolve the genetic determinants of serum CRP in a number of American Indian populations. Methods The Strong Heart Family Study (SHFS) has serum CRP measurements from 2428 tribal members, recruited as large families from three regions of the United States. Microsatellite markers and MetaboChip-defined SNP genotypes were incorporated into variance-components, decomposition-based linkage and association analyses. Results CRP levels exhibited significant heritability (h² = 0.33 ± 0.05, p < 1.3 × 10⁻²⁰). A locus on chromosome (chr) 6, near marker D6S281 (at approximately 169.6 Mb, GRCh38/hg38), showed suggestive linkage (LOD = 1.9) to CRP levels. No individual SNP was associated with CRP levels after Bonferroni adjustment for multiple testing (threshold < 7.77 × 10⁻⁷); however, we found nominal associations, many of which replicate previous findings at the CRP, HNF1A, and 7 other loci. In addition, we report associations of 46 SNPs located at 7 novel loci on chromosomes 2, 5, 6 (2 loci), 9, 10, and 17, with an average spacing of 15.3 kb between SNPs and all with p-values less than 7.2 × 10⁻⁴. Conclusion In agreement with evidence from other populations, these data show that serum CRP levels are under considerable genetic influence, and they implicate loci, such as those near CRP and other genes, that replicate results from other ethnic groups. These findings also suggest possible novel loci on chr 6 and other chromosomes that warrant further investigation.
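Assuming the reported threshold is a conventional Bonferroni correction at α = 0.05 (the abstract does not say so), 7.77 × 10⁻⁷ corresponds to roughly 64,000 tests. The short calculation below shows that arithmetic and why the nominal associations (reported at p < 7.2 × 10⁻⁴) fall well short of that threshold; the helper name is hypothetical.

```python
def implied_tests(alpha: float, threshold: float) -> float:
    """Number of tests m implied by a Bonferroni per-test threshold alpha / m."""
    return alpha / threshold


m = implied_tests(0.05, 7.77e-7)
print(round(m))  # 64350
print(7.2e-4 < 7.77e-7)  # False: the nominal hits are not Bonferroni-significant
```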
7
Ye J, Niu X, Yang Y, Wang S, Xu Q, Yuan X, Yu H, Wang Y, Wang S, Feng Y, Wei X. Divergent Hd1, Ghd7, and DTH7 Alleles Control Heading Date and Yield Potential of Japonica Rice in Northeast China. Front Plant Sci 2018; 9:35. [PMID: 29434613 PMCID: PMC5790996 DOI: 10.3389/fpls.2018.00035] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/14/2017] [Accepted: 01/09/2018] [Indexed: 05/04/2023]
Abstract
The heading date is a vital factor in achieving full rice yield. Cultivars with particular flowering behaviors have been artificially selected to survive the long-day and low-temperature conditions of Northeast China. To dissect the genetic mechanism responsible for heading date in rice populations from Northeast China, association mapping was performed to identify the major controlling loci. A genome-wide association study (GWAS) using general and mixed linear models identified three genetic loci, Hd1, Ghd7, and DTH7. The three genes were sequenced to analyze their natural variation and identify their functions. Loss-of-function alleles of these genes contributed to early heading in the northern regions of Northeast China, while functional alleles promoted late heading in the southern regions of Northeast China. Selecting environmentally appropriate allele combinations in new varieties is recommended during breeding. Introducing the genetic background of early indica rice into Northeast China japonica rice is a reasonable strategy for improving genetic diversity.
Affiliation(s)
- Jing Ye
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
- College of Agronomy, Shenyang Agricultural University, Shenyang, China
- Xiaojun Niu
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
- Yaolong Yang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
- Shan Wang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
- Qun Xu
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
- Xiaoping Yuan
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
- Hanyong Yu
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
- Yiping Wang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
- Shu Wang
- College of Agronomy, Shenyang Agricultural University, Shenyang, China
- *Correspondence: Xinghua Wei, Yue Feng, Shu Wang
- Yue Feng
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
- Xinghua Wei
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, China
8
Du Y, Martin JS, McGee J, Yang Y, Liu EY, Sun Y, Geihs M, Kong X, Zhou EL, Li Y, Huang J. A SNP panel and online tool for checking genotype concordance through comparing QR codes. PLoS One 2017; 12:e0182438. [PMID: 28926565 PMCID: PMC5604942 DOI: 10.1371/journal.pone.0182438] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/17/2017] [Accepted: 07/18/2017] [Indexed: 01/02/2023]
Abstract
In the current precision medicine era, an increasing number of samples are being genotyped and sequenced. Both researchers and commercial companies expend significant time and resources to reduce the error rate. However, sample mix-up rates of between 0.1% and 1% have been reported, not to mention possibly higher mix-up rates during downstream genetic reporting processes. Even at the low end of this estimate, this translates into a significant number of mislabeled samples, especially given the projected one billion people who will be sequenced within the next decade. Here, we first describe a method to identify a small set of single-nucleotide polymorphisms (SNPs) that can uniquely identify a personal genome, using the allele frequencies of five major continental populations reported by the 1000 Genomes Project and the ExAC Consortium. To make this panel more informative, we added four SNPs that are commonly used to predict ABO blood type and another two SNPs that are capable of predicting sex. We then implemented a web interface (http://qrcme.tech), nicknamed QRC (for QR code-based Concordance check), which extracts the relevant ID SNPs from raw genetic data, encodes their genotypes as a quick response (QR) code, and compares QR codes to report the concordance of the underlying genetic datasets. The resulting 80 fingerprinting SNPs represent a significant decrease in complexity and in the number of markers used for genetic data labelling and tracking. Our method and web tool are easily accessible to both researchers and the general public who consider the accuracy of complex genetic data a prerequisite for precision medicine.
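Underneath the QR-code comparison, a concordance check amounts to comparing genotypes at the same panel of ID SNPs. The sketch below assumes each sample is a mapping from SNP identifier to an unordered pair of alleles; the encoding, the function names, and the placeholder identifiers `snp1` to `snp3` are hypothetical, and QR encoding/decoding as well as strand flips are not handled.

```python
from typing import Dict, Optional, Tuple

Genotype = Tuple[str, str]


def normalize(gt: Genotype) -> Tuple[str, ...]:
    """Sort alleles so that ('A', 'G') and ('G', 'A') compare equal."""
    return tuple(sorted(gt))


def concordance(a: Dict[str, Genotype], b: Dict[str, Genotype]) -> Optional[float]:
    """Fraction of shared SNPs with identical genotypes; None if the samples share no SNPs."""
    shared = set(a) & set(b)
    if not shared:
        return None
    same = sum(normalize(a[s]) == normalize(b[s]) for s in shared)
    return same / len(shared)


sample1 = {"snp1": ("A", "G"), "snp2": ("C", "C"), "snp3": ("T", "T")}
sample2 = {"snp1": ("G", "A"), "snp2": ("C", "T"), "snp3": ("T", "T")}
print(concordance(sample1, sample2))  # 2 of 3 shared SNPs agree
```

Two datasets from the same person should approach full concordance over the panel, apart from genotyping error, whereas unrelated individuals should disagree at many of the highly informative ID SNPs.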
Affiliation(s)
- Yonghong Du
- School of Statistics, Beijing Normal University, Beijing, China
- Joshua S. Martin
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- John McGee
- NC Translational and Clinical Sciences Institute, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- Yuchen Yang
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- Eric Yi Liu
- Department of Computer Science, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- Yingrui Sun
- Department of Mathematics & Statistics, Boston University, Boston, Massachusetts, United States of America
- Matthias Geihs
- Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
- Xuejun Kong
- Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Eric Lingfeng Zhou
- Department of Biostatistics, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- Yun Li
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Computer Science, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Biostatistics, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- *Correspondence: YL; JH
- Jie Huang
- Boston VA Research Institute, Boston, Massachusetts, United States of America
- Brigham Women’s Hospital Division of Aging, Harvard Medical School, Boston, Massachusetts, United States of America
9
Arthur JW, Cheung FSG, Reichardt JKV. Single nucleotide differences (SNDs) continue to contaminate the dbSNP database with consequences for human genomics and health. Hum Mutat 2015; 36:196-9. [PMID: 25421747 DOI: 10.1002/humu.22735] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/25/2014] [Accepted: 11/17/2014] [Indexed: 01/31/2023]
Abstract
It has been established that up to 8.3% of the biallelic coding SNPs present in dbSNP are actually artefactual polymorphism-like errors, previously termed single nucleotide differences, or SNDs. In this study, a previous analysis of SNPs in dbSNP was extended and updated to examine how the incidence of SNDs has changed over the intervening five-year period. The incidence of SNDs was found to be lower than in the previous analysis, at 2.2% of all biallelic SNPs. There was only a modest reduction in the percentage of SNDs in the original set of biallelic coding SNPs tested. This suggests that the overall reduction in the incidence of SNDs over the intervening five-year period reflects improved SNP detection methods and more rigorous curation, rather than efforts to address the SNDs already present. We note that SNDs contaminating dbSNP may lead to erroneous conclusions about human conditions.
Affiliation(s)
- Jonathan W Arthur
- Children's Medical Research Institute, University of Sydney, Westmead, New South Wales, Australia
10
Forster M, Forster P, Elsharawy A, Hemmrich G, Kreck B, Wittig M, Thomsen I, Stade B, Barann M, Ellinghaus D, Petersen BS, May S, Melum E, Schilhabel MB, Keller A, Schreiber S, Rosenstiel P, Franke A. From next-generation sequencing alignments to accurate comparison and validation of single-nucleotide variants: the pibase software. Nucleic Acids Res 2012; 41:e16. [PMID: 22965131 PMCID: PMC3592472 DOI: 10.1093/nar/gks836] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/15/2023]
Abstract
Scientists working with single-nucleotide variants (SNVs) inferred by next-generation sequencing software often need further information regarding true variants, artifacts, and sequence coverage gaps. In clinical diagnostics, for example, SNVs must usually be validated by visual inspection or by several independent SNV callers. Here we demonstrate that 0.5-60% of relevant SNVs might go undetected due to coverage gaps, or might be misidentified. Even low error rates can overwhelm the true biological signal, especially in clinical diagnostics, in research comparing healthy with affected cells, in archaeogenetic dating, or in forensics. For these reasons, we have developed a package called pibase, which is applicable to diploid and haploid genome, exome, or targeted enrichment data. pibase extracts details on nucleotides from alignment files at user-specified coordinates and identifies reproducible genotypes, if present. In test cases, pibase identifies genotypes at 99.98% specificity, 10-fold better than other tools. pibase also provides pair-wise comparisons between healthy and affected cells using nucleotide signals (10-fold more accurately than a genotype-based approach, as we show in our case study of monozygotic twins). This comparison tool also solves the problem of detecting allelic imbalance within heterozygous SNVs in copy number variation loci or in heterogeneous tumor sequences.
Affiliation(s)
- Michael Forster
- Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel, D-24105 Kiel, Germany.