1
|
Mansueto L, Tandayu E, Mieog J, Garcia-de Heer L, Das R, Burn A, Mauleon R, Kretzschmar T. HASCH - A high-throughput amplicon-based SNP-platform for medicinal cannabis and industrial hemp genotyping applications. BMC Genomics 2024; 25:818. [PMID: 39210290 PMCID: PMC11363669 DOI: 10.1186/s12864-024-10734-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
BACKGROUND Cannabis sativa is seeing a global resurgence as a food, fiber and medicinal crop for the industrial hemp and medicinal cannabis industries, respectively. However, a widespread moratorium on the use and research of C. sativa throughout most of the 20th century has seen the development of improved cultivars for specific end uses lag behind that of conventional crops. While C. sativa research and development has seen significant investments in the recent past, resulting in a suite of publicly available genomic resources and tools, a versatile and cost-effective mid-density genotyping platform for applied purposes in breeding and pre-breeding is lacking. Here we report on a first mid-density fixed-target SNP platform for C. sativa. RESULTS The High-throughput Amplicon-based SNP-platform for medicinal Cannabis and industrial Hemp (HASCH) was designed using a combination of filtering and Integer Linear Programming on publicly available whole-genome sequencing and RNA sequencing data, supplemented with in-house generated genotyping-by-sequencing (GBS) data. HASCH contains 1,504 genome-wide targets of high call rate (97% mean) and even distribution across the genome, designed to be highly informative (> 0.3 minor allele frequency) across both medicinal cannabis and industrial hemp gene pools. The average number of mismatched SNPs between any two accessions was 251 for medicinal cannabis (N = 116) and 272 for industrial hemp (N = 87). Comparing HASCH data with corresponding GBS data on a collection of diverse C. sativa accessions demonstrated high concordance and resulted in comparable phylogenies and genetic distance matrices. Using HASCH on a segregating F2 population derived from a cross between a tetrahydrocannabinol (THC)-dominant and a cannabidiol (CBD)-dominant accession resulted in a genetic map consisting of 310 markers, comprising 10 linkage groups and a total size of 582.7 cM. Quantitative Trait Locus (QTL) mapping identified a major QTL for CBD content on chromosome 7, consistent with previous findings. CONCLUSION HASCH constitutes a versatile, easy-to-use and cost-effective genotyping solution for the rapidly growing Cannabis research community. It provides consistent genetic fingerprints of 1,504 SNPs with wide applicability in genetic resource management, quantitative genetics and breeding.
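The two summary statistics quoted in this abstract (minor allele frequency above 0.3 and the average number of mismatched SNPs between accessions) can be illustrated with a minimal Python sketch. It assumes biallelic SNPs coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; the function names and toy data are hypothetical, and the paper's exact definition of a "mismatch" may differ.

```python
from itertools import combinations


def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP; genotypes are alt-allele counts (0, 1, 2) or None."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def mean_pairwise_mismatches(samples):
    """Average number of SNPs whose called genotypes differ between any two samples."""
    pairs = list(combinations(samples, 2))
    total = 0
    for a, b in pairs:
        # Positions where either genotype is missing are skipped (an assumption).
        total += sum(
            1 for x, y in zip(a, b)
            if x is not None and y is not None and x != y
        )
    return total / len(pairs)


# Toy data (hypothetical): 3 accessions x 4 SNPs.
accessions = [
    [0, 1, 2, 0],
    [0, 2, 2, 1],
    [1, 1, None, 1],
]

# Transpose to per-SNP columns and keep SNPs with MAF above the 0.3 threshold.
snp_columns = list(zip(*accessions))
informative = [
    i for i, col in enumerate(snp_columns)
    if minor_allele_frequency(col) > 0.3
]
```

Counting any genotype difference as a mismatch is the simplest choice; an allele-sharing distance would instead weight heterozygote differences by half.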
Affiliation(s)
- Locedie Mansueto
- Southern Cross Plant Science, Faculty of Science and Engineering, Southern Cross University, 1 Military Road, East Lismore, NSW, 2480, Australia
| | - Erwin Tandayu
- Southern Cross Plant Science, Faculty of Science and Engineering, Southern Cross University, 1 Military Road, East Lismore, NSW, 2480, Australia
| | - Jos Mieog
- Southern Cross Plant Science, Faculty of Science and Engineering, Southern Cross University, 1 Military Road, East Lismore, NSW, 2480, Australia
| | - Lennard Garcia-de Heer
- Southern Cross Plant Science, Faculty of Science and Engineering, Southern Cross University, 1 Military Road, East Lismore, NSW, 2480, Australia
| | - Rekhamani Das
- Southern Cross Plant Science, Faculty of Science and Engineering, Southern Cross University, 1 Military Road, East Lismore, NSW, 2480, Australia
| | - Adam Burn
- Southern Cross Plant Science, Faculty of Science and Engineering, Southern Cross University, 1 Military Road, East Lismore, NSW, 2480, Australia
| | - Ramil Mauleon
- Southern Cross Plant Science, Faculty of Science and Engineering, Southern Cross University, 1 Military Road, East Lismore, NSW, 2480, Australia
- International Rice Research Institute, Pili Drive, Los Banos, Laguna, Philippines
| | - Tobias Kretzschmar
- Southern Cross Plant Science, Faculty of Science and Engineering, Southern Cross University, 1 Military Road, East Lismore, NSW, 2480, Australia.
| |
|
2
|
Leite Filho HP, Pinto IP, Oliveira LG, Costa EOA, da Cruz AS, e Silva DDM, da Silva CC, Caetano AR, da Cruz AD. Deviation from Mendelian transmission of autosomal SNPs can be used to estimate germline mutations in humans exposed to ionizing radiation. PLoS One 2020; 15:e0233941. [PMID: 33108378 PMCID: PMC7591025 DOI: 10.1371/journal.pone.0233941] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2020] [Accepted: 10/09/2020] [Indexed: 11/18/2022]
Abstract
We aimed to estimate the rate of germline mutations in the offspring of individuals accidentally exposed to Cesium-137 ionizing radiation. The study included two distinct groups: a case group, consisting of males and females accidentally exposed to low doses of ionizing radiation from Cs-137, and a control group of non-exposed participants. The cases included 37 people representing 11 families and 15 children conceived after the accident. Exposed families incurred radiation absorbed doses in the range of 0.2 to 0.5 Gray. The control group included 15 families and 15 children, also conceived after 1987 in Goiânia, with no history of radiation exposure. DNA samples from peripheral blood were analyzed with the Affymetrix GeneChip® CytoScanHD™ to estimate point mutations in autosomal SNPs. A set of previously developed scripts was used to detect de novo mutations by comparing parent and offspring genotypes at the level of each SNP marker. The overall numbers of observed Mendelian deviations differed significantly between the exposed and control groups. Our retrospective transgenerational DNA analysis showed a 44.0% increase in the burden of SNP mutations in the offspring of cases when compared to controls, based on the average of MFMD for the two groups. Parent-of-origin and type of nucleotide substitution were also inferred. This proved useful in a retrospective estimation of the rate of de novo germline mutations in a human population accidentally exposed to low doses of radiation from Cesium-137. Our results suggest that the observed burden of germline mutations identified in offspring is a potentially useful biomarker of effect to estimate parental exposure to low doses of IR and could become an important marker suitable for biomonitoring human populations exposed to environmental mutagens.
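The per-marker comparison of parent and offspring genotypes described in this abstract can be sketched as a generic Mendelian consistency check. This is a minimal Python illustration, not the authors' scripts, which are not described here; genotypes are assumed to be two-letter strings such as "AG", and all names and data are hypothetical.

```python
def transmissible_alleles(genotype):
    """Alleles a parent can pass on, given a genotype such as 'AG'."""
    return set(genotype)


def is_mendelian_consistent(father, mother, child):
    """True if the child's alleles can be explained by one allele from each parent."""
    c1, c2 = child
    f = transmissible_alleles(father)
    m = transmissible_alleles(mother)
    return (c1 in f and c2 in m) or (c2 in f and c1 in m)


def count_deviations(trio_genotypes):
    """Number of markers at which a trio deviates from Mendelian transmission."""
    return sum(
        1 for father, mother, child in trio_genotypes
        if not is_mendelian_consistent(father, mother, child)
    )


# Toy trio (hypothetical): one (father, mother, child) tuple per SNP marker.
trio = [
    ("AA", "AG", "AG"),  # consistent
    ("AA", "AA", "AG"),  # deviation: G is absent from both parents
    ("AG", "GG", "AA"),  # deviation: the mother cannot transmit A
    ("AG", "AG", "GG"),  # consistent
]
```

A deviation flagged this way is only a candidate de novo mutation, since genotyping errors and sample mix-ups produce the same pattern.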
Affiliation(s)
- Hugo Pereira Leite Filho
- Programa de Pós-Graduação em Biotecnologia e Biodiversidade, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
- Universidade Estadual de Goiás, Anápolis, Goiás, Brazil
| | - Irene Plaza Pinto
- Núcleo de Pesquisa Replicon, Mestrado em Genética, Escola de Ciências Agrárias e Biológicas, Pontíficia Universidade Católica de Goiás, Goiânia, Goiás, Brazil
| | - Lorraynne Guimarães Oliveira
- Núcleo de Pesquisa Replicon, Mestrado em Genética, Escola de Ciências Agrárias e Biológicas, Pontíficia Universidade Católica de Goiás, Goiânia, Goiás, Brazil
- Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
| | - Emília Oliveira Alves Costa
- Núcleo de Pesquisa Replicon, Mestrado em Genética, Escola de Ciências Agrárias e Biológicas, Pontíficia Universidade Católica de Goiás, Goiânia, Goiás, Brazil
| | - Alex Silva da Cruz
- Núcleo de Pesquisa Replicon, Mestrado em Genética, Escola de Ciências Agrárias e Biológicas, Pontíficia Universidade Católica de Goiás, Goiânia, Goiás, Brazil
| | - Daniela de Melo e Silva
- Núcleo de Pesquisa Replicon, Mestrado em Genética, Escola de Ciências Agrárias e Biológicas, Pontíficia Universidade Católica de Goiás, Goiânia, Goiás, Brazil
- Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
| | - Claudio Carlos da Silva
- Programa de Pós-Graduação em Biotecnologia e Biodiversidade, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
- Universidade Estadual de Goiás, Anápolis, Goiás, Brazil
- Núcleo de Pesquisa Replicon, Mestrado em Genética, Escola de Ciências Agrárias e Biológicas, Pontíficia Universidade Católica de Goiás, Goiânia, Goiás, Brazil
- Laboratório de Genética Molecular e Citogenética Humana, Laboratório Estadual de Saúde Pública Dr. Giovanni Cysneiros, Secretaria de Saúde Pública do Estado de Goiás, Goiânia, Goiás, Brazil
| | | | - Aparecido Divino da Cruz
- Programa de Pós-Graduação em Biotecnologia e Biodiversidade, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
- Núcleo de Pesquisa Replicon, Mestrado em Genética, Escola de Ciências Agrárias e Biológicas, Pontíficia Universidade Católica de Goiás, Goiânia, Goiás, Brazil
- Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
- Laboratório de Genética Molecular e Citogenética Humana, Laboratório Estadual de Saúde Pública Dr. Giovanni Cysneiros, Secretaria de Saúde Pública do Estado de Goiás, Goiânia, Goiás, Brazil
| |
|
3
|
Seyed Tabib NS, Madgwick M, Sudhakar P, Verstockt B, Korcsmaros T, Vermeire S. Big data in IBD: big progress for clinical practice. Gut 2020; 69:1520-1532. [PMID: 32111636 PMCID: PMC7398484 DOI: 10.1136/gutjnl-2019-320065] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 10/11/2019] [Revised: 02/05/2020] [Accepted: 02/06/2020] [Indexed: 12/12/2022]
Abstract
IBD is a complex multifactorial inflammatory disease of the gut driven by extrinsic and intrinsic factors, including host genetics, the immune system, environmental factors and the gut microbiome. Technological advancements such as next-generation sequencing, high-throughput omics data generation and molecular networks have catalysed IBD research. The advent of artificial intelligence, in particular, machine learning, and systems biology has opened the avenue for the efficient integration and interpretation of big datasets for discovering clinically translatable knowledge. In this narrative review, we discuss how big data integration and machine learning have been applied to translational IBD research. Approaches such as machine learning may enable patient stratification, prediction of disease progression and therapy responses for fine-tuning treatment options with positive impacts on cost, health and safety. We also outline the challenges and opportunities presented by machine learning and big data in clinical IBD research.
Affiliation(s)
| | - Matthew Madgwick
- Organisms and Ecosystems, Earlham Institute, Norwich, UK
- Gut microbes in health and disease, Quadram Institute Bioscience, Norwich, UK
| | - Padhmanand Sudhakar
- Department of Chronic Diseases, Metabolism and Ageing, TARGID, KU Leuven, Leuven, Belgium
- Organisms and Ecosystems, Earlham Institute, Norwich, UK
- Gut microbes in health and disease, Quadram Institute Bioscience, Norwich, UK
| | - Bram Verstockt
- Translational Research in GastroIntestinal Disorders, KU Leuven, Leuven, Belgium
- Department of Gastroenterology and Hepatology, KU Leuven University Hospitals Leuven, Leuven, Belgium
| | - Tamas Korcsmaros
- Organisms and Ecosystems, Earlham Institute, Norwich, UK
- Gut microbes in health and disease, Quadram Institute Bioscience, Norwich, UK
| | - Séverine Vermeire
- Department of Chronic Diseases, Metabolism and Ageing, TARGID, KU Leuven, Leuven, Belgium
- Department of Gastroenterology and Hepatology, KU Leuven University Hospitals Leuven, Leuven, Belgium
| |
|
4
|
Chen Q, Meng Z, Su R. WERFE: A Gene Selection Algorithm Based on Recursive Feature Elimination and Ensemble Strategy. Front Bioeng Biotechnol 2020; 8:496. [PMID: 32548100 PMCID: PMC7270206 DOI: 10.3389/fbioe.2020.00496] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/08/2020] [Accepted: 04/28/2020] [Indexed: 12/11/2022]
Abstract
A gene selection algorithm for microarray data classification finds a small set of genes that are most informative and distinctive. A well-performing gene selection algorithm should pick a set of genes that achieves high performance, and the size of this gene set should be as small as possible. Many existing gene selection algorithms suffer from either low performance or a large gene set. In this study, we propose a wrapper gene selection approach, named WERFE, within a recursive feature elimination (RFE) framework to make the classification more efficient. WERFE employs an ensemble strategy, takes advantage of a variety of gene selection methods and assembles the top selected genes from each approach as the final gene subset. By integrating multiple gene selection algorithms, the optimal gene subset is determined by prioritizing the more important genes selected by each gene selection method, so that a more discriminative and compact gene subset can be selected. Experimental results show that the proposed method can achieve state-of-the-art performance.
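The ensemble step, in which the top genes from several selectors are assembled and the more important ones prioritized, can be sketched as a simple vote over top-k lists. This Python sketch is not WERFE itself: it omits the recursive elimination loop, and the vote-then-best-rank ordering is an assumed aggregation rule chosen for illustration. Gene names and rankings are hypothetical.

```python
def ensemble_top_genes(rankings, k):
    """Merge the top-k genes of several selectors into one prioritized list.

    Genes are ordered by the number of selectors that picked them, then by the
    best (lowest) rank any selector gave them, then by name for determinism.
    """
    votes = {}
    best_rank = {}
    for ranking in rankings:
        for rank, gene in enumerate(ranking[:k]):
            votes[gene] = votes.get(gene, 0) + 1
            best_rank[gene] = min(best_rank.get(gene, rank), rank)
    return sorted(votes, key=lambda g: (-votes[g], best_rank[g], g))


# Hypothetical rankings from three gene selection methods, best gene first.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g5", "g3"],
    ["g2", "g6", "g1", "g4"],
]
selected = ensemble_top_genes(rankings, k=2)
```

In an RFE-style wrapper, a step like this would be repeated on progressively smaller gene sets, with a classifier scoring each candidate subset.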
Affiliation(s)
- Qi Chen
- School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin, China.,Military Transportation Command Department, Army Military Transportation University, Tianjin, China
| | - Zhaopeng Meng
- School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin, China.,Tianjin University of Traditional Chinese Medicine, Tianjin, China
| | - Ran Su
- School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin, China.,Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, China
| |
|
5
|
Mohino-Herranz I, Gil-Pita R, García-Gómez J, Rosa-Zurera M, Seoane F. A Wrapper Feature Selection Algorithm: An Emotional Assessment Using Physiological Recordings from Wearable Sensors. SENSORS 2020; 20:s20010309. [PMID: 31935893 PMCID: PMC6983098 DOI: 10.3390/s20010309] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/28/2019] [Revised: 12/29/2019] [Accepted: 01/03/2020] [Indexed: 11/16/2022]
Abstract
Assessing emotional state is an emerging application field that is boosting research on the analysis of non-invasive biosignals, with the aim of finding effective markers to accurately determine the emotional state in real time. Nowadays, electrocardiogram and thoracic impedance measurements can be recorded using wearable sensors, which facilitates analyzing cardiac and respiratory functions directly and autonomic nervous system function indirectly. Such analysis allows distinguishing between different emotional states: neutral, sadness, and disgust. This work focused specifically on proposing a k-fold approach for selecting features while training the classifier, which reduces the loss of generalization. The performance of the proposed algorithm, used as the selection criterion, was compared to that of the commonly used standard error function. The proposed k-fold approach outperforms the conventional method with a 4% improvement in hit success rate, reaching an accuracy close to 78%. Moreover, the proposed selection criterion allows the classifier to produce the best performance using a lower number of features at lower computational cost. A reduced number of features reduces the risk of overfitting, while a lower computational cost contributes to implementing real-time systems using wearable electronics.
Affiliation(s)
- Inma Mohino-Herranz
- Department of Signal Theory and Communications, University of Alcalá, Alcalá de Henares, 28805 Madrid, Spain; (R.G.-P.); (J.G.-G.); (M.R.-Z.)
- Correspondence:
| | - Roberto Gil-Pita
- Department of Signal Theory and Communications, University of Alcalá, Alcalá de Henares, 28805 Madrid, Spain; (R.G.-P.); (J.G.-G.); (M.R.-Z.)
| | - Joaquín García-Gómez
- Department of Signal Theory and Communications, University of Alcalá, Alcalá de Henares, 28805 Madrid, Spain; (R.G.-P.); (J.G.-G.); (M.R.-Z.)
| | - Manuel Rosa-Zurera
- Department of Signal Theory and Communications, University of Alcalá, Alcalá de Henares, 28805 Madrid, Spain; (R.G.-P.); (J.G.-G.); (M.R.-Z.)
| | - Fernando Seoane
- Institute for Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Solna Stockholm, Sweden;
- Department of Medical Care Technology, Karolinska University Hospital, 14157 Huddinge, Sweden
- Textile Materials Technology, Department of Textile Technology, Faculty of Textiles, Engineering and Business, Swedish School of Textiles, University of Boras, 50190 Boras, Sweden
| |
|
6
|
A study on metaheuristics approaches for gene selection in microarray data: algorithms, applications and open challenges. EVOLUTIONARY INTELLIGENCE 2019. [DOI: 10.1007/s12065-019-00306-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 10/25/2022]
|
7
|
Kim Y, Lee K. A novel approach to predict ingress/egress discomfort based on human motion and biomechanical analysis. APPLIED ERGONOMICS 2019; 75:263-271. [PMID: 30509535 DOI: 10.1016/j.apergo.2018.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2018] [Revised: 10/01/2018] [Accepted: 11/11/2018] [Indexed: 06/09/2023]
Abstract
This study proposes an ingress/egress discomfort prediction algorithm using an in-depth biomechanical method and a motion capture database. The ingress/egress motion of each subject was captured using an optical motion capture system and a physically adjustable vehicle mock-up. Subjective discomfort evaluation data were recorded at the same time. Inverse kinematics and inverse dynamics were performed to analyze the captured ingress/egress motion. These procedures provide motion and joint torque information on each subject. Based on the analysis results, this study proposes the following novel features: accumulated movement of joint and sum of rectified joint torque. This study conducted a feature selection procedure to identify a relevant feature subset. Recursive feature selection and optimal feature selection methods found the most relevant feature subset with the collected subjective responses. Finally, we constructed the prediction model using a support vector machine. The prediction model was evaluated through prediction accuracy and statistical analysis. For comparison, this study implemented two representative models from previous studies and compared the results using the identical dataset. The effectiveness of the proposed algorithm was demonstrated in comparison with previous studies.
Affiliation(s)
- Younguk Kim
- School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Republic of Korea
| | - Kunwoo Lee
- School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Republic of Korea.
| |
|
8
|
Liu Y, Liu B, Shan L, Wang X. Modelling context with neural networks for recommending idioms in essay writing. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.11.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 10/18/2022]
|
9
|
Sylvester EVA, Bentzen P, Bradbury IR, Clément M, Pearce J, Horne J, Beiko RG. Applications of random forest feature selection for fine-scale genetic population assignment. Evol Appl 2017; 11:153-165. [PMID: 29387152 PMCID: PMC5775496 DOI: 10.1111/eva.12524] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 09/19/2016] [Accepted: 07/11/2017] [Indexed: 01/10/2023]
Abstract
Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine‐learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNPs) for fine‐scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self‐assignment accuracy of at least 90% using each method to create panels of 50–700 markers. Panels of SNPs identified using random forest‐based methods performed up to 7.8 and 11.2 percentage points better than FST‐selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self‐assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using FST‐selected panels. Our results demonstrate a role for machine‐learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
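The FST-ranking baseline mentioned in this abstract can be sketched for the two-population, biallelic case. The Python sketch below uses the heterozygosity-based form FST = (HT - HS) / HT with equal population weights; the estimator actually used in the study is not stated here, so this choice, the function names and the allele frequencies are all assumptions for illustration.

```python
def fst(p1, p2):
    """Heterozygosity-based FST for one biallelic SNP in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                             # monomorphic SNP carries no signal
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def top_fst_snps(allele_freqs, n):
    """Names of the n SNPs with the highest FST."""
    ranked = sorted(allele_freqs, key=lambda snp: fst(*allele_freqs[snp]), reverse=True)
    return ranked[:n]


# Hypothetical allele frequencies: SNP name -> (frequency in pop 1, frequency in pop 2).
allele_freqs = {
    "snpA": (0.1, 0.9),
    "snpB": (0.5, 0.5),
    "snpC": (0.2, 0.6),
}
panel = top_fst_snps(allele_freqs, 2)
```

Ranking SNPs one at a time like this ignores redundancy between linked markers, which is one reason a multivariate method such as random forest can select a more efficient panel.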
Affiliation(s)
| | - Paul Bentzen
- Marine Gene Probe Laboratory Department of Biology Dalhousie University Halifax NS Canada
| | | | - Marie Clément
- Centre for Fisheries Ecosystems Research, Fisheries and Marine Institute Memorial University of Newfoundland St. John's NL Canada.,Labrador Institute Memorial University of Newfoundland Happy Valley-Goose Bay NL Canada
| | - Jon Pearce
- Northern SE Regional Aquaculture Association Hidden Falls Hatchery Sitka AK USA
| | - John Horne
- Marine Gene Probe Laboratory Department of Biology Dalhousie University Halifax NS Canada
| | - Robert G Beiko
- Faculty of Computer Science Dalhousie University Halifax NS Canada
| |
|
10
|
|
11
|
|
12
|
FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm. PLoS One 2016; 11:e0150669. [PMID: 27014873 PMCID: PMC4807955 DOI: 10.1371/journal.pone.0150669] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 09/01/2015] [Accepted: 02/16/2016] [Indexed: 12/24/2022]
Abstract
Motivation: The two-locus model is a typical significant disease model to be identified in genome-wide association studies (GWAS). Due to the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power, high computation cost, and a preference for some types of disease models. Method: In this study, two scoring functions (a Bayesian network-based K2-score and a Gini-score) are used to characterize two SNP loci as a candidate model; the two criteria are adopted simultaneously to improve identification power and to tackle the preference for particular disease models. The harmony search algorithm (HSA) is improved to quickly find the most likely candidate models among all two-locus models, and a local search algorithm with a two-dimensional tabu table is presented to avoid repeatedly evaluating disease models that have strong marginal effects. Finally, the G-test statistic is used to further test the candidate models. Results: We evaluate our method, named FHSA-SED, on 82 simulated datasets and a real AMD dataset, and compare it with two typical, recently developed methods (MACOED and CSE) that are based on swarm intelligence search algorithms. The results of the simulation experiments indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, evaluation times, sensitivity (TPR), specificity (SPC), positive predictive value (PPV) and accuracy (ACC). Our method identified two SNPs (rs3775652 and rs10511467) that may also be associated with disease in the AMD dataset.
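The final G-test step can be illustrated with the standard likelihood-ratio statistic G = 2 Σ O ln(O / E), computed over a contingency table of two-locus genotype combinations against case/control status. This Python sketch shows only the statistic; the table layout, function name and counts are hypothetical, and it makes no claim about how FHSA-SED builds its tables or sets significance thresholds.

```python
import math


def g_statistic(table):
    """G-test statistic for a contingency table.

    Rows are genotype combinations of the two SNPs (up to 9 for biallelic loci);
    columns are (cases, controls). Cells with zero observations contribute 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2 * g


# Hypothetical counts, collapsed to two genotype classes for brevity.
no_association = [[10, 10], [20, 20]]
association = [[30, 10], [10, 30]]
g_null = g_statistic(no_association)
g_alt = g_statistic(association)
```

Under the null hypothesis of independence, G is approximately chi-square distributed with (rows - 1) × (columns - 1) degrees of freedom, so a full 9 × 2 two-locus table would be compared against 8 degrees of freedom.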
|
13
|
Mersha TB. Mapping asthma-associated variants in admixed populations. Front Genet 2015; 6:292. [PMID: 26483834 PMCID: PMC4586512 DOI: 10.3389/fgene.2015.00292] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/02/2015] [Accepted: 09/03/2015] [Indexed: 12/19/2022]
Abstract
Admixed populations arise when two or more previously isolated populations interbreed. Mapping asthma susceptibility loci in an admixed population using admixture mapping (AM) involves screening the genome of individuals of mixed ancestry for chromosomal regions that have a higher frequency of alleles from a parental population with higher asthma risk as compared with a parental population with lower asthma risk. AM takes advantage of the admixture created in populations of mixed ancestry to identify genomic regions where an association exists between genetic ancestry and asthma (in contrast to between the genotype of the marker and asthma). The theory behind AM is that chromosomal segments of affected individuals contain a significantly higher-than-average proportion of alleles from the high-risk parental population and thus are more likely to harbor disease-associated loci. Criteria to evaluate the applicability of AM as a gene mapping approach include: (1) differences in disease prevalence between the ancestral populations from which the admixed population was formed; (2) a measurable difference in disease-causing alleles between the parental populations; (3) reduced linkage disequilibrium (LD) between unlinked loci across chromosomes and strong LD between neighboring loci; (4) a set of markers with noticeable allele-frequency differences between the parental populations that contribute to the admixed population (single nucleotide polymorphisms (SNPs) are the markers of choice because they are abundant, stable, relatively cheap to genotype, and informative with regard to the LD structure of chromosomal segments); and (5) an understanding of the extent of segmental chromosomal admixtures and their interactions with environmental factors. 
Although genome-wide association studies have contributed greatly to our understanding of the genetic components of asthma, the large and increasing degree of admixture in populations across the world creates many challenges for further efforts to map disease-causing genes. This review summarizes the historical context of admixed populations and AM, and considers current opportunities to use AM to map asthma genes. In addition, we provide an overview of the potential limitations and future directions of AM in biomedical research, including joint admixture and association mapping for asthma and asthma-related disorders.
Affiliation(s)
- Tesfaye B Mersha
- Division of Asthma Research, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati Cincinnati, OH, USA
| |
|
14
|
Aflakparast M, Salimi H, Gerami A, Dubé MP, Visweswaran S, Masoudi-Nejad A. Cuckoo search epistasis: a new method for exploring significant genetic interactions. Heredity (Edinb) 2014; 112:666-74. [PMID: 24549111 DOI: 10.1038/hdy.2014.4] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 05/07/2013] [Revised: 12/09/2013] [Accepted: 12/18/2013] [Indexed: 11/09/2022]
Abstract
The advent of high-throughput sequencing technology has resulted in the ability to measure millions of single-nucleotide polymorphisms (SNPs) from thousands of individuals. Although these high-dimensional data have paved the way for better understanding of the genetic architecture of common diseases, they have also given rise to challenges in developing computational methods for learning epistatic relationships among genetic markers. We propose a new method, named cuckoo search epistasis (CSE), for identifying significant epistatic interactions in population-based association studies with a case-control design. This method combines a computationally efficient Bayesian scoring function with an evolutionary-based heuristic search algorithm, and can be efficiently applied to high-dimensional genome-wide SNP data. The experimental results from synthetic data sets show that CSE outperforms existing methods including multifactorial dimensionality reduction and Bayesian epistasis association mapping. In addition, on a real genome-wide data set related to Alzheimer's disease, CSE identified SNPs that are consistent with previously reported results, showing the utility of CSE for application to genome-wide data.
Affiliation(s)
- M Aflakparast
- 1] Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran [2] Department of Mathematics, Faculty of Sciences, VU University, Amsterdam, The Netherlands
| | - H Salimi
- Department of Computer Science, University of Tehran, Tehran, Iran
| | - A Gerami
- Department of Statistics and Mathematics, Islamic Azad University, Qazvin Branch, Qazvin, Iran
| | - M-P Dubé
- Department of Medicine, Faculty of Medicine, University of Montreal, Montreal, Quebec, Canada
| | - S Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | - A Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| |
|
15
|
Lange K, Papp JC, Sinsheimer JS, Sobel EM. Next Generation Statistical Genetics: Modeling, Penalization, and Optimization in High-Dimensional Data. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION 2014; 1:279-300. [PMID: 24955378 PMCID: PMC4062304 DOI: 10.1146/annurev-statistics-022513-115638] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 05/30/2023]
Abstract
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future.
Affiliation(s)
- Kenneth Lange
- Depts of Biomathematics, Human Genetics, and Statistics, UCLA
- Janet S. Sinsheimer
- Depts of Biomathematics, Human Genetics, Statistics, and Biostatistics, UCLA
|
16
|
Phaik-Ling Ong, Yun-Huoy Choo, Emran NA. Classification of SNPs for obesity analysis using FARNeM modelling. 2013 13TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS 2013. [DOI: 10.1109/isda.2013.6920746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
|
17
|
Newby D, Freitas AA, Ghafourian T. Pre-processing Feature Selection for Improved C&RT Models for Oral Absorption. J Chem Inf Model 2013; 53:2730-42. [DOI: 10.1021/ci400378j] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 01/21/2023]
Affiliation(s)
- Danielle Newby
- Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent ME4 4TB, U.K
- Alex. A. Freitas
- School of Computing, University of Kent, Canterbury, Kent CT2 7NF, U.K
- Taravat Ghafourian
- Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent ME4 4TB, U.K
- Drug Applied Research Centre and Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, East Azerbaijan 51664, Iran
|
18
|
Khan MW, Alam M. A survey of application: genomics and genetic programming, a new frontier. Genomics 2012; 100:65-71. [PMID: 22683715 DOI: 10.1016/j.ygeno.2012.05.014] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 12/24/2011] [Revised: 05/22/2012] [Accepted: 05/29/2012] [Indexed: 11/15/2022]
Abstract
The aim of this paper is to provide an introduction to the rapidly developing field of genetic programming (GP). Particular emphasis is placed on the application of GP to genomics. First, the basic methodology of GP is introduced. This is followed by a review of applications in the areas of gene network inference, gene expression data analysis, SNP analysis, epistasis analysis and gene annotation. Finally, the paper concludes by suggesting potential avenues for future research on genetic programming, opportunities to extend the technique, and areas for practical application.
Affiliation(s)
- Mohammad Wahab Khan
- Department of Computer Science, Jamia Millia Islamia, Maulana Mohammad Ali Jauhar Marg, New Delhi 110025, India.
|
19
|
Yang CH, Chuang LY, Cheng YH, Lin YD, Wang CL, Wen CH, Chang HW. Single nucleotide polymorphism barcoding to evaluate oral cancer risk using odds ratio-based genetic algorithms. Kaohsiung J Med Sci 2012; 28:362-8. [PMID: 22726897 DOI: 10.1016/j.kjms.2012.02.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 07/13/2011] [Accepted: 08/11/2011] [Indexed: 12/21/2022] Open
Abstract
Cancers often involve the synergistic effects of gene-gene interactions, but identifying these interactions remains challenging. Here, we present an odds ratio-based genetic algorithm (OR-GA) that is able to solve the problems associated with the simultaneous analysis of multiple independent single nucleotide polymorphisms (SNPs) that are associated with oral cancer. The interactions between four SNPs (rs1799782, rs2040639, rs861539, and rs2075685), belonging to four genes (XRCC1, XRCC2, XRCC3, and XRCC4, respectively), were tested in this study. The GA decomposes the SNP sets into different SNP combinations with their corresponding genotypes (called SNP barcodes). The GA can effectively identify a specific SNP barcode that has an optimized fitness value and uses this to calculate the difference between the case and control groups. The SNP barcodes with a low fitness value are naturally removed from the population. Using two to four SNPs, the best SNP barcodes with maximum differences in occurrence between the case and control groups were generated by the GA. Subsequently, the OR provides a quantitative measure of the multiple SNP synergies between the oral cancer and control groups by calculating the risk related to the best SNP barcodes and others. When these were compared to their corresponding non-SNP barcodes, the estimated ORs for oral cancer were found to be greater than 1 [approx. 1.72-2.23; confidence intervals (CIs): 0.94-5.30, p < 0.03-0.07] for various specific SNP barcodes with two to four SNPs. In conclusion, the proposed OR-GA method successfully generates SNP barcodes, which allow oral cancer risk to be evaluated, and in the process the OR-GA method identifies possible SNP-SNP interactions.
Affiliation(s)
- Cheng-Hong Yang
- Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
|
20
|
Amirisetty S, Hershey GKK, Baye TM. AncestrySNPminer: a bioinformatics tool to retrieve and develop ancestry informative SNP panels. Genomics 2012; 100:57-63. [PMID: 22584067 DOI: 10.1016/j.ygeno.2012.05.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 01/12/2012] [Revised: 03/21/2012] [Accepted: 05/06/2012] [Indexed: 11/19/2022]
Abstract
A wealth of genomic information is available in public and private databases. However, this information is underutilized for uncovering population specific and functionally relevant markers underlying complex human traits. Given the huge amount of SNP data available from the annotation of human genetic variation, data mining is a faster and cost effective approach for investigating the number of SNPs that are informative for ancestry. In this study, we present AncestrySNPminer, the first web-based bioinformatics tool specifically designed to retrieve Ancestry Informative Markers (AIMs) from genomic data sets and link these informative markers to genes and ontological annotation classes. The tool includes an automated and simple "scripting at the click of a button" functionality that enables researchers to perform various population genomics statistical analyses methods with user friendly querying and filtering of data sets across various populations through a single web interface. AncestrySNPminer can be freely accessed at https://research.cchmc.org/mershalab/AncestrySNPminer/login.php.
Affiliation(s)
- Sushil Amirisetty
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH 45229, USA.
|
21
|
Ding L, Wiener H, Abebe T, Altaye M, Go RCP, Kercsmar C, Grabowski G, Martin LJ, Khurana Hershey GK, Chakorborty R, Baye TM. Comparison of measures of marker informativeness for ancestry and admixture mapping. BMC Genomics 2011; 12:622. [PMID: 22185208 PMCID: PMC3276602 DOI: 10.1186/1471-2164-12-622] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 07/13/2011] [Accepted: 12/20/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Admixture mapping is a powerful gene mapping approach for an admixed population formed from ancestral populations with different allele frequencies. The power of this method relies on the ability of ancestry informative markers (AIMs) to infer ancestry along the chromosomes of admixed individuals. In this study, more than one million SNPs from HapMap databases and simulated data have been interrogated in admixed populations using various measures of ancestry informativeness: Fisher Information Content (FIC), Shannon Information Content (SIC), F statistics (FST), Informativeness for Assignment Measure (In), and the Absolute Allele Frequency Differences (delta, δ). The objectives are to compare these measures of informativeness to select SNP markers for ancestry inference, and to determine the accuracy of AIM panels selected by each measure in estimating the contributions of the ancestors to the admixed population. RESULTS FST and In had the highest Spearman correlation and the best agreement as measured by Kappa statistics based on deciles. Although the different measures of marker informativeness performed comparably well, analyses based on the top 1 to 10% ranked informative markers of simulated data showed that In was better in estimating ancestry for an admixed population. CONCLUSIONS Although millions of SNPs have been identified, only a small subset needs to be genotyped in order to accurately predict ancestry with a minimal error rate in a cost-effective manner. In this article, we compared various methods for selecting ancestry informative SNPs using simulations as well as SNP genotype data from samples of admixed populations and showed that the In measure estimates ancestry proportion (in an admixed population) with lower bias and mean square error.
Affiliation(s)
- Lili Ding
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Howard Wiener
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Tilahun Abebe
- Department of Biology, University of Northern Iowa, Cedar Falls, IA, USA
- Mekbib Altaye
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Rodney CP Go
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Carolyn Kercsmar
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Greg Grabowski
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Lisa J Martin
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Gurjit K Khurana Hershey
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Ranajit Chakorborty
- Center for Computational Genomics, Institute of Applied Genetics, Department of Forensic and Investigative Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
- Tesfaye M Baye
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
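Three of the informativeness measures named in the Ding et al. abstract can be illustrated for a single biallelic SNP typed in two ancestral populations. The sketch below is not taken from the paper. It assumes textbook forms of the absolute allele frequency difference (δ), a heterozygosity-based FST, and the informativeness for assignment (In) with equal population weights. The function names and the example frequencies are hypothetical.

```python
import math


def delta(p1, p2):
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def fst(p1, p2):
    """Heterozygosity-based FST = (H_T - H_S) / H_T for one biallelic SNP.

    Assumes two equally weighted populations; p1 and p2 are the
    frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # monomorphic marker carries no information
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_t - h_s) / h_t


def informativeness_in(p1, p2):
    """Informativeness for assignment (In), natural log, K = 2 populations.

    In = sum over alleles of [-p_mean * ln(p_mean) + sum_i (p_i / K) * ln(p_i)],
    with the convention 0 * ln(0) = 0.
    """
    total = 0.0
    for a1, a2 in ((p1, p2), (1 - p1, 1 - p2)):
        mean = (a1 + a2) / 2
        if mean > 0:
            total -= mean * math.log(mean)
        for a in (a1, a2):
            if a > 0:
                total += (a / 2) * math.log(a)
    return total


# Hypothetical allele frequencies in two ancestral populations.
for p1, p2 in [(0.8, 0.2), (1.0, 0.0), (0.3, 0.3)]:
    print(p1, p2, delta(p1, p2), fst(p1, p2), informativeness_in(p1, p2))
```

Under these definitions, a marker fixed for different alleles in the two populations reaches the maximum of each measure (δ = 1, FST = 1, In = ln 2), and a marker with identical frequencies scores zero on all three.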
|
22
|
Jayalal M, Kumar LS, Jehadeesan R, Rajeswari S, Satya Murty S, Balasubramaniyan V, Chetal S. Steam condenser optimization using Real-parameter Genetic Algorithm for Prototype Fast Breeder Reactor. NUCLEAR ENGINEERING AND DESIGN 2011. [DOI: 10.1016/j.nucengdes.2011.08.023] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
|
23
|
Mao KZ, Tang W. Recursive Mahalanobis separability measure for gene subset selection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:266-272. [PMID: 20479500 DOI: 10.1109/tcbb.2010.43] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/29/2023]
Abstract
Mahalanobis class separability measure provides an effective evaluation of the discriminative power of a feature subset, and is widely used in feature selection. However, this measure is computationally intensive or even prohibitive when it is applied to gene expression data. In this study, a recursive approach to Mahalanobis measure evaluation is proposed, with the goal of reducing computational overhead. Instead of evaluating Mahalanobis measure directly in high-dimensional space, the recursive approach evaluates the measure through successive evaluations in 2D space. Because of its recursive nature, this approach is extremely efficient when it is combined with a forward search procedure. In addition, it is noted that gene subsets selected by Mahalanobis measure tend to overfit training data and generalize unsatisfactorily on unseen test data, due to small sample size in gene expression problems. To alleviate the overfitting problem, a regularized recursive Mahalanobis measure is proposed in this study, and guidelines on determination of regularization parameters are provided. Experimental studies on five gene expression problems show that the regularized recursive Mahalanobis measure substantially outperforms the nonregularized Mahalanobis measures and the benchmark recursive feature elimination (RFE) algorithm in all five problems.
Affiliation(s)
- K Z Mao
- School of Electrical and Electronic Engineering, Block S2.1, Nanyang Technological University, Singapore 639798.
|
24
|
Baye TM, Wilke RA. Mapping genes that predict treatment outcome in admixed populations. THE PHARMACOGENOMICS JOURNAL 2010; 10:465-77. [PMID: 20921971 PMCID: PMC2991422 DOI: 10.1038/tpj.2010.71] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 04/09/2010] [Revised: 07/07/2010] [Accepted: 08/05/2010] [Indexed: 01/19/2023]
Abstract
There is great interest in characterizing the genetic architecture underlying drug response. For many drugs, gene-based dosing models explain a considerable amount of the overall variation in treatment outcome. As such, prescription drug labels are increasingly being modified to contain pharmacogenetic information. Genetic data must, however, be interpreted within the context of relevant clinical covariates. Even the most predictive models improve with the addition of data related to biogeographical ancestry. The current review explores analytical strategies that leverage population structure to more fully characterize genetic determinants of outcome in large clinical practice-based cohorts. The success of this approach will depend upon several key factors: (1) the availability of outcome data from groups of admixed individuals (that is, populations recombined over multiple generations), (2) a measurable difference in treatment outcome (that is, efficacy and toxicity end points), and (3) a measurable difference in allele frequency between the ancestral populations.
Affiliation(s)
- T M Baye
- Division of Asthma Research, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH 45229-3039, USA.
|
25
|
Brunel H, Gallardo-Chacón JJ, Buil A, Vallverdú M, Soria JM, Caminal P, Perera A. MISS: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis. Bioinformatics 2010; 26:1811-8. [PMID: 20562420 DOI: 10.1093/bioinformatics/btq273] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION Finding association between genetic variants and phenotypes related to disease has become an important vehicle for the study of complex disorders. In this context, multi-loci genetic association might unravel additional information when compared with single loci search. The main goal of this work is to propose a non-linear methodology based on information theory for finding combinatorial association between multi-SNPs and a given phenotype. RESULTS The proposed methodology, called MISS (mutual information statistical significance), has been integrated jointly with a feature selection algorithm and has been tested on a synthetic dataset with a controlled phenotype and in the particular case of the F7 gene. The MISS methodology has been contrasted with a multiple linear regression (MLR) method used for genetic association in both, a population-based study and a sib-pairs analysis and with the maximum entropy conditional probability modelling (MECPM) method, which searches for predictive multi-locus interactions. Several sets of SNPs within the F7 gene region have been found to show a significant correlation with the FVII levels in blood. The proposed multi-site approach unveils combinations of SNPs that explain more significant information of the phenotype than their individual polymorphisms. MISS is able to find more correlations between SNPs and the phenotype than MLR and MECPM. Most of the marked SNPs appear in the literature as functional variants with real effect on the protein FVII levels in blood. AVAILABILITY The code is available at http://sisbio.recerca.upc.edu/R/MISS_0.2.tar.gz
Affiliation(s)
- Helena Brunel
- Institut de Bioenginyeria de Catalunya, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, Pau Gargallo 5, 08028 Barcelona, Spain.
|
26
|
Baca-Garcia E, Vaquero-Lorenzo C, Perez-Rodriguez MM, Gratacòs M, Bayés M, Santiago-Mozos R, Leiva-Murillo JM, de Prado-Cumplido M, Artes-Rodriguez A, Ceverino A, Diaz-Sastre C, Fernandez-Navarro P, Costas J, Fernandez-Piqueras J, Diaz-Hernandez M, de Leon J, Baca-Baldomero E, Saiz-Ruiz J, Mann JJ, Parsey RV, Carracedo A, Estivill X, Oquendo MA. Nucleotide variation in central nervous system genes among male suicide attempters. Am J Med Genet B Neuropsychiatr Genet 2010; 153B:208-13. [PMID: 19455598 DOI: 10.1002/ajmg.b.30975] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/01/2023]
Abstract
Despite marked morbidity and mortality associated with suicidal behavior, accurate identification of individuals at risk remains elusive. The goal of this study is to identify a model based on single nucleotide polymorphisms (SNPs) that discriminates between suicide attempters and non-attempters using data mining strategies. We examined functional SNPs (n = 840) of 312 brain function and development genes using data mining techniques. Two hundred seventy-seven male psychiatric patients aged 18 years or older were recruited at a University hospital psychiatric emergency room or psychiatric short stay unit. The main outcome measure was history of suicide attempts. Three SNPs of three genes (rs10944288, HTR1E; hCV8953491, GABRP; and rs707216, ACTN2) correctly classified 67% of male suicide attempters and non-attempters (0.50 sensitivity, 0.82 specificity, positive likelihood ratio = 2.80, negative likelihood ratio = 1.64). The OR for the combined three SNPs was 4.60 (95% CI: 1.31-16.10). The model's accuracy suggests that in the future similar methodologies may generate simple genetic tests with diagnostic utility in identification of suicide attempters. This strategy may uncover new pathophysiological pathways regarding the neurobiology of suicidal acts.
Affiliation(s)
- Enrique Baca-Garcia
- Department of Psychiatry at Fundacion Jimenez Diaz Hospital, Autonoma University, Madrid, Spain.
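The diagnostic figures quoted in the Baca-Garcia et al. abstract can be checked against the standard definitions with a few lines of arithmetic. This is a minimal sketch, assuming the conventional formulas LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The function name is hypothetical, and the inputs are the rounded values from the abstract, so agreement is only approximate.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) computed from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded values quoted in the abstract: sensitivity 0.50, specificity 0.82.
lr_pos, lr_neg = likelihood_ratios(0.50, 0.82)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, 1/LR- = {1 / lr_neg:.2f}")
```

With these rounded inputs, LR+ comes out close to the reported 2.80. The conventional LR- is about 0.61. The reported value of 1.64 equals specificity / (1 - sensitivity), which is the reciprocal of that quantity.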
|
27
|
Taylor CM, Agah A. Data Mining and Hypothesis Refinement using a Multi-Tiered Genetic Algorithm. JOURNAL OF INTELLIGENT SYSTEMS 2010. [DOI: 10.1515/jisys.2010.19.3.191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022] Open
|
28
|
Kelemen A, Vasilakos AV, Liang Y. Computational intelligence in bioinformatics: SNP/haplotype data in genetic association study for common diseases. IEEE Transactions on Information Technology in Biomedicine 2009; 13:841-7. [PMID: 19556205 DOI: 10.1109/titb.2009.2024144] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
Comprehensive evaluation of common genetic variations through association of single-nucleotide polymorphism (SNP) structure with common complex disease in the genome-wide scale is currently a hot area in human genome research due to the recent development of the Human Genome Project and HapMap Project. Computational science, which includes computational intelligence (CI), has recently become the third method of scientific enquiry besides theory and experimentation. There have been fast growing interests in developing and applying CI in disease mapping using SNP and haplotype data. Some of the recent studies have demonstrated the promise and importance of CI for common complex diseases in genomic association study using SNP/haplotype data, especially for tackling challenges, such as gene-gene and gene-environment interactions, and the notorious "curse of dimensionality" problem. This review provides coverage of recent developments of CI approaches for complex diseases in genetic association study with SNP/haplotype data.
Affiliation(s)
- Arpad Kelemen
- Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201, USA.
|
29
|
Carvalho PC, Hewel J, Barbosa VC, Yates JR. Identifying differences in protein expression levels by spectral counting and feature selection. GENETICS AND MOLECULAR RESEARCH 2008; 7:342-56. [PMID: 18551400 DOI: 10.4238/vol7-2gmr426] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Indexed: 11/03/2022]
Abstract
Spectral counting is a strategy to quantify relative protein concentrations in pre-digested protein mixtures analyzed by liquid chromatography online with tandem mass spectrometry. In the present study, we used combinations of normalization and statistical (feature selection) methods on spectral counting data to verify whether we could pinpoint which and how many proteins were differentially expressed when comparing complex protein mixtures. These combinations were evaluated on real, but controlled, experiments (yeast lysates were spiked with protein markers at different concentrations to simulate differences), which were therefore verifiable. The following normalization methods were applied: total signal, Z-normalization, hybrid normalization, and log preprocessing. The feature selection methods were: the Golub index, the Student t-test, a strategy based on the weighting used in a forward-support vector machine (SVM-F) model, and SVM recursive feature elimination. The results showed that Z-normalization combined with SVM-F correctly identified which and how many protein markers were added to the yeast lysates for all different concentrations. The software we used is available at http://pcarvalho.com/patternlab.
Affiliation(s)
- P C Carvalho
- Programa de Engenharia de Sistemas e Computação, COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brasil.
|
30
|
Genetic Programming: An Introduction and Tutorial, with a Survey of Techniques and Applications. STUDIES IN COMPUTATIONAL INTELLIGENCE 2008. [DOI: 10.1007/978-3-540-78293-3_22] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.8] [Indexed: 01/09/2023]
|
31
|
Li J, Tang X. A new classification model with simple decision rule for discovering optimal feature gene pairs. Comput Biol Med 2007; 37:1637-46. [PMID: 17482157 DOI: 10.1016/j.compbiomed.2007.03.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 01/25/2006] [Revised: 03/14/2007] [Accepted: 03/20/2007] [Indexed: 11/28/2022]
Abstract
Classifiers have been widely used to select an optimal subset of feature genes from microarray data for accurate classification of cancer samples and cancer-related studies. However, the classification rules derived from most classifiers are complex and difficult to understand in biological significance. How to solve this problem is a new challenge. In this paper, a new classification model based on gene pair is proposed to address the problem. The experimental results on several microarray data demonstrate that the proposed classification model performs well in finding a large number of excellent feature gene pairs. A 100% LOOCV classification accuracy can be achieved using a single classification model based on optimal feature gene pair or combining multiple top-ranked classification models. Using the proposed method, we successfully identified important cancer-related genes that had been validated in previous biological studies while they were not discovered by the other methods.
Affiliation(s)
- Jie Li
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
|
32
|
Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007; 23:2507-17. [PMID: 17720704 DOI: 10.1093/bioinformatics/btm344] [Citation(s) in RCA: 1965] [Impact Index Per Article: 115.6] [Indexed: 12/20/2022] Open
Abstract
Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of both common as well as upcoming bioinformatics applications.
Affiliation(s)
- Yvan Saeys
- Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium.
|
33
|
Huang D, Chow T. Effective gene selection method with small sample sets using gradient-based and point injection techniques. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2007; 4:467-475. [PMID: 17666766 DOI: 10.1109/tcbb.2007.1021] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/16/2023]
Abstract
Microarray gene expression data usually consist of a large amount of genes. Among these genes, only a small fraction is informative for performing cancer diagnostic test. This paper focuses on effective identification of informative genes. We analyze gene selection models from the perspective of optimization theory. As a result, a new strategy is designed to modify conventional search engines. Also, as overfitting is likely to occur in microarray data because of their small sample set, a point injection technique is developed to address the problem of overfitting. The proposed strategies have been evaluated on three kinds of cancer diagnosis. Our results show that the proposed strategies can improve the performance of gene selection substantially. The experimental results also indicate that the proposed methods are very robust under all the investigated cases.
|
34
|
Armand S, Watelain E, Mercier M, Lensel G, Lepoutre FX. Identification and classification of toe-walkers based on ankle kinematics, using a data-mining method. Gait Posture 2006; 23:240-8. [PMID: 16399521 DOI: 10.1016/j.gaitpost.2005.02.007] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 10/17/2004] [Revised: 02/18/2005] [Accepted: 02/25/2005] [Indexed: 02/02/2023]
Abstract
A database of 1,736 patients and 2,511 gait analyses was reviewed to identify trials where the first rocker was absent. A fuzzy c-means algorithm was used to identify sagittal ankle kinematic patterns and three groups were identified. The first showed a progressive dorsiflexion during the stance phase, while the second had a short-lived dorsiflexion, followed by a progressive plantarflexion. The third group exhibited a double bump pattern, moving successively from a short-lived dorsiflexion to a short-lived plantarflexion and then returning to a further short-lived dorsiflexion before ending with plantarflexion until toe-off. The three patterns were linked to different neurological conditions. Myopathy, neuropathy and arthrogryposis essentially revealed group 1 patterns, whereas idiopathic toe-walkers mainly displayed group 2 patterns. Cerebral palsy patients, however, were relatively homogeneously distributed amongst the three groups. Able-bodied subjects walking on their toes showed a high proportion of unclassifiable ankle patterns, due to a variable gait whilst toe walking. Despite the variety of neurological conditions included in this meta-analysis, repeatable biomechanical patterns appeared that could influence therapeutic management.
Affiliation(s)
- Stéphane Armand
- Laboratoire d'Automatique, de Mécanique et d'Informatique Industrielles et Humaines, Université de Valenciennes et du Hainaut-Cambrésis, LAMIH, UMR CNRS 8530, France.
|