751
Abstract
For decades, microbiologists have considered uncertainties as an undesired side effect of experimental protocols. As a consequence, standard microbial system modeling strives to hide uncertainties for the sake of deterministic understanding. However, recent studies have highlighted greater experimental variability than expected and emphasized uncertainties not as a weakness but as a necessary feature of complex microbial systems. We therefore advocate that biological uncertainties need to be considered foundational facets that must be incorporated in models. Not only will understanding these uncertainties improve our understanding and identification of microbial traits, it will also provide fundamental insights into microbial systems as a whole. Taking into account uncertainties within microbial models calls for new validation techniques. Formal verification already overcomes this shortcoming by proposing modeling frameworks and validation techniques dedicated to probabilistic models. However, further work remains to extract the full potential of such techniques in the context of microbial models. Herein, we demonstrate how statistical model checking can enhance the development of microbial models by building confidence in the estimation of critical parameters and through improved sensitivity analyses.
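As a rough illustration of the statistical model checking idea named in this abstract, the sketch below estimates the probability that a stochastic model satisfies a property by repeated simulation, with the number of runs chosen from the Hoeffding bound. The toy population model, its parameters, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def required_runs(epsilon, delta):
    """Runs needed so that |estimate - p| <= epsilon with probability
    at least 1 - delta, using the two-sided Hoeffding bound."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def simulate_population(rng, start=10, steps=50, p_divide=0.55):
    """Hypothetical toy model: at each step the population gains one cell
    with probability p_divide, otherwise loses one; extinction is absorbing."""
    n = start
    for _ in range(steps):
        if n == 0:
            break
        n += 1 if rng.random() < p_divide else -1
    return n


def estimate_probability(property_holds, epsilon=0.05, delta=0.05, seed=0):
    """Monte Carlo estimate of P(property holds on a simulated run)."""
    rng = random.Random(seed)
    runs = required_runs(epsilon, delta)
    hits = sum(property_holds(simulate_population(rng)) for _ in range(runs))
    return hits / runs, runs


# Property (assumed for the example): the population has not shrunk by the end.
estimate, runs = estimate_probability(lambda final_size: final_size >= 10)
print(f"{runs} runs, estimated probability {estimate:.3f}")
```

Tightening `epsilon` or `delta` raises the number of simulations, which is the trade-off between confidence and cost that this style of analysis makes explicit.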
752
Abstract
In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.
753
Azuaje F. Computational models for predicting drug responses in cancer research. Brief Bioinform 2017; 18:820-829. [PMID: 27444372 PMCID: PMC5862310 DOI: 10.1093/bib/bbw065] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 03/18/2016] [Indexed: 02/06/2023] Open
Abstract
The computational prediction of drug responses based on the analysis of multiple types of genome-wide molecular data is vital for accomplishing the promise of precision medicine in oncology. This will benefit cancer patients by matching their tumor characteristics to the most effective therapy available. As larger and more diverse layers of patient-related data become available, further demands for new bioinformatics approaches and expertise will arise. This article reviews key strategies, resources and techniques for the prediction of drug sensitivity in cell lines and patient-derived samples. It discusses major advances and challenges associated with the different model development steps. This review highlights major trends in this area, and will assist researchers in the assessment of recent progress and in the selection of approaches to emerging applications in oncology.
Affiliation(s)
- Francisco Azuaje
  - NorLux Neuro-Oncology Laboratory, Department of Oncology, Luxembourg Institute of Health (LIH), Luxembourg, Luxembourg
  - Corresponding author: Francisco Azuaje, NorLux Neuro-Oncology Laboratory, Department of Oncology, Luxembourg Institute of Health (LIH), Luxembourg L-1526, Luxembourg. Tel.: +352-26970875; Fax: +352-26970396; E-mail:
754
Banovich NE, Li YI, Raj A, Ward MC, Greenside P, Calderon D, Tung PY, Burnett JE, Myrthil M, Thomas SM, Burrows CK, Romero IG, Pavlovic BJ, Kundaje A, Pritchard JK, Gilad Y. Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res 2017; 28:122-131. [PMID: 29208628 PMCID: PMC5749177 DOI: 10.1101/gr.224436.117] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 04/28/2017] [Accepted: 11/20/2017] [Indexed: 12/17/2022]
Abstract
Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.
Affiliation(s)
- Nicholas E Banovich
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Yang I Li
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
- Anil Raj
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
- Michelle C Ward
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
  - Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
- Peyton Greenside
  - Department of Biomedical Informatics, Stanford University, Stanford, California 94305, USA
- Diego Calderon
  - Department of Biomedical Informatics, Stanford University, Stanford, California 94305, USA
- Po Yuan Tung
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
  - Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
- Jonathan E Burnett
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Marsha Myrthil
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Samantha M Thomas
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Courtney K Burrows
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Irene Gallego Romero
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Bryan J Pavlovic
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
  - Department of Biology, Stanford University, Stanford, California 94305, USA
  - Howard Hughes Medical Institute, Stanford University, Stanford, California 94305, USA
- Yoav Gilad
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
  - Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
755
Abstract
Most biological mechanisms involve more than one type of biomolecule, and hence operate not solely at the level of either genome, transcriptome, proteome, metabolome or ionome. Datasets resulting from single-omic analysis are rapidly increasing in throughput and quality, rendering multi-omic studies feasible. These should offer a comprehensive, structured and interactive overview of a biological mechanism. However, combining single-omic datasets in a meaningful manner has so far proved challenging, and the discovery of new biological information lags behind expectation. One reason is that experiments conducted in different laboratories typically cannot be combined without restriction. Second, the interpretation of multi-omic datasets represents a significant challenge by nature, as the biological datasets are heterogeneous not only for technical, but also for biological, chemical, and physical reasons. Here, multi-layer network theory and methods of artificial intelligence might help to solve these problems. For the efficient application of machine learning, however, biological datasets need to become more systematic, more precise, and much larger. We conclude our review with basic guidelines for the successful set-up of a multi-omic experiment.
756
Deep biomarkers of human aging: Application of deep neural networks to biomarker development. Aging (Albany NY) 2017; 8:1021-33. [PMID: 27191382 PMCID: PMC4931851 DOI: 10.18632/aging.100968] [Citation(s) in RCA: 187] [Impact Index Per Article: 26.7] [Received: 09/26/2015] [Accepted: 05/09/2016] [Indexed: 01/05/2023]
Abstract
One of the major impediments in human aging research is the absence of a comprehensive and actionable set of biomarkers that may be targeted and measured to track the effectiveness of therapeutic interventions. In this study, we designed a modular ensemble of 21 deep neural networks (DNNs) of varying depth, structure and optimization to predict human chronological age using a basic blood test. To train the DNNs, we used over 60,000 samples from common blood biochemistry and cell count tests from routine health exams performed by a single laboratory and linked to chronological age and sex. The best performing DNN in the ensemble demonstrated 81.5% epsilon-accuracy, r = 0.90, R² = 0.80 and MAE = 6.07 years in predicting chronological age within a 10-year frame, while the entire ensemble achieved 83.5% epsilon-accuracy, r = 0.91, R² = 0.82 and MAE = 5.55 years. The ensemble also identified the 5 most important markers for predicting human chronological age: albumin, glucose, alkaline phosphatase, urea and erythrocytes. To allow for public testing and evaluate real-life performance of the predictor, we developed an online system available at http://www.aging.ai. The ensemble approach may facilitate integration of multi-modal data linked to chronological age and sex that may lead to simple, minimally invasive, and affordable methods of tracking integrated biomarkers of aging in humans and performing cross-species feature importance analysis.
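The evaluation metrics reported in this abstract can be sketched as follows. This is a minimal illustration that assumes epsilon-accuracy means the fraction of predictions falling within epsilon years of the true age (here 10 years); the paper's exact definition may differ, and the ages below are made-up values, not data from the study.

```python
def epsilon_accuracy(true, pred, eps=10.0):
    """Fraction of predictions within eps years of the true age
    (assumed reading of 'epsilon-accuracy')."""
    return sum(abs(t - p) <= eps for t, p in zip(true, pred)) / len(true)


def mean_absolute_error(true, pred):
    """Average absolute difference between true and predicted age."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def r_squared(true, pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ages in years, for illustration only.
true_age = [30, 45, 60, 75]
predicted_age = [35, 40, 72, 70]

print(epsilon_accuracy(true_age, predicted_age))     # 3 of 4 within 10 years
print(mean_absolute_error(true_age, predicted_age))  # (5 + 5 + 12 + 5) / 4
print(r_squared(true_age, predicted_age))
```

The three numbers answer different questions: epsilon-accuracy counts acceptable predictions, MAE reports the typical size of the error, and R² reports how much of the age variance the predictor explains.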
757
Mamoshina P, Ojomoko L, Yanovich Y, Ostrovski A, Botezatu A, Prikhodko P, Izumchenko E, Aliper A, Romantsov K, Zhebrak A, Ogu IO, Zhavoronkov A. Converging blockchain and next-generation artificial intelligence technologies to decentralize and accelerate biomedical research and healthcare. Oncotarget 2017; 9:5665-5690. [PMID: 29464026 PMCID: PMC5814166 DOI: 10.18632/oncotarget.22345] [Citation(s) in RCA: 238] [Impact Index Per Article: 34.0] [Received: 10/19/2017] [Accepted: 11/02/2017] [Indexed: 12/19/2022] Open
Abstract
The increased availability of data and recent advancements in artificial intelligence present unprecedented opportunities in healthcare and major challenges for patients, developers, providers and regulators. Novel deep learning and transfer learning techniques are turning any data about a person into medical data, transforming simple facial pictures and videos into powerful sources of data for predictive analytics. Presently, patients do not have control over the access privileges to their medical records and remain unaware of the true value of the data they have. In this paper, we provide an overview of next-generation artificial intelligence and blockchain technologies and present innovative solutions that may be used to accelerate biomedical research and provide patients with new tools to control and profit from their personal data, as well as with incentives to undergo constant health monitoring. We introduce new concepts to appraise and evaluate personal records, including the combination-, time- and relationship-value of the data. We also present a roadmap for a blockchain-enabled decentralized personal health data ecosystem to enable novel approaches for drug discovery, biomarker development, and preventative healthcare. A secure and transparent distributed personal data marketplace utilizing blockchain and deep learning technologies may be able to resolve the challenges faced by regulators and return control over personal data, including medical records, to individuals.
Affiliation(s)
- Polina Mamoshina
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, Maryland, USA
  - Department of Computer Science, University of Oxford, Oxford, United Kingdom
- Lucy Ojomoko
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, Maryland, USA
- Eugene Izumchenko
  - Department of Otolaryngology-Head & Neck Surgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Alexander Aliper
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, Maryland, USA
- Konstantin Romantsov
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, Maryland, USA
- Alexander Zhebrak
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, Maryland, USA
- Iraneus Obioma Ogu
  - Africa Blockchain Artificial Intelligence for Healthcare Initiative, Insilico Medicine, Inc, Abuja, Nigeria
- Alex Zhavoronkov
  - Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, Baltimore, Maryland, USA
  - The Biogerontology Research Foundation, London, United Kingdom
758
Sahakyan AB, Chambers VS, Marsico G, Santner T, Di Antonio M, Balasubramanian S. Machine learning model for sequence-driven DNA G-quadruplex formation. Sci Rep 2017; 7:14535. [PMID: 29109402 PMCID: PMC5673958 DOI: 10.1038/s41598-017-14017-4] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 09/05/2017] [Accepted: 10/05/2017] [Indexed: 11/25/2022] Open
Abstract
We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiates many widely accepted putative quadruplex sequences that do not actually form stable genomic G4 structures, correctly assessing the G4 folding potential of over 700,000 such sequences in the human genome. Moreover, our approach reveals the relative importance of sequence-based features coming from both within the G4 motifs and their flanking regions. The developed model can be applied to any DNA sequence or genome to characterise sequence-driven intramolecular G4 formation propensities.
Affiliation(s)
- Aleksandr B Sahakyan
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Vicki S Chambers
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Giovanni Marsico
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Tobias Santner
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Marco Di Antonio
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Shankar Balasubramanian
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
  - School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
759
Abstract
PURPOSE OF REVIEW: The pathogenesis of genetically complex granulomatous diseases, such as sarcoidosis and latent tuberculosis, remains largely unknown. With the recent advent of more powerful research tools, such as genome-wide expression platforms, comes the challenge of making sense of the enormous data sets so generated. This manuscript will provide demonstrations of how in-silico (computer) analysis of large research data sets can lead to novel discoveries in the field of granulomatous lung disease. RECENT FINDINGS: The application of in-silico research tools has led to novel discoveries in the fields of noninfectious (e.g., sarcoidosis) and infectious granulomatous diseases. Computer models have identified novel disease mechanisms and can be used to perform 'virtual' experiments rapidly and at low cost compared with conventional laboratory techniques. SUMMARY: Granulomatous lung diseases are extremely complex, involving dynamic interactions between multiple genes, cells, and molecules. In-silico interpretation of large data sets generated from new research platforms that are capable of comprehensively characterizing and quantifying pools of biological molecules promises to rapidly accelerate the rate of scientific discovery in the field of granulomatous lung disorders.
760
Hameed SS, Hassan R, Muhammad FF. Selection and classification of gene expression in autism disorder: Use of a combination of statistical filters and a GBPSO-SVM algorithm. PLoS One 2017; 12:e0187371. [PMID: 29095904 PMCID: PMC5667738 DOI: 10.1371/journal.pone.0187371] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 07/23/2017] [Accepted: 10/18/2017] [Indexed: 11/30/2022] Open
Abstract
In this work, gene expression in autism spectrum disorder (ASD) is analyzed with the goal of selecting the most attributed genes and performing classification. The objective was achieved by utilizing a combination of various statistical filters and a wrapper-based geometric binary particle swarm optimization-support vector machine (GBPSO-SVM) algorithm. The utilization of different filters was accentuated by incorporating a mean and median ratio criterion to remove very similar genes. The results showed that the most discriminative genes that were identified in the first and last selection steps included the presence of a repetitive gene (CAPS2), which was assigned as the gene most highly related to ASD risk. The merged gene subset that was selected by the GBPSO-SVM algorithm was able to enhance the classification accuracy.
Affiliation(s)
- Shilan S. Hameed
  - Department of Computer Science, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
  - Department of Software and Informatics Engineering, College of Engineering, Salahaddin University, Erbil, Kurdistan Region, Iraq
- Rohayanti Hassan
  - Department of Software Engineering, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
- Fahmi F. Muhammad
  - Department of Physics, Faculty of Science & Health, Koya University, Koya, Kurdistan Region, Iraq
761
Douglas PS, Cerqueira MD, Berman DS, Chinnaiyan K, Cohen MS, Lundbye JB, Patel RAG, Sengupta PP, Soman P, Weissman NJ, Wong TC. The Future of Cardiac Imaging: Report of a Think Tank Convened by the American College of Cardiology. JACC Cardiovasc Imaging 2017; 9:1211-1223. [PMID: 27712724 DOI: 10.1016/j.jcmg.2016.02.027] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 12/22/2015] [Revised: 02/08/2016] [Accepted: 02/16/2016] [Indexed: 11/16/2022]
Abstract
The American College of Cardiology's Executive Committee and Cardiovascular Imaging Section Leadership Council convened a discussion regarding the future of cardiac imaging among thought leaders in the field during a 2-day Think Tank. Participants were charged with thinking broadly about the future of imaging and developing a roadmap to address critical challenges. Key areas of discussion included: 1) How can cardiac imaging services thrive in our new world of value-based health care? 2) Who is the cardiac imager of the future and what is the role of the multimodality imager? 3) How can we nurture innovation and research in imaging? 4) How can we maximize imaging information and optimize outcomes? This document describes the proceedings of this Think Tank.
Affiliation(s)
- Pamela S Douglas
  - Duke Clinical Research Institute and Duke University Medical Center, Durham, North Carolina
- Manuel D Cerqueira
  - Imaging and Heart and Vascular Institutes, Cleveland Clinic and Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio
- Daniel S Berman
  - Departments of Imaging and Medicine, Cedars-Sinai Medical Center and the Cedars-Sinai Heart Institute, Los Angeles, California
- Kavitha Chinnaiyan
  - Department of Cardiology, William Beaumont Hospital, Royal Oak, Michigan
- Meryl S Cohen
  - Children's Hospital of Philadelphia, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania
- Justin B Lundbye
  - Department of Cardiology, Hospital of Central Connecticut, New Britain, Connecticut
- Rajan A G Patel
  - John Ochsner Heart and Vascular Institute, Ochsner Medical Center, New Orleans, Louisiana
- Partho P Sengupta
  - Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York
- Prem Soman
  - Division of Cardiology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
- Neil J Weissman
  - Department of Medicine, MedStar Health Research Institute and Georgetown University, Washington, DC
- Timothy C Wong
  - Cardiovascular Magnetic Resonance Center, UPMC Heart and Vascular Institute, Pittsburgh, Pennsylvania
762
Jia J, Liang X, Chen S, Wang H, Li H, Fang M, Bai X, Wang Z, Wang M, Zhu S, Sun F, Gao C. Next-generation sequencing revealed divergence in deletions of the preS region in the HBV genome between different HBV-related liver diseases. J Gen Virol 2017; 98:2748-2758. [PMID: 29022863 DOI: 10.1099/jgv.0.000942] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022] Open
Abstract
In order to investigate if deletion patterns of the preS region can predict liver disease advancement, the preS region of the hepatitis B virus (HBV) genome in 45 chronic hepatitis B (CHB) and 94 HBV-related hepatocellular carcinoma (HCC) patients was sequenced by next-generation sequencing (NGS) and the percentages of nucleotide deletion in the preS region were analysed. Hierarchical clustering and heatmaps based on deletion percentages of preS revealed different deletion patterns between CHB and HCC patients. Intergenotype comparison also indicated divergence in preS deletions between HBV genotype B and C. No significant difference was found in preS deletion patterns between sera and matched adjacent non-tumour tissues. Based on hierarchical clustering, HCC patients were classed into two groups with different preS deletion patterns and different clinical features. Finally, the support vector machine (SVM) model was trained on preS nucleotide deletion percentages and used to predict HCC versus CHB patients. The prediction performance was assessed with fivefold cross-validation and independent cohort validation. The median area under the curve (AUC) was 0.729 after repeating SVM 500 times with fivefold cross-validations. After parameter optimization, the SVM model was used to predict an independent cohort with 51 CHB patients and 72 HCC patients and the AUC was 0.727. In conclusion, the use of the NGS method revealed a prominent divergence in preS deletion patterns between disease groups and virus genotypes, but not between different tissue types. Quantitative NGS data combined with a machine learning method could be a powerful approach for prediction of the status of different diseases.
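The evaluation protocol described in this abstract (repeated fivefold cross-validation summarized by a median AUC) can be sketched in plain Python. To keep the example dependency-free, a simple centroid-difference scorer stands in for the SVM, so this is explicitly not the authors' model; the synthetic "deletion percentage" features, the pooling of out-of-fold scores into one AUC per repeat, and all names are assumptions for illustration.

```python
import random
import statistics


def auc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_scorer(train_x, train_y):
    """Stand-in scorer (NOT an SVM): projection onto the difference
    between the two class centroids."""
    c1 = centroid([x for x, y in zip(train_x, train_y) if y == 1])
    c0 = centroid([x for x, y in zip(train_x, train_y) if y == 0])
    w = [a - b for a, b in zip(c1, c0)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def repeated_cv_auc(x, y, folds=5, repeats=500, seed=0):
    """Median AUC over repeated k-fold splits, scoring each sample with
    a model that never saw it during fitting."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    aucs = []
    for _ in range(repeats):
        rng.shuffle(idx)
        scores = [0.0] * len(x)
        for k in range(folds):
            test = idx[k::folds]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            scorer = fit_scorer([x[i] for i in train], [y[i] for i in train])
            for i in test:
                scores[i] = scorer(x[i])
        aucs.append(auc(y, scores))
    return statistics.median(aucs)


# Synthetic data: 20 "HCC" (label 1) and 20 "CHB" (label 0) samples, 3 features.
data_rng = random.Random(1)
x = [[data_rng.gauss(0.6, 0.2) for _ in range(3)] for _ in range(20)]
x += [[data_rng.gauss(0.3, 0.2) for _ in range(3)] for _ in range(20)]
y = [1] * 20 + [0] * 20
median_auc = repeated_cv_auc(x, y, folds=5, repeats=20)
print(f"median AUC over 20 repeats: {median_auc:.3f}")
```

Repeating the split many times and reporting the median reduces the dependence of the estimate on any single random partition, which matters for cohorts of this size.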
Affiliation(s)
- Jian'an Jia
  - Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, 200438, PR China
  - Department of Laboratory Medicine, The 105th Hospital of PLA, Hefei 230031, PR China
- Xiaotao Liang
  - Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, PR China
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, PR China
- Shipeng Chen
  - Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, 200438, PR China
- Hui Wang
  - Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, 200438, PR China
  - Department of Clinical Laboratory, The First Affiliated Hospital of Chinese PLA's General Hospital, Beijing 100048, PR China
- Huiming Li
  - Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, 200438, PR China
- Meng Fang
  - Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, 200438, PR China
- Xin Bai
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, PR China
- Ziyi Wang
  - Shanghai Institute of Technology, Shanghai 201418, PR China
- Mengmeng Wang
  - Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, 200438, PR China
- Shanfeng Zhu
  - Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, PR China
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, PR China
- Fengzhu Sun
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, PR China
  - Molecular and Computational Program Department of Biological Sciences, University of Southern California, LA 90089, USA
- Chunfang Gao
  - Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, 200438, PR China
763
Abstract
BACKGROUND: Predicting the response of cancer patients to a drug based on genomic information is an important problem in modern clinical oncology. This problem occurs in part because many available drug sensitivity prediction algorithms do not consider better quality cancer cell lines and the adoption of new feature representations, both of which lead to accurate prediction of drug responses. By accurately predicting drug responses in cancer, oncologists gain a more complete understanding of the effective treatments for each patient, which is a core goal in precision medicine. RESULTS: In this paper, we model cancer drug sensitivity as a link prediction problem, which is shown to be an effective technique. We evaluate our proposed link prediction algorithms and compare them with an existing drug sensitivity prediction approach based on clinical trial data. The experimental results based on the clinical trial data show the stability of our link prediction algorithms, which yield the highest area under the ROC curve (AUC) and are statistically significant. CONCLUSIONS: We propose a link prediction approach to obtain a new feature representation. Compared with an existing approach, the results show that incorporating the new feature representation into the link prediction algorithms significantly improves performance.
Affiliation(s)
- Turki Turki
  - Department of Computer Science, King Abdulaziz University, P.O. Box 80221, Jeddah, 21589, Saudi Arabia
  - Bioinformatics Program and Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
- Zhi Wei
  - Bioinformatics Program and Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
764
López B, Torrent-Fontbona F, Viñas R, Fernández-Real JM. Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction. Artif Intell Med 2017; 85:43-49. [PMID: 28943335 DOI: 10.1016/j.artmed.2017.09.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 02/10/2017] [Revised: 09/04/2017] [Indexed: 10/18/2022]
Abstract
OBJECTIVE: The use of artificial intelligence techniques to find out which Single Nucleotide Polymorphisms (SNPs) promote the development of a disease is one of the features of medical research, as such techniques may potentially aid early diagnosis and help in the prescription of preventive measures. In particular, the aim is to help physicians to identify the relevant SNPs related to Type 2 diabetes, and to build a decision-support tool for risk prediction. METHODS: We use the Random Forest (RF) technique to search for the most important attributes (SNPs) related to diabetes, giving each attribute a weight (degree of importance) ranging between 0 and 1. Support Vector Machines and Logistic Regression have also been used, since they are two other machine learning techniques that are well established in the health community. Their performance has been compared to that achieved by RF. Furthermore, the relevance of the attributes obtained with RF has been used to perform predictions with the k-Nearest Neighbour method, weighting attributes in the similarity measure according to their RF relevance. RESULTS: Testing is performed on a set of 677 subjects. RF is able to handle the complexity of features' interactions, overfitting, and unknown attribute values, providing the SNPs' relevance with an area under the ROC curve of up to 0.89 in terms of risk prediction. RF outperforms all the other tested machine learning techniques in terms of prediction accuracy, and in terms of the stability of the estimated relevance of the attributes. CONCLUSIONS: The Random Forest is a useful method for learning predictive models and the relevance of SNPs without any underlying assumption.
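The relevance-weighted k-Nearest Neighbour step described in this abstract can be illustrated with a short sketch. It assumes genotypes are coded as 0, 1 or 2 copies of the minor allele and that relevance enters as a per-attribute weight on an absolute-difference distance; the paper's similarity measure may differ, and the weights and samples below are invented for illustration rather than learned by a Random Forest.

```python
from collections import Counter


def weighted_distance(a, b, weights):
    """Sum of per-SNP genotype differences, each scaled by its relevance."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def knn_predict(query, samples, labels, weights, k=3):
    """Majority vote among the k samples closest to the query under the
    relevance-weighted distance."""
    ranked = sorted(
        range(len(samples)),
        key=lambda i: weighted_distance(query, samples[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical relevance weights in [0, 1] for three SNPs.
relevance = [0.9, 0.1, 0.0]

# Hypothetical genotype vectors and labels (1 = case, 0 = control).
samples = [[2, 0, 1], [2, 1, 0], [1, 0, 2], [0, 2, 1], [0, 1, 0], [0, 0, 2]]
labels = [1, 1, 1, 0, 0, 0]

prediction = knn_predict([2, 2, 2], samples, labels, relevance, k=3)
print(prediction)
```

Because the first SNP carries almost all of the weight, neighbours are chosen mainly by agreement on that SNP, and attributes with zero relevance are ignored entirely.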
Affiliation(s)
- Beatriz López
- University of Girona, Campus Montilivi, building EPS4, 17071 Girona, Spain.
| | | | - Ramón Viñas
- University of Girona, Campus Montilivi, building EPS4, 17071 Girona, Spain.
| | - José Manuel Fernández-Real
- Biomedical Research Institute of Girona, Avda. de França, s/n, 17007 Girona, Spain; CIBERobn Pathophysiology of Obesity and Nutrition, Instituto de Salud Carlos III, Madrid, Spain.
|
765
|
Naulaerts S, Dang CC, Ballester PJ. Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours. Oncotarget 2017; 8:97025-97040. [PMID: 29228590 PMCID: PMC5722542 DOI: 10.18632/oncotarget.20923] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 03/16/2017] [Accepted: 08/14/2017] [Indexed: 02/07/2023] Open
Abstract
Cancer drug therapies are only effective in a small proportion of patients. To make things worse, our ability to identify these responsive patients before administering a treatment is generally very limited. The recent arrival of large-scale pharmacogenomic data sets, which measure the sensitivity of molecularly profiled cancer cell lines to a panel of drugs, has boosted research on the discovery of drug sensitivity markers. However, no systematic comparison of widely-used single-gene markers with multi-gene machine-learning markers exploiting genomic data has been so far conducted. We therefore assessed the performance offered by these two types of models in discriminating between sensitive and resistant cell lines to a given drug. This was carried out for each of 127 considered drugs using genomic data characterising the cell lines. We found that the proportion of cell lines predicted to be sensitive that are actually sensitive (precision) varies strongly with the drug and type of model used. Furthermore, the proportion of sensitive cell lines that are correctly predicted as sensitive (recall) of the best single-gene marker was lower than that of the multi-gene marker in 118 of the 127 tested drugs. We conclude that single-gene markers are only able to identify those drug-sensitive cell lines with the considered actionable mutation, unlike multi-gene markers that can in principle combine multiple gene mutations to identify additional sensitive cell lines. We also found that cell line sensitivities to some drugs (e.g. Temsirolimus, 17-AAG or Methotrexate) are better predicted by these machine-learning models.
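The two quantities defined in this abstract, precision (the proportion of cell lines predicted sensitive that are actually sensitive) and recall (the proportion of sensitive cell lines correctly predicted as sensitive), can be illustrated with a minimal sketch. The function name and the six-cell-line toy data below are hypothetical and are not taken from the study.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for boolean labels, True = drug-sensitive."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # predicted sensitive, truly sensitive
    fp = sum(1 for p, a in pairs if p and not a)    # predicted sensitive, truly resistant
    fn = sum(1 for p, a in pairs if a and not p)    # missed sensitive cell lines
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: six cell lines, three of them truly sensitive.
actual = [True, True, True, False, False, False]
single_gene = [True, False, False, False, False, False]  # flags one mutated line only
multi_gene = [True, True, False, True, False, False]     # flags more lines, one wrongly

single_result = precision_recall(single_gene, actual)
multi_result = precision_recall(multi_gene, actual)
print(single_result, multi_result)
```

In this invented example the single-gene marker is precise but recovers only one of the three sensitive lines, while the multi-gene marker trades some precision for higher recall.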
Affiliation(s)
- Stefan Naulaerts
- Computational Biology and Drug Design, Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; CNRS UMR7258, Marseille, France
| | - Cuong C Dang
- Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
| | - Pedro J Ballester
- Computational Biology and Drug Design, Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; CNRS UMR7258, Marseille, France
|
766
|
Lynch CM, van Berkel VH, Frieboes HB. Application of unsupervised analysis techniques to lung cancer patient data. PLoS One 2017; 12:e0184370. [PMID: 28910336 PMCID: PMC5598970 DOI: 10.1371/journal.pone.0184370] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 05/31/2017] [Accepted: 08/22/2017] [Indexed: 11/24/2022] Open
Abstract
This study applies unsupervised machine learning techniques for classification and clustering to a collection of descriptive variables from 10,442 lung cancer patient records in the Surveillance, Epidemiology, and End Results (SEER) program database. The goal is to automatically classify lung cancer patients into groups based on clinically measurable disease-specific variables in order to estimate survival. Variables selected as inputs for machine learning include Number of Primaries, Age, Grade, Tumor Size, Stage, and TNM, which are numeric or can readily be converted to numeric type. Minimal up-front processing of the data enables exploring the out-of-the-box capabilities of established unsupervised learning techniques, with little human intervention through the entire process. The output of the techniques is used to predict survival time, with the efficacy of the prediction representing a proxy for the usefulness of the classification. A basic single variable linear regression against each unsupervised output is applied, and the associated Root Mean Squared Error (RMSE) value is calculated as a metric to compare between the outputs. The results show that self-ordering maps exhibit the best performance, while k-Means performs the best of the simpler classification techniques. Predicting against the full data set, it is found that their respective RMSE values (15.591 for self-ordering maps and 16.193 for k-Means) are comparable to supervised regression techniques, such as Gradient Boosting Machine (RMSE of 15.048). We conclude that unsupervised data analysis techniques may be of use to classify patients by defining the classes as effective proxies for survival prediction.
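The evaluation step described in this abstract, a single-variable linear regression of survival time against an unsupervised output scored by RMSE, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the helper names, the numeric cluster labels, and the survival values are all hypothetical, and treating a cluster label as a numeric predictor is a simplification.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(y_true, y_pred):
    """Root Mean Squared Error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data: cluster label per patient and survival time in months.
cluster = [0, 0, 1, 1, 2, 2]
survival = [10.0, 14.0, 20.0, 24.0, 30.0, 34.0]

slope, intercept = fit_line(cluster, survival)
predicted = [slope * c + intercept for c in cluster]
error = rmse(survival, predicted)
print(slope, intercept, error)
```

A lower RMSE for one clustering output than for another would, under this scheme, be read as that clustering being a better proxy for survival.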
Affiliation(s)
- Chip M. Lynch
- Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY, United States of America
| | - Victor H. van Berkel
- Department of Cardiovascular and Thoracic Surgery, University of Louisville, Louisville, KY, United States of America
| | - Hermann B. Frieboes
- Department of Bioengineering, University of Louisville, Louisville, KY, United States of America
- James Graham Brown Cancer Center, University of Louisville, Louisville, KY, United States of America
|
767
|
Ferrero E, Dunham I, Sanseau P. In silico prediction of novel therapeutic targets using gene-disease association data. J Transl Med 2017; 15:182. [PMID: 28851378 PMCID: PMC5576250 DOI: 10.1186/s12967-017-1285-6] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 07/07/2017] [Accepted: 08/22/2017] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. METHODS To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets. RESULTS We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature. CONCLUSIONS Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.
Affiliation(s)
- Enrico Ferrero
- Computational Biology and Stats, Target Sciences, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, SG1 2NY UK
| | - Ian Dunham
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Philippe Sanseau
- Computational Biology and Stats, Target Sciences, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, SG1 2NY UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
|
768
|
Singh A, Mishra A, Khosravi A, Khandelwal G, Jayaram B. Physico-chemical fingerprinting of RNA genes. Nucleic Acids Res 2017; 45:e47. [PMID: 27932456 PMCID: PMC5397174 DOI: 10.1093/nar/gkw1236] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/09/2016] [Accepted: 11/29/2016] [Indexed: 12/13/2022] Open
Abstract
We advance here a novel concept for characterizing different classes of RNA genes on the basis of physico-chemical properties of DNA sequences. As knowledge-based approaches could yield unsatisfactory outcomes due to limitations of training on available experimental data sets, alternative approaches that utilize properties intrinsic to DNA are needed to supplement training based methods and to eventually provide molecular insights into genome organization. Based on a comprehensive series of molecular dynamics simulations of Ascona B-DNA consortium, we extracted hydrogen bonding, stacking and solvation energies of all combinations of DNA sequences at the dinucleotide level and calculated these properties for different types of RNA genes. Considering ∼7.3 million mRNA, 255 524 tRNA, 40 649 rRNA (different subunits) and 5250 miRNA, 3747 snRNA, gene sequences from 9282 complete genome chromosomes of all prokaryotes and eukaryotes available at NCBI, we observed that physico-chemical properties of different functional units on genomic DNA differ in their signatures.
Affiliation(s)
- Ankita Singh
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
| | - Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
| | - Ali Khosravi
- Ale-Taha Institute of Higher Education, Tehran, Iran
| | - Garima Khandelwal
- Cancer Research UK Manchester Institute, The University of Manchester, Wilmslow Road, Manchester M20 4BX, UK
| | - B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India; Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi-110016, India
|
769
|
Dong SS, Guo Y, Yao S, Chen YX, He MN, Zhang YJ, Chen XF, Chen JB, Yang TL. Integrating regulatory features data for prediction of functional disease-associated SNPs. Brief Bioinform 2017; 20:26-32. [DOI: 10.1093/bib/bbx094] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/04/2017] [Indexed: 12/21/2022] Open
Affiliation(s)
- Shan-Shan Dong
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University
| | - Yan Guo
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University
| | - Shi Yao
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University
| | - Yi-Xiao Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University
| | - Mo-Nan He
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University
| | - Yu-Jie Zhang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University
| | - Xiao-Feng Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University
| | - Jia-Bin Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University
| | - Tie-Lin Yang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University
|
770
|
Cordier T, Esling P, Lejzerowicz F, Visco J, Ouadahi A, Martins C, Cedhagen T, Pawlowski J. Predicting the Ecological Quality Status of Marine Environments from eDNA Metabarcoding Data Using Supervised Machine Learning. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2017; 51:9118-9126. [PMID: 28665601 DOI: 10.1021/acs.est.7b01518] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Indexed: 05/10/2023]
Abstract
Monitoring biodiversity is essential to assess the impacts of increasing anthropogenic activities in marine environments. Traditionally, marine biomonitoring involves the sorting and morphological identification of benthic macro-invertebrates, which is time-consuming and demands taxonomic expertise. High-throughput amplicon sequencing of environmental DNA (eDNA metabarcoding) represents a promising alternative for benthic monitoring. However, an important fraction of eDNA sequences remains unassigned or belongs to taxa of unknown ecology, which prevents their use for assessing the ecological quality status. Here, we show that supervised machine learning (SML) can be used to build robust predictive models for benthic monitoring, regardless of the taxonomic assignment of eDNA sequences. We tested three SML approaches to assess the environmental impact of marine aquaculture using eDNA of benthic foraminifera, a group of unicellular eukaryotes known to be good bioindicators, as features to infer macro-invertebrate-based biotic indices. We found similar ecological status as obtained from macro-invertebrate inventories. We argue that SML approaches could overcome and even bypass the cost and time-demanding morpho-taxonomic approaches in future biomonitoring.
Affiliation(s)
- Tristan Cordier
- Department of Genetics and Evolution, University of Geneva, Boulevard d'Yvoy 4, CH 1205 Geneva, Switzerland
| | - Philippe Esling
- IRCAM, UMR 9912, Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France
| | - Franck Lejzerowicz
- Department of Genetics and Evolution, University of Geneva, Boulevard d'Yvoy 4, CH 1205 Geneva, Switzerland
| | - Joana Visco
- ID-Gene ecodiagnostics, Ltd., chemin des Aulx 14, 1228 Plan-les-Ouates, Switzerland
| | - Amine Ouadahi
- Department of Genetics and Evolution, University of Geneva, Boulevard d'Yvoy 4, CH 1205 Geneva, Switzerland
| | - Catarina Martins
- Marine Harvest ASA, Sandviksboder 77AB, Bergen, 5035 Bergen, Norway
| | - Tomas Cedhagen
- Department of Bioscience, Section of Aquatic Biology, University of Aarhus, Building 1135, Ole Worms allé 1, DK-8000 Aarhus, Denmark
| | - Jan Pawlowski
- Department of Genetics and Evolution, University of Geneva, Boulevard d'Yvoy 4, CH 1205 Geneva, Switzerland
- ID-Gene ecodiagnostics, Ltd., chemin des Aulx 14, 1228 Plan-les-Ouates, Switzerland
|
771
|
Lötsch J, Lippmann C, Kringel D, Ultsch A. Integrated Computational Analysis of Genes Associated with Human Hereditary Insensitivity to Pain. A Drug Repurposing Perspective. Front Mol Neurosci 2017; 10:252. [PMID: 28848388 PMCID: PMC5550731 DOI: 10.3389/fnmol.2017.00252] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/02/2017] [Accepted: 07/26/2017] [Indexed: 12/31/2022] Open
Abstract
Genes causally involved in human insensitivity to pain provide a unique molecular source of studying the pathophysiology of pain and the development of novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic target in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, the biological processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified sharing important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence.
Affiliation(s)
- Jörn Lötsch
- Institute of Clinical Pharmacology, Goethe-University, Frankfurt am Main, Germany; Fraunhofer Institute of Molecular Biology and Applied Ecology-Project Group, Translational Medicine and Pharmacology (IME-TMP), Frankfurt am Main, Germany
| | - Catharina Lippmann
- Fraunhofer Institute of Molecular Biology and Applied Ecology-Project Group, Translational Medicine and Pharmacology (IME-TMP), Frankfurt am Main, Germany
| | - Dario Kringel
- Institute of Clinical Pharmacology, Goethe-University, Frankfurt am Main, Germany
| | - Alfred Ultsch
- DataBionics Research Group, University of Marburg, Marburg, Germany
|
772
|
Xie L, He S, Wen Y, Bo X, Zhang Z. Discovery of novel therapeutic properties of drugs from transcriptional responses based on multi-label classification. Sci Rep 2017; 7:7136. [PMID: 28769090 PMCID: PMC5541064 DOI: 10.1038/s41598-017-07705-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/30/2017] [Accepted: 07/03/2017] [Indexed: 11/09/2022] Open
Abstract
Drug repositioning strategies have improved substantially in recent years. At present, two advances are poised to facilitate new strategies. First, the LINCS project can provide rich transcriptome data that reflect the responses of cells upon exposure to various drugs. Second, machine learning algorithms have been applied successfully in biomedical research. In this paper, we developed a systematic method to discover novel indications for existing drugs by approaching drug repositioning as a multi-label classification task and used a Softmax regression model to predict previously unrecognized therapeutic properties of drugs based on LINCS transcriptome data. This approach to complete the said task has not been achieved in previous studies. By performing in silico comparison, we demonstrated that the proposed Softmax method showed markedly superior performance over those of other methods. Once fully trained, the method showed a training accuracy exceeding 80% and a validation accuracy of approximately 70%. We generated a highly credible set of 98 drugs with high potential to be repositioned for novel therapeutic purposes. Our case studies included zonisamide and brinzolamide, which were originally developed to treat indications of the nervous system and sensory organs, respectively. Both drugs were repurposed to the cardiovascular category.
Affiliation(s)
- Lingwei Xie
- Software School, Xiamen University, Xiamen Fujian, 361005, P.R. China
| | - Song He
- Beijing Institute of Radiation Medicine, Beijing, 100850, P.R. China
| | - Yuqi Wen
- Beijing Institute of Radiation Medicine, Beijing, 100850, P.R. China
| | - Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing, 100850, P.R. China.
| | - Zhongnan Zhang
- Software School, Xiamen University, Xiamen Fujian, 361005, P.R. China.
|
773
|
Ghanat Bari M, Ung CY, Zhang C, Zhu S, Li H. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks. Sci Rep 2017; 7:6993. [PMID: 28765560 PMCID: PMC5539301 DOI: 10.1038/s41598-017-07481-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 01/09/2017] [Accepted: 06/27/2017] [Indexed: 12/25/2022] Open
Abstract
Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10^8 Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have been recently reported to associate with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes in coordinating functionality in cancer networks and will illuminate our understanding of how genes are modulated in a tissue-specific network contribute to tumorigenesis and therapy development.
Affiliation(s)
- Mehrab Ghanat Bari
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Choong Yong Ung
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Cheng Zhang
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Shizhen Zhu
- Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Hu Li
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA.
|
774
|
Turki T. Learning approaches to improve prediction of drug sensitivity in breast cancer patients. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2017; 2016:3314-3320. [PMID: 28269014 DOI: 10.1109/embc.2016.7591437] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/05/2022]
Abstract
Predicting drug response to cancer disease is an important problem in modern clinical oncology that has attracted increasing attention from various domains such as computational biology, machine learning, and data mining. Cancer patients respond differently to each cancer therapy owing to disease diversity, genetic factors, and environmental causes. Thus, oncologists aim to identify the effective therapies for cancer patients and avoid adverse drug reactions in patients. By predicting the drug response to cancer, oncologists gain full understanding of the effective treatments on each patient, which leads to better personalized treatment. In this paper, we present three learning approaches to improve the prediction of breast cancer patients' response to chemotherapy drugs: the instance selection approach, the oversampling approach, and the hybrid approach. We evaluate the performance of our approaches and compare them against the baseline approach using the Area Under the ROC Curve (AUC) on clinical trial data, in addition to testing the stability of the approaches. Our experimental results show the stability of our approaches giving the highest AUC with statistical significance.
|
775
|
Aksenov AA, da Silva R, Knight R, Lopes NP, Dorrestein PC. Global chemical analysis of biology by mass spectrometry. Nat Rev Chem 2017. [DOI: 10.1038/s41570-017-0054] [Citation(s) in RCA: 104] [Impact Index Per Article: 14.9] [Indexed: 12/13/2022]
|
776
|
Richter J, Pittig A, Hollandt M, Lueken U. Bridging the Gaps Between Basic Science and Cognitive-Behavioral Treatments for Anxiety Disorders in Routine Care. ZEITSCHRIFT FUR PSYCHOLOGIE-JOURNAL OF PSYCHOLOGY 2017. [DOI: 10.1027/2151-2604/a000309] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/23/2022]
Abstract
As a core component of cognitive-behavioral therapies (CBT), behavioral exposure is an effective treatment for anxiety disorders. Still, recent treatment studies demonstrate relatively high rates of treatment dropout, nonresponse, and relapse, indicating a substantial need for optimizing and personalizing existing treatment procedures. In the present article, we aim to address current challenges and future demands for translational research in CBT for the anxiety disorders, including (a) a better understanding of those mechanisms conferring behavioral change, (b) identifying important sources of individual variation that may act as moderators of treatment response, and (c) targeting practical barriers for dissemination of exposure therapy to routine care. Based on a recursive process model of psychotherapy research we will describe distinct steps to systematically translate basic and clinical research “from bench to bedside” to routine care, but also vice versa. Some of these aspects may stimulate the future roadmap for evidence-based psychotherapy research in order to better target the treatment of anxiety disorders as one core health challenge of our time.
Affiliation(s)
- Jan Richter
- Department of Physiological and Clinical Psychology/Psychotherapy, University of Greifswald, Germany
| | - Andre Pittig
- Institute of Clinical Psychology and Psychotherapy, Department of Psychology, Technische Universität Dresden, Germany
| | - Maike Hollandt
- Department of Physiological and Clinical Psychology/Psychotherapy, University of Greifswald, Germany
| | - Ulrike Lueken
- Center of Mental Health, Department of Psychiatry, Psychosomatics, and Psychotherapy, University Hospital of Würzburg, Germany
- Department of Psychology, Humboldt University of Berlin, Germany
|
777
|
Narula S, Shameer K, Salem Omar AM, Dudley JT, Sengupta PP. Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography. J Am Coll Cardiol 2017; 68:2287-2295. [PMID: 27884247 DOI: 10.1016/j.jacc.2016.08.062] [Citation(s) in RCA: 223] [Impact Index Per Article: 31.9] [Received: 05/12/2016] [Revised: 07/27/2016] [Accepted: 08/17/2016] [Indexed: 11/25/2022]
Abstract
BACKGROUND Machine-learning models may aid cardiac phenotypic recognition by using features of cardiac tissue deformation. OBJECTIVES This study investigated the diagnostic value of a machine-learning framework that incorporates speckle-tracking echocardiographic data for automated discrimination of hypertrophic cardiomyopathy (HCM) from physiological hypertrophy seen in athletes (ATH). METHODS Expert-annotated speckle-tracking echocardiographic datasets obtained from 77 ATH and 62 HCM patients were used for developing an automated system. An ensemble machine-learning model with 3 different machine-learning algorithms (support vector machines, random forests, and artificial neural networks) was developed and a majority voting method was used for conclusive predictions with further K-fold cross-validation. RESULTS Feature selection using an information gain (IG) algorithm revealed that volume was the best predictor for differentiating between HCM and ATH (IG = 0.24) followed by mid-left ventricular segmental (IG = 0.134) and average longitudinal strain (IG = 0.131). The ensemble machine-learning model showed increased sensitivity and specificity compared with early-to-late diastolic transmitral velocity ratio (p < 0.01), average early diastolic tissue velocity (e') (p < 0.01), and strain (p = 0.04). Because ATH were younger, adjusted analysis was undertaken in younger HCM patients and compared with ATH with left ventricular wall thickness >13 mm. In this subgroup analysis, the automated model continued to show equal sensitivity, but increased specificity relative to early-to-late diastolic transmitral velocity ratio, e', and strain. CONCLUSIONS Our results suggested that machine-learning algorithms can assist in the discrimination of physiological versus pathological patterns of hypertrophic remodeling. This effort represents a step toward the development of a real-time, machine-learning-based system for automated interpretation of echocardiographic images, which may help novice readers with limited experience.
Affiliation(s)
- Sukrit Narula
- Zena and Michael A. Weiner Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Khader Shameer
- Institute of Next Generation Healthcare, Department of Genetics and Genomic Sciences, Mount Sinai Health System, New York, New York
| | - Alaa Mabrouk Salem Omar
- Zena and Michael A. Weiner Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Internal Medicine, Medical Division, National Research Center, Cairo, Egypt
| | - Joel T Dudley
- Institute of Next Generation Healthcare, Department of Genetics and Genomic Sciences, Mount Sinai Health System, New York, New York
| | - Partho P Sengupta
- Zena and Michael A. Weiner Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York.
| |
|
778
|
Biffani S, Pausch H, Schwarzenbacher H, Biscarini F. The effect of mislabeled phenotypic status on the identification of mutation-carriers from SNP genotypes in dairy cattle. BMC Res Notes 2017. [PMID: 28651561 PMCID: PMC5485573 DOI: 10.1186/s13104-017-2540-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/24/2023]
Abstract
Background Statistical and machine learning applications are increasingly popular in animal breeding and genetics, especially to compute genomic predictions for phenotypes of interest. Noise (errors) in the data may have a negative impact on the accuracy of predictions. The effects of noisy data have been investigated in genome-wide association studies for case–control experiments, and in genomic predictions for binary traits in plants. No studies have been published yet on the impact of noisy data in animal genomics. In this work, the susceptibility to noise of five classification models was tested: Lasso-penalised logistic regression (Lasso), K-nearest neighbours (KNN), random forest (RF), and support vector machines with a linear (SVML) or radial (SVMR) kernel. As an illustration, the identification of carriers of a recessive mutation in cattle (Bos taurus) was used. A population of 3116 Fleckvieh animals with SNP genotypes on the same chromosome as the mutation locus (BTA 19) was available. The carrier status (0/1 phenotype) was randomly sampled to generate noise. Increasing proportions of noise (up to 20%) were introduced in the data. Results SVMR and Lasso were relatively more robust to noise in the data, with total accuracy still above 0.975 and TPR (true positive rate; accuracy in the minority class) in the range 0.5–0.80 even with 17.5–20% mislabeled observations. The performance of SVML and RF decreased monotonically with increasing noise in the data, while KNN constantly failed to identify mutation carriers (observations in the minority class). The computation time increased with noise in the data, especially for the two support vector machine classifiers. Conclusions This work was the first to assess the impact of phenotyping errors on the accuracy of genomic predictions in animal genetics. The choice of the classification method can influence results in terms of higher or lower susceptibility to noise.
In the presented problem, SVM with radial kernel performed relatively well even when the proportion of errors in the data reached 12.5%. Lasso was the second best method, while SVML, RF and KNN were very sensitive to noise. Taking into account both accuracy and computation time, Lasso provided the best combination. Electronic supplementary material The online version of this article (doi:10.1186/s13104-017-2540-x) contains supplementary material, which is available to authorized users.
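A small sketch can make the noise experiment and its two metrics concrete. It assumes that noise is modelled by flipping a fixed fraction of binary labels, which may differ from the authors' sampling scheme; the label vector and function names are invented for illustration.

```python
import random

def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary (0/1) labels with a fraction noise_rate flipped."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_flip = round(noise_rate * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped

def accuracy_and_tpr(y_true, y_pred):
    """Total accuracy and true positive rate (accuracy within class 1)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    tpr = sum(positives) / len(positives) if positives else float("nan")
    return correct / len(y_true), tpr

# Toy labels (assumption): 10% carriers, 90% non-carriers.
clean = [1] * 20 + [0] * 180
noisy = flip_labels(clean, noise_rate=0.20)
n_changed = sum(a != b for a, b in zip(clean, noisy))

# A classifier that never predicts a carrier still reaches high total accuracy.
acc, tpr = accuracy_and_tpr(clean, [0] * len(clean))
```

The last two lines show why the abstract reports TPR alongside total accuracy: with a rare positive class, a model that misses every carrier can still look accurate overall.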
Affiliation(s)
- Stefano Biffani
- IBBA-CNR, Via Einstein-Loc. Cascina Codazza, 26900, Lodi, Italy.,AIA: Associazione Italiana Allevatori, Via Giuseppe Tomassetti 9, 00161, Rome, Italy
| | - Hubert Pausch
- Technische Universität München, Liesel-Beckmann Straße 1, 85354, Freising-Weihenstephan, Germany
| | | | - Filippo Biscarini
- IBBA-CNR, Via Einstein-Loc. Cascina Codazza, 26900, Lodi, Italy.,Division of Infection & Immunity, School of Medicine, Cardiff University, Heath Park, CF14 4XN, Cardiff, UK.
| |
|
779
|
Yao S, Guo Y, Dong SS, Hao RH, Chen XF, Chen YX, Chen JB, Tian Q, Deng HW, Yang TL. Regulatory element-based prediction identifies new susceptibility regulatory variants for osteoporosis. Hum Genet 2017. [PMID: 28634715 DOI: 10.1007/s00439-017-1825-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/26/2022]
Abstract
Although genome-wide association studies (GWASs) have identified many susceptibility genes for osteoporosis, a large part of the heritability remains unexplained. Integrating regulatory information and GWASs could offer new insights into the biological link between the susceptibility SNPs and osteoporosis. We trained five machine learning classifiers on osteoporosis-associated variants and regulatory feature data. We selected the optimal classifier and used it to score genome-wide SNPs in order to discover susceptibility regulatory variants. We further used Genetic Factors for Osteoporosis Consortium (GEFOS) data and three in-house GWAS samples to validate the associations for predicted positive SNPs. The random forest classifier performed best among all machine learning methods, with an F1 score of 0.8871. Using the optimized model, we predicted 37,584 candidate SNPs for osteoporosis. According to the meta-analysis results, a list of regulatory variants was significantly associated with osteoporosis after multiple testing corrections and contributed to the expression of known osteoporosis-associated protein-coding genes. In summary, combining GWASs and regulatory elements through machine learning could provide additional information for understanding the mechanism of osteoporosis. The regulatory variants we predicted will provide novel targets for etiology research and treatment of osteoporosis.
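For readers unfamiliar with the F1 score used to rank the classifiers, the sketch below computes it from confusion-matrix counts. The counts are invented for illustration and are not the study's; the function name is hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented counts (assumption): 80 true positives, 10 false positives,
# 30 false negatives.
score = f1_score(tp=80, fp=10, fn=30)
```

Because F1 ignores true negatives, it is a common choice when the positive class (here, disease-associated variants) is much smaller than the negative class.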
Affiliation(s)
- Shi Yao
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
| | - Yan Guo
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
| | - Shan-Shan Dong
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
| | - Ruo-Han Hao
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
| | - Xiao-Feng Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
| | - Yi-Xiao Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
| | - Jia-Bin Chen
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
| | - Qing Tian
- School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, 70112, USA
| | - Hong-Wen Deng
- School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, 70112, USA
| | - Tie-Lin Yang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China.
| |
|
780
|
Abstract
Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review.
Affiliation(s)
- Lawrence B Holder
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA
| | - M Muksitul Haque
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA.,Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, USA
| | - Michael K Skinner
- Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, USA
| |
|
781
|
Lee M, Roos P, Sharma N, Atalar M, Evans TA, Pellicore MJ, Davis E, Lam ATN, Stanley SE, Khalil SE, Solomon GM, Walker D, Raraigh KS, Vecchio-Pagan B, Armanios M, Cutting GR. Systematic Computational Identification of Variants That Activate Exonic and Intronic Cryptic Splice Sites. Am J Hum Genet 2017; 100:751-765. [PMID: 28475858 DOI: 10.1016/j.ajhg.2017.04.001] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 11/15/2016] [Accepted: 03/30/2017] [Indexed: 12/30/2022]
Abstract
We developed a variant-annotation method that combines sequence-based machine-learning classification with a context-dependent algorithm for selecting splice variants. Our approach is distinctive in that it compares the splice potential of a sequence bearing a variant with the splice potential of the reference sequence. After training, classification accurately identified 168 of 180 (93.3%) canonical splice sites of five genes. The combined method, CryptSplice, identified and correctly predicted the effect of 18 of 21 (86%) known splice-altering variants in CFTR, a well-studied gene whose loss-of-function variants cause cystic fibrosis (CF). Among 1,423 unannotated CFTR disease-associated variants, the method identified 32 potential exonic cryptic splice variants, two of which were experimentally evaluated and confirmed. After complete CFTR sequencing, the method found three cryptic intronic splice variants (one known and two experimentally verified) that completed the molecular diagnosis of CF in 6 of 14 individuals. CryptSplice interrogation of sequence data from six individuals with X-linked dyskeratosis congenita caused by an unknown disease-causing variant in DKC1 identified two splice-altering variants that were experimentally verified. To assess the extent to which disease-associated variants might activate cryptic splicing, we selected 458 pathogenic variants and 348 variants of uncertain significance (VUSs) classified as high confidence from ClinVar. Splice-site activation was predicted for 129 (28%) of the pathogenic variants and 75 (22%) of the VUSs. Our findings suggest that cryptic splice-site activation is more common than previously thought and should be routinely considered for all variants within the transcribed regions of genes.
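The core idea stated here, comparing the splice potential of a variant-bearing sequence with that of the reference, can be illustrated with a toy scorer. The sketch below is not CryptSplice: the scoring function is a stand-in that only measures similarity to an assumed 5' splice-site-like motif, and the sequences are invented.

```python
# Assumption: a toy motif resembling a 5' splice-site consensus, used only
# as a stand-in for a trained sequence classifier.
DONOR_MOTIF = "GTAAGT"

def toy_splice_score(seq, motif=DONOR_MOTIF):
    """Best fraction of motif positions matched by any window of seq."""
    k = len(motif)
    best = 0.0
    for i in range(len(seq) - k + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + k], motif))
        best = max(best, matches / k)
    return best

def splice_delta(reference, variant, scorer=toy_splice_score):
    """Change in splice potential caused by the variant (positive = gain)."""
    return scorer(variant) - scorer(reference)

reference = "CCAGGCAAGTCC"
variant = "CCAGGTAAGTCC"  # invented single C>T substitution completing the motif
delta = splice_delta(reference, variant)
```

Scoring the difference rather than the variant alone means that a site which already looks splice-like in the reference is not flagged merely for being strong.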
Affiliation(s)
- Melissa Lee
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | | | - Neeraj Sharma
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Melis Atalar
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Taylor A Evans
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Matthew J Pellicore
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Emily Davis
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Anh-Thu N Lam
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Susan E Stanley
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Sara E Khalil
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - George M Solomon
- Division of Pulmonary, Allergy, and Critical Care Medicine, University of Alabama at Birmingham, Birmingham, AL 35233 USA
| | - Doug Walker
- Pediatric Pulmonary Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Karen S Raraigh
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Briana Vecchio-Pagan
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Mary Armanios
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Garry R Cutting
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.
| |
|
782
|
Capan M, Khojandi A, Denton BT, Williams KD, Ayer T, Chhatwal J, Kurt M, Lobo JM, Roberts MS, Zaric G, Zhang S, Schwartz JS. From Data to Improved Decisions: Operations Research in Healthcare Delivery. Med Decis Making 2017; 37:849-859. [PMID: 28423982 DOI: 10.1177/0272989x17705636] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 01/17/2023]
Abstract
BACKGROUND The Operations Research Interest Group (ORIG) within the Society of Medical Decision Making (SMDM) is a multidisciplinary interest group of professionals that specializes in taking an analytical approach to medical decision making and healthcare delivery. ORIG is interested in leveraging mathematical methods associated with the field of Operations Research (OR) to obtain data-driven solutions to complex healthcare problems and encourage collaborations across disciplines. This paper introduces OR for the non-expert and draws attention to opportunities where OR can be utilized to facilitate solutions to healthcare problems. METHODS Decision making is the process of choosing between possible solutions to a problem with respect to certain metrics. OR concepts can help systematically improve decision making through efficient modeling techniques while accounting for relevant constraints. Depending on the problem, methods that are part of OR (e.g., linear programming, Markov Decision Processes) or methods that are derived from related fields (e.g., regression from statistics) can be incorporated into the solution approach. This paper highlights the characteristics of different OR methods that have been applied to healthcare decision making and provides examples of emerging research opportunities. EXAMPLES We illustrate OR applications in healthcare using previous studies, including diagnosis and treatment of diseases, organ transplants, and patient flow decisions. Further, we provide a selection of emerging areas for utilizing OR. CONCLUSIONS There is a timely need to inform practitioners and policy makers of the benefits of using OR techniques in solving healthcare problems. OR methods can support the development of sustainable long-term solutions across disease management, service delivery, and health policies by optimizing the performance of system elements and analyzing their interaction while considering relevant constraints.
Affiliation(s)
- Muge Capan
- Christiana Care Health System, Value Institute, John H. Ammon Medical Education Center, Newark, DE, USA (MC, KDW)
| | - Anahita Khojandi
- Department of Industrial and Systems Engineering, University of Tennessee, Knoxville, TN, USA (AK)
| | - Brian T Denton
- Industrial and Operations Engineering and Urology, University of Michigan, Ann Arbor, MI, USA (BTD)
| | - Kimberly D Williams
- Christiana Care Health System, Value Institute, John H. Ammon Medical Education Center, Newark, DE, USA (MC, KDW)
| | - Turgay Ayer
- Christiana Care Health System, Value Institute, John H. Ammon Medical Education Center, Newark, DE, USA (MC, KDW).,Georgia Institute of Technology H Milton Stewart School of Industrial and Systems Engineering, Center for Health & Humanitarian Systems, Atlanta, GA, USA (TA)
| | - Jagpreet Chhatwal
- Harvard University, Harvard Medical School, Institute for Technology Assessment; Massachusetts General Hospital, Boston, MA, USA (JC)
| | - Murat Kurt
- Merck Research, Whitehouse Station, NJ, USA (MK)
| | - Jennifer Mason Lobo
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA (JML)
| | - Mark S Roberts
- Department of Health Policy and Management, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA, USA (MSR)
| | - Greg Zaric
- Richard Ivey School of Business University of Western Ontario, London, ON, Canada (GZ)
| | - Shengfan Zhang
- Department of Industrial Engineering, University of Arkansas, Fayetteville, AR, USA (SZ)
| | - J Sanford Schwartz
- General Internal Medicine Division, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA (JSS)
| |
|
783
|
Remita MA, Halioui A, Malick Diouara AA, Daigle B, Kiani G, Diallo AB. A machine learning approach for viral genome classification. BMC Bioinformatics 2017; 18:208. [PMID: 28399797 PMCID: PMC5387389 DOI: 10.1186/s12859-017-1602-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 09/27/2016] [Accepted: 03/15/2017] [Indexed: 01/18/2023]
Abstract
Background Advances in cloning and sequencing technology are yielding a massive number of viral genomes. The classification and annotation of these genomes constitute important assets in the discovery of genomic variability, taxonomic characteristics and disease mechanisms. Existing classification methods are often designed for specific, well-studied families of viruses. Thus, viral comparative genomic studies could benefit from more generic, fast and accurate tools for classifying and typing newly sequenced strains of diverse virus families. Results Here, we introduce a virus classification platform, CASTOR, based on machine learning methods. CASTOR is inspired by a well-known technique in molecular biology: restriction fragment length polymorphism (RFLP). It simulates, in silico, the restriction digestion of genomic material by different enzymes into fragments. It uses two metrics to construct feature vectors for machine learning algorithms in the classification step. We benchmark CASTOR for the classification of distinct datasets of human papillomaviruses (HPV), hepatitis B viruses (HBV) and human immunodeficiency viruses type 1 (HIV-1). Results reveal true positive rates of 99%, 99% and 98% for HPV Alpha species, HBV genotyping and HIV-1 M subtyping, respectively. Furthermore, CASTOR shows competitive performance compared to well-known HIV-1-specific classifiers (REGA and COMET) on whole genomes and pol fragments. Conclusion The performance, genericity and robustness of CASTOR could enable novel and accurate large-scale virus studies. The CASTOR web platform provides open-access, collaborative and reproducible machine learning classifiers. CASTOR can be accessed at http://castor.bioinfo.uqam.ca. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1602-3) contains supplementary material, which is available to authorized users.
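The in silico digestion step can be sketched as follows. This is a simplified illustration, not CASTOR's code: it treats the genome as linear, uses a single enzyme, and summarizes fragments by their count and mean length, which are assumed stand-ins for the two metrics the abstract mentions without naming. The toy genome is invented; the EcoRI site and cut position (G^AATTC) are standard.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of site.

    cut_offset is the number of bases of the recognition site that stay
    on the upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments

def fragment_features(fragments):
    """Illustrative feature vector: [fragment count, mean fragment length]."""
    lengths = [len(f) for f in fragments]
    return [len(lengths), sum(lengths) / len(lengths)]

genome = "AAGAATTCGGGGAATTCTT"  # invented toy sequence with two EcoRI sites
fragments = digest(genome, site="GAATTC", cut_offset=1)
features = fragment_features(fragments)
```

Repeating the digestion with several enzymes and concatenating the resulting feature vectors would give one fixed-length input per genome, whatever the genome length.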
Affiliation(s)
- Mohamed Amine Remita
- Laboratoire de bioinformatique, département d'informatique, Université du Québec à Montréal, Montreal, P.O. Box 8888 Downtown Station, H3C 3P8, Qc, Canada.,Pharmaqam Center, Université du Québec à Montréal (Québec), Montréal (Quebec), PO BOX 8888 Downtown Station, H3C 3P8, Canada
| | - Ahmed Halioui
- Laboratoire de bioinformatique, département d'informatique, Université du Québec à Montréal, Montreal, P.O. Box 8888 Downtown Station, H3C 3P8, Qc, Canada.,Pharmaqam Center, Université du Québec à Montréal (Québec), Montréal (Quebec), PO BOX 8888 Downtown Station, H3C 3P8, Canada
| | - Abou Abdallah Malick Diouara
- Laboratoire de bioinformatique, département d'informatique, Université du Québec à Montréal, Montreal, P.O. Box 8888 Downtown Station, H3C 3P8, Qc, Canada.,Pharmaqam Center, Université du Québec à Montréal (Québec), Montréal (Quebec), PO BOX 8888 Downtown Station, H3C 3P8, Canada
| | - Bruno Daigle
- Laboratoire de bioinformatique, département d'informatique, Université du Québec à Montréal, Montreal, P.O. Box 8888 Downtown Station, H3C 3P8, Qc, Canada.,Pharmaqam Center, Université du Québec à Montréal (Québec), Montréal (Quebec), PO BOX 8888 Downtown Station, H3C 3P8, Canada
| | - Golrokh Kiani
- Laboratoire de bioinformatique, département d'informatique, Université du Québec à Montréal, Montreal, P.O. Box 8888 Downtown Station, H3C 3P8, Qc, Canada.,Pharmaqam Center, Université du Québec à Montréal (Québec), Montréal (Quebec), PO BOX 8888 Downtown Station, H3C 3P8, Canada
| | - Abdoulaye Baniré Diallo
- Laboratoire de bioinformatique, département d'informatique, Université du Québec à Montréal, Montreal, P.O. Box 8888 Downtown Station, H3C 3P8, Qc, Canada.,Pharmaqam Center, Université du Québec à Montréal (Québec), Montréal (Quebec), PO BOX 8888 Downtown Station, H3C 3P8, Canada.
| |
|
784
|
Vidovic MMC, Kloft M, Müller KR, Görnitz N. ML2Motif-Reliable extraction of discriminative sequence motifs from learning machines. PLoS One 2017; 12:e0174392. [PMID: 28346487 PMCID: PMC5367830 DOI: 10.1371/journal.pone.0174392] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/24/2016] [Accepted: 03/08/2017] [Indexed: 01/30/2023]
Abstract
High prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. For computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decision of support vector machines (SVMs) using weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method has been devised and showed promising results on real-world data. Our contribution in this paper is twofold: as an extension to POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs) and devise a sampling strategy for efficient computation. As a second contribution, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets.
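A sampling-based positional importance measure can be sketched under the usual reading of POIMs: the importance of a k-mer at a position is the expected model score given that k-mer at that position, minus the unconditional expected score. The sketch below is an assumption-laden illustration, not the paper's gPOIM or its sampling strategy: it draws sequences uniformly at random, and the toy model and function names are invented.

```python
import random

ALPHABET = "ACGT"

def sample_sequence(length, rng, fixed=None):
    """Draw a uniform random sequence, optionally forcing a k-mer at a position."""
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    if fixed is not None:
        position, kmer = fixed
        seq[position:position + len(kmer)] = kmer
    return "".join(seq)

def sampled_importance(score, kmer, position, length, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[score | kmer at position] - E[score]."""
    rng = random.Random(seed)
    conditional = sum(score(sample_sequence(length, rng, (position, kmer)))
                      for _ in range(n_samples)) / n_samples
    baseline = sum(score(sample_sequence(length, rng))
                   for _ in range(n_samples)) / n_samples
    return conditional - baseline

def toy_model(seq):
    """Invented black-box scorer: fires only when the sequence starts with AC."""
    return 1.0 if seq.startswith("AC") else 0.0

at_start = sampled_importance(toy_model, "AC", position=0, length=8)
elsewhere = sampled_importance(toy_model, "AC", position=4, length=8)
```

Because the estimate only calls the scoring function, the same procedure applies to any model that maps a sequence to a score, which is the sense in which such a measure is model-agnostic.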
Affiliation(s)
| | - Marius Kloft
- Department of Computer Science, Humboldt University of Berlin, Berlin, Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technical University of Berlin, Berlin, Germany
- Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, Korea
| | - Nico Görnitz
- Machine Learning Group, Technical University of Berlin, Berlin, Germany
| |
|
785
|
Hill AA, Crotta M, Wall B, Good L, O'Brien SJ, Guitian J. Towards an integrated food safety surveillance system: a simulation study to explore the potential of combining genomic and epidemiological metadata. Royal Society Open Science 2017; 4:160721. [PMID: 28405360 PMCID: PMC5383817 DOI: 10.1098/rsos.160721] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/21/2016] [Accepted: 02/27/2017] [Indexed: 05/05/2023]
Abstract
Foodborne infection is a result of exposure to complex, dynamic food systems. The efficiency of foodborne infection is driven by ongoing shifts in genetic machinery. Next-generation sequencing technologies can provide high-fidelity data about the genetics of a pathogen. However, food safety surveillance systems do not currently provide similar high-fidelity epidemiological metadata to associate with genetic data. As a consequence, it is rarely possible to transform genetic data into actionable knowledge that can be used to genuinely inform risk assessment or prevent outbreaks. Big data approaches are touted as a revolution in decision support, and pose a potentially attractive method for closing the gap between the fidelity of genetic and epidemiological metadata for food safety surveillance. We therefore developed a simple food chain model to investigate the potential benefits of combining 'big' data sources, including both genetic and high-fidelity epidemiological metadata. Our results suggest that, as for any surveillance system, the collected data must be relevant and characterize the important dynamics of a system if we are to properly understand risk: this suggests the need to carefully consider data curation, rather than the more ambitious claims of big data proponents that unstructured and unrelated data sources can be combined to generate consistent insight. Of interest is that the biggest influencers of foodborne infection risk were contamination load and processing temperature, not genotype. This suggests that understanding food chain dynamics would probably more effectively generate insight into foodborne risk than prescribing the hazard in ever more detail in terms of genotype.
Affiliation(s)
| | - M. Crotta
- Royal Veterinary College, University of London, London, UK
| | - B. Wall
- Royal Veterinary College, University of London, London, UK
| | - L. Good
- Royal Veterinary College, University of London, London, UK
| | - S. J. O'Brien
- NIHR Health Protection Research Unit in Gastrointestinal Infections, UK
| | - J. Guitian
- Royal Veterinary College, University of London, London, UK
| |
|
786
|
Tietz JI, Schwalen CJ, Patel PS, Maxson T, Blair PM, Tai HC, Zakai UI, Mitchell DA. A new genome-mining tool redefines the lasso peptide biosynthetic landscape. Nat Chem Biol 2017; 13:470-478. [PMID: 28244986 PMCID: PMC5391289 DOI: 10.1038/nchembio.2319] [Citation(s) in RCA: 300] [Impact Index Per Article: 42.9] [Received: 07/07/2016] [Accepted: 12/06/2016] [Indexed: 12/14/2022]
Abstract
Ribosomally synthesized and post-translationally modified peptide (RiPP) natural products are attractive for genome-driven discovery and re-engineering, but limitations in bioinformatic methods and exponentially increasing genomic data make large-scale mining of RiPP data difficult. We report RODEO (Rapid ORF Description and Evaluation Online), which combines hidden-Markov-model-based analysis, heuristic scoring, and machine learning to identify biosynthetic gene clusters and predict RiPP precursor peptides. We initially focused on lasso peptides, which display intriguing physicochemical properties and bioactivities, but their hypervariability renders them challenging prospects for automated mining. Our approach yielded the most comprehensive mapping to date of lasso peptide space, revealing >1,300 compounds. We characterized the structures and bioactivities of six lasso peptides, prioritized based on predicted structural novelty, including one with an unprecedented handcuff-like topology and another with a citrulline modification exceptionally rare among bacteria. These combined insights significantly expand the knowledge of lasso peptides and, more broadly, provide a framework for future genome-mining efforts.
Affiliation(s)
- Jonathan I Tietz
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Christopher J Schwalen
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Parth S Patel
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Tucker Maxson
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Patricia M Blair
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Hua-Chia Tai
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Uzma I Zakai
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Douglas A Mitchell
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.,Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.,Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| |
|
787
|
Ai L, Tian H, Chen Z, Chen H, Xu J, Fang JY. Systematic evaluation of supervised classifiers for fecal microbiota-based prediction of colorectal cancer. Oncotarget 2017; 8:9546-9556. [PMID: 28061434 PMCID: PMC5354752 DOI: 10.18632/oncotarget.14488] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 10/27/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022]
Abstract
Predicting colorectal cancer (CRC) based on fecal microbiota presents a promising method for non-invasive screening of CRC, but the optimization of classification models remains an unaddressed question. The purpose of this study was to systematically evaluate the effectiveness of different supervised machine-learning models in predicting CRC in two independent eastern and western populations. The structures of intestinal microflora in feces in a Chinese population (N = 141) were determined by 454 FLX pyrosequencing, and different supervised classifiers were employed to predict CRC based on fecal microbiota operational taxonomic units (OTUs). Bayes Net and Random Forest displayed higher accuracies than other algorithms in both populations, although Bayes Net had a lower false negative rate than Random Forest. Gut microbiota-based prediction was more accurate than the standard fecal occult blood test (FOBT), and the combination of both approaches further improved the prediction accuracy. Moreover, when unclassified OTUs were used as input, the BayesDMNB text algorithm achieved higher accuracy in the Chinese population (AUC=0.994). Taken together, our results suggest that the Bayes Net classification model combined with unclassified OTUs may present an accurate method for predicting CRC based on the compositions of gut microbiota.
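The two evaluation quantities this abstract leans on, AUC and false negative rate, can be computed directly from classifier outputs. The following is a minimal sketch with invented scores and labels; the function names are hypothetical and the numbers are unrelated to the study's results.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this is the rank-based (Mann-Whitney)
    formulation of the area under the ROC curve.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def false_negative_rate(y_true, y_pred):
    """Fraction of true positives (label 1) that were predicted as 0."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return positives.count(0) / len(positives)

# Invented classifier scores (assumption) for CRC cases and controls.
crc_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]
area = auc(crc_scores, control_scores)

# Invented hard predictions (assumption): one of three cases is missed.
fnr = false_negative_rate([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0])
```

In a screening setting a false negative is a missed cancer, which is why two classifiers with similar accuracy can still differ in practical value.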
Affiliation(s)
- Luoyan Ai
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Haiying Tian
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Zhaofei Chen
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Huimin Chen
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Jie Xu
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
- Jing-Yuan Fang
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
|
788
|
Wang C, Zhang Y. Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest. J Comput Chem 2017; 38:169-177. [PMID: 27859414 PMCID: PMC5140681 DOI: 10.1002/jcc.24667] [Citation(s) in RCA: 169] [Impact Index Per Article: 24.1] [Received: 07/04/2016] [Revised: 09/06/2016] [Accepted: 10/26/2016] [Indexed: 12/16/2022]
Abstract
The development of new protein-ligand scoring functions using machine learning algorithms, such as random forest, has been of significant interest. By efficiently utilizing expanded feature sets and a large set of experimental data, random forest based scoring functions (RFbScore) can achieve better correlations to experimental protein-ligand binding data with known crystal structures; however, more extensive tests indicate that such enhancement in scoring power comes with significant under-performance in docking and screening power tests compared to traditional scoring functions. In this work, to improve scoring-docking-screening powers of protein-ligand docking functions simultaneously, we have introduced a Δvina RF parameterization and feature selection framework based on random forest. Our developed scoring function Δvina RF20, which employs 20 descriptors in addition to the AutoDock Vina score, can achieve superior performance in all power tests of both CASF-2013 and CASF-2007 benchmarks compared to classical scoring functions. The Δvina RF20 scoring function and its code are freely available on the web at: https://www.nyu.edu/projects/yzhang/DeltaVina. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Cheng Wang
- Department of Chemistry, New York University, New York, New York 10003
- Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
789
|
Kudrin RA, Mironov AA, Stavrovskaya ED. Chromatin and Polycomb: Biology and bioinformatics. Mol Biol 2017. [DOI: 10.1134/s0026893316060121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
790
|
Velaga R, Sugimoto M. Future Paradigm of Breast Cancer Resistance and Treatment. RESISTANCE TO TARGETED ANTI-CANCER THERAPEUTICS 2017. [DOI: 10.1007/978-3-319-70142-4_7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
|
791
|
Abstract
Owing to the complexity and variability of metagenomic studies, modern machine learning approaches have seen increased usage to answer a variety of questions encompassing the full range of metagenomic NGS data analysis. We review here the contribution of machine learning techniques to the field of metagenomics, by presenting known successful approaches in a unified framework. This review focuses on five important metagenomic problems: OTU clustering, binning, taxonomic profiling and assignment, comparative metagenomics and gene prediction. For each of these problems, we identify the most prominent methods, summarize the machine learning approaches used and put them into perspective of similar methods. We conclude our review looking further ahead at the challenge posed by the analysis of interactions within microbial communities and different environments, in a field one could call “integrative metagenomics”.
|
792
|
Rouleau S, Jodoin R, Garant JM, Perreault JP. RNA G-Quadruplexes as Key Motifs of the Transcriptome. ADVANCES IN BIOCHEMICAL ENGINEERING/BIOTECHNOLOGY 2017; 170:1-20. [PMID: 28382477 DOI: 10.1007/10_2017_8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 12/23/2022]
Abstract
G-Quadruplexes are non-canonical secondary structures that can be adopted under physiological conditions by guanine-rich DNA and RNA molecules. They have been reported to occur, and to perform multiple biological functions, in the genomes and transcriptomes of many species, including humans. This chapter focuses specifically on RNA G-quadruplexes and reviews the most recent discoveries in the field, as well as addresses the upcoming challenges researchers studying these structures face.
Affiliation(s)
- Samuel Rouleau
- RNA Group/Groupe ARN, Département de Biochimie, Faculté de médecine des sciences de la santé, Pavillon de Recherche Appliquée au Cancer, Université de Sherbrooke, 3201 rue Jean-Mignault, Sherbrooke, QC, Canada, J1E 4K8
- Rachel Jodoin
- RNA Group/Groupe ARN, Département de Biochimie, Faculté de médecine des sciences de la santé, Pavillon de Recherche Appliquée au Cancer, Université de Sherbrooke, 3201 rue Jean-Mignault, Sherbrooke, QC, Canada, J1E 4K8
- Jean-Michel Garant
- RNA Group/Groupe ARN, Département de Biochimie, Faculté de médecine des sciences de la santé, Pavillon de Recherche Appliquée au Cancer, Université de Sherbrooke, 3201 rue Jean-Mignault, Sherbrooke, QC, Canada, J1E 4K8
- Jean-Pierre Perreault
- RNA Group/Groupe ARN, Département de Biochimie, Faculté de médecine des sciences de la santé, Pavillon de Recherche Appliquée au Cancer, Université de Sherbrooke, 3201 rue Jean-Mignault, Sherbrooke, QC, Canada, J1E 4K8.
|
793
|
Wunderling A, Ben Targem M, Barbier de Reuille P, Ragni L. Novel tools for quantifying secondary growth. JOURNAL OF EXPERIMENTAL BOTANY 2017; 68:89-95. [PMID: 27965365 DOI: 10.1093/jxb/erw450] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 05/21/2023]
Abstract
Secondary growth occurs in dicotyledons and gymnosperms, and results in an increased girth of plant organs. It is driven primarily by the vascular cambium, which produces thousands of cells throughout the life of several plant species. Even in the small herbaceous model plant Arabidopsis, manual quantification of this massive process is impractical. Here, we provide a comprehensive overview of current methods used to measure radial growth. We discuss the issues and problems related to its quantification. We highlight recent advances and tools developed for automated cellular phenotyping and its future applications.
Affiliation(s)
- Anna Wunderling
- ZMBP, University of Tübingen, Auf der Morgenstelle 32, D-72076 Tübingen, Germany
- Mehdi Ben Targem
- ZMBP, University of Tübingen, Auf der Morgenstelle 32, D-72076 Tübingen, Germany
- Laura Ragni
- ZMBP, University of Tübingen, Auf der Morgenstelle 32, D-72076 Tübingen, Germany
|
794
|
Balasus D, Way M, Fusilli C, Mazza T, Morgan MY, Cervello M, Giannitrapani L, Soresi M, Agliastro R, Vinciguerra M, Montalto G. The association of variants in PNPLA3 and GRP78 and the risk of developing hepatocellular carcinoma in an Italian population. Oncotarget 2016; 7:86791-86802. [PMID: 27888630 PMCID: PMC5349954 DOI: 10.18632/oncotarget.13558] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 07/26/2016] [Accepted: 11/07/2016] [Indexed: 12/19/2022]
Abstract
Hepatocellular carcinoma (HCC) has one of the worst prognoses amongst all malignancies. It commonly arises in patients with established liver disease and the diagnosis often occurs at an advanced stage. Genetic variations, such as single nucleotide polymorphisms (SNPs), may alter disease risk and thus may have use as predictive markers of disease outcome. The aims of this study were (i) to assess the association of two SNPs, rs430397 in GRP78 and rs738409 in PNPLA3, with the risk of developing HCC in a Sicilian association cohort and (ii) to use a machine learning technique to establish a predictive combinatorial phenotypic model for HCC including rs430397 and rs738409 genotypes and clinical and laboratory attributes. The controls comprised 304 healthy subjects while the cases comprised 170 HCC patients, the majority of whom had hepatitis C (HCV)-related cirrhosis. Significant associations were identified between the risk of developing HCC and both rs430397 (p=0.0095) and rs738409 (p=0.0063). The association between rs738409 and HCC was significantly stronger in the HCV positive cases. In the best prediction model, represented graphically by a decision tree with an acceptable misclassification rate of 17.0%, the A/A and G/A genotypes of the rs430397 variant were fixed and combined with the three rs738409 genotypes; the attributes were age, sex and alcohol. These results demonstrate significant associations between both rs430397 and rs738409 and HCC development in a Sicilian cohort. The combinatorial predictive model developed to include these genetic variants may, if validated in independent cohorts, allow for earlier diagnosis of HCC.
Affiliation(s)
- Daniele Balasus
- Biomedical Department of Internal Medicine and Medical Specialties, University of Palermo, Palermo, Italy
- Michael Way
- Institute for Liver & Digestive Health, Division of Medicine, Royal Free Campus, University College London, London, UK
- Caterina Fusilli
- IRCCS Casa Sollievo della Sofferenza, Bioinformatics Unit, San Giovanni Rotondo (FG), Italy
- Tommaso Mazza
- IRCCS Casa Sollievo della Sofferenza, Bioinformatics Unit, San Giovanni Rotondo (FG), Italy
- Marsha Y. Morgan
- Institute for Liver & Digestive Health, Division of Medicine, Royal Free Campus, University College London, London, UK
- Melchiorre Cervello
- Institute of Biomedicine and Molecular Immunology, National Research Council (C.N.R.), Palermo, Italy
- Lydia Giannitrapani
- Biomedical Department of Internal Medicine and Medical Specialties, University of Palermo, Palermo, Italy
- Maurizio Soresi
- Biomedical Department of Internal Medicine and Medical Specialties, University of Palermo, Palermo, Italy
- Rosalia Agliastro
- Immunohematology and Transfusion Medicine Unit, “Civico” Reference Regional Hospital, Palermo, Italy
- Manlio Vinciguerra
- Institute for Liver & Digestive Health, Division of Medicine, Royal Free Campus, University College London, London, UK
- Center for Translational Medicine (CTM), International Clinical Research Center (ICRC), St. Anne's University Hospital, Brno, Czech Republic
- Giuseppe Montalto
- Biomedical Department of Internal Medicine and Medical Specialties, University of Palermo, Palermo, Italy
- Institute of Biomedicine and Molecular Immunology, National Research Council (C.N.R.), Palermo, Italy
|
795
|
Salazar BM, Balczewski EA, Ung CY, Zhu S. Neuroblastoma, a Paradigm for Big Data Science in Pediatric Oncology. Int J Mol Sci 2016; 18:E37. [PMID: 28035989 PMCID: PMC5297672 DOI: 10.3390/ijms18010037] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/18/2016] [Revised: 12/14/2016] [Accepted: 12/17/2016] [Indexed: 12/13/2022]
Abstract
Pediatric cancers rarely exhibit recurrent mutational events when compared to most adult cancers. This poses a challenge in understanding how cancers initiate, progress, and metastasize in early childhood. Also, due to limited detected driver mutations, it is difficult to benchmark key genes for drug development. In this review, we use neuroblastoma, a pediatric solid tumor of neural crest origin, as a paradigm for exploring "big data" applications in pediatric oncology. Computational strategies derived from big data science (network- and machine learning-based modeling and drug repositioning) hold the promise of shedding new light on the molecular mechanisms driving neuroblastoma pathogenesis and identifying potential therapeutics to combat this devastating disease. These strategies integrate robust data input, from genomic and transcriptomic studies, clinical data, and in vivo and in vitro experimental models specific to neuroblastoma and other types of cancers that closely mimic its biological characteristics. We discuss contexts in which "big data" and computational approaches, especially network-based modeling, may advance neuroblastoma research, describe currently available data and resources, and propose future models of strategic data collection and analyses for neuroblastoma and other related diseases.
Affiliation(s)
- Brittany M Salazar
- Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine, Rochester, MN 55902, USA.
- Emily A Balczewski
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
- Choong Yong Ung
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
- Shizhen Zhu
- Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine, Rochester, MN 55902, USA.
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
|
796
|
Jiao Y, Du P. Performance measures in evaluating machine learning based bioinformatics predictors for classifications. QUANTITATIVE BIOLOGY 2016. [DOI: 10.1007/s40484-016-0081-2] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Indexed: 10/20/2022]
|
797
|
Li B, Tang J, Yang Q, Cui X, Li S, Chen S, Cao Q, Xue W, Chen N, Zhu F. Performance Evaluation and Online Realization of Data-driven Normalization Methods Used in LC/MS based Untargeted Metabolomics Analysis. Sci Rep 2016; 6:38881. [PMID: 27958387 PMCID: PMC5153651 DOI: 10.1038/srep38881] [Citation(s) in RCA: 92] [Impact Index Per Article: 11.5] [Received: 09/14/2016] [Accepted: 11/15/2016] [Indexed: 02/06/2023]
Abstract
In untargeted metabolomics analysis, several factors (e.g., unwanted experimental and biological variations and technical errors) may hamper the identification of differential metabolic features, which requires data-driven normalization approaches before feature selection. So far, ≥16 normalization methods have been widely applied for processing LC/MS based metabolomics data. However, the performance and the sample size dependence of those methods have not yet been exhaustively compared, and no online tool for comparatively and comprehensively evaluating the performance of all 16 normalization methods has been provided. In this study, a comprehensive comparison of these methods was conducted. As a result, the 16 methods were categorized into three groups based on their normalization performance across various sample sizes. The VSN, the Log Transformation and the PQN were identified as the methods with the best normalization performance, while the Contrast consistently underperformed across all sub-datasets of different benchmark data. Moreover, an interactive web tool comprehensively evaluating the performance of the 16 methods specifically for normalizing LC/MS based metabolomics data was constructed and hosted at http://server.idrb.cqu.edu.cn/MetaPre/. In summary, this study could serve as a useful guide to the selection of suitable normalization methods in analyzing LC/MS based metabolomics data.
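The abstract above names PQN (probabilistic quotient normalization) without describing it. As an illustration only, the following is a minimal sketch of the commonly described form of PQN, not the authors' implementation or the MetaPre tool's code. The function name `pqn_normalize`, the toy intensities, and the choice of the per-feature median across samples as the reference profile are all assumptions made for this example.

```python
from statistics import median


def pqn_normalize(samples):
    """Sketch of probabilistic quotient normalization (assumed formulation).

    samples: list of rows, one per sample, each a list of positive
    feature intensities in the same feature order.
    """
    n_features = len(samples[0])
    # Reference profile (assumption): per-feature median across all samples.
    reference = [median(row[j] for row in samples) for j in range(n_features)]
    normalized = []
    for row in samples:
        # Quotient of each feature against the reference, skipping zero references.
        quotients = [row[j] / reference[j] for j in range(n_features) if reference[j] > 0]
        # The median quotient is taken as the sample's most probable dilution factor.
        factor = median(quotients)
        normalized.append([value / factor for value in row])
    return normalized


# Hypothetical toy data: the second sample is the first diluted by a factor of 2.
toy = [[10.0, 20.0, 30.0], [5.0, 10.0, 15.0], [10.0, 20.0, 30.0]]
result = pqn_normalize(toy)
```

In this toy case the diluted sample is rescaled onto the same profile as the other two. Real LC/MS data would typically also need missing-value handling and often a total-intensity normalization step beforehand, which this sketch leaves out.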
Affiliation(s)
- Bo Li
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Jing Tang
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Qingxia Yang
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Xuejiao Cui
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Shuang Li
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Sijie Chen
- College of Mathematics and Statistics, Chongqing University, Chongqing 401331, China
- Quanxing Cao
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Na Chen
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Feng Zhu
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
|
798
|
Grys BT, Lo DS, Sahin N, Kraus OZ, Morris Q, Boone C, Andrews BJ. Machine learning and computer vision approaches for phenotypic profiling. J Cell Biol 2016; 216:65-71. [PMID: 27940887 PMCID: PMC5223612 DOI: 10.1083/jcb.201610026] [Citation(s) in RCA: 88] [Impact Index Per Article: 11.0] [Received: 10/10/2016] [Revised: 11/18/2016] [Accepted: 11/21/2016] [Indexed: 11/27/2022]
Abstract
Grys et al. review computer vision and machine-learning methods that have been applied to phenotypic profiling of image-based data. Descriptions are provided for segmentation, feature extraction, selection, and dimensionality reduction, as well as clustering, outlier detection, and classification of data. With recent advances in high-throughput, automated microscopy, there has been an increased demand for effective computational strategies to analyze large-scale, image-based data. To this end, computer vision approaches have been applied to cell segmentation and feature extraction, whereas machine-learning approaches have been developed to aid in phenotypic classification and clustering of data acquired from biological images. Here, we provide an overview of the commonly used computer vision and machine-learning methods for generating and categorizing phenotypic profiles, highlighting the general biological utility of each approach.
Affiliation(s)
- Ben T Grys
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Dara S Lo
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Nil Sahin
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Oren Z Kraus
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Quaid Morris
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Charles Boone
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Brenda J Andrews
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
|
799
|
Zhao J, Bodner G, Rewald B. Phenotyping: Using Machine Learning for Improved Pairwise Genotype Classification Based on Root Traits. FRONTIERS IN PLANT SCIENCE 2016; 7:1864. [PMID: 27999587 PMCID: PMC5138212 DOI: 10.3389/fpls.2016.01864] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/25/2016] [Accepted: 11/25/2016] [Indexed: 05/29/2023]
Abstract
Phenotyping local crop cultivars is becoming more and more important, as they are an important genetic source for breeding, especially in regard to inherent root system architectures. Machine learning algorithms are promising tools to assist in the analysis of complex data sets; novel approaches are needed to apply them to root phenotyping data of mature plants. A greenhouse experiment was conducted in large, sand-filled columns to differentiate 16 European Pisum sativum cultivars based on 36 manually derived root traits. Through combining random forest and support vector machine models, machine learning algorithms were successfully used for unbiased identification of the most distinguishing root traits and subsequent pairwise cultivar differentiation. Up to 86% of pea cultivar pairs could be distinguished based on the top five important root traits (Timp5); Timp5 differed widely between cultivar pairs. Selecting top important root traits (Timp) provided a significantly improved classification compared to using all available traits or randomly selected trait sets. The most frequent Timp of mature pea cultivars was total surface area of lateral roots originating from tap root segments at 0-5 cm depth. The high classification rate implies that culturing did not lead to a major loss of variability in root system architecture in the studied pea cultivars. Our results illustrate the potential of machine learning approaches for unbiased (root) trait selection and cultivar classification based on rather small, complex phenotypic data sets derived from pot experiments. Powerful statistical approaches are essential to make use of the increasing amount of (root) phenotyping information, integrating the complex trait sets describing crop cultivars.
Affiliation(s)
- Jiangsan Zhao
- Department of Forest and Soil Sciences, University of Natural Resources and Life Sciences, Vienna, Austria
- Gernot Bodner
- Division of Agronomy, Department of Crop Sciences, University of Natural Resources and Life Sciences, Vienna, Austria
- Boris Rewald
- Department of Forest and Soil Sciences, University of Natural Resources and Life Sciences, Vienna, Austria
|
800
|
Expanding the Immunology Toolbox: Embracing Public-Data Reuse and Crowdsourcing. Immunity 2016; 45:1191-1204. [DOI: 10.1016/j.immuni.2016.12.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/12/2016] [Revised: 11/30/2016] [Accepted: 12/01/2016] [Indexed: 12/15/2022]
|