651
Membrane proteins structures: A review on computational modeling tools. Biochim Biophys Acta Biomembr 2017; 1859:2021-2039. [DOI: 10.1016/j.bbamem.2017.07.008] [Received: 03/23/2017] [Revised: 07/04/2017] [Accepted: 07/13/2017]
652
Ogilvie LA, Kovachev A, Wierling C, Lange BMH, Lehrach H. Models of Models: A Translational Route for Cancer Treatment and Drug Development. Front Oncol 2017; 7:219. [PMID: 28971064 PMCID: PMC5609574 DOI: 10.3389/fonc.2017.00219] [Received: 06/26/2017] [Accepted: 09/01/2017]
Abstract
Every patient and every disease is different. Each patient therefore requires a personalized treatment approach. For technical reasons, a personalized approach is feasible for treatment strategies such as surgery, but not for drug-based therapy or drug development. The development of individual mechanistic models of the disease process in every patient offers the possibility of attaining truly personalized drug-based therapy and prevention. The concept of virtual clinical trials and the integrated use of in silico, in vitro, and in vivo models in preclinical development could lead to significant gains in efficiency and order-of-magnitude increases in the cost effectiveness of drug development and approval. We have developed mechanistic computational models of large-scale cellular signal transduction networks for prediction of drug effects and functional responses, based on patient-specific multi-level omics profiles. However, a major barrier to the use of such models in a clinical and developmental context is the reliability of predictions. Here we detail how the approach of using “models of models” has the potential to impact cancer treatment and drug development. We describe the iterative refinement process that leverages the flexibility of experimental systems to generate high-dimensional data, which can be used to train and validate computational model parameters and improve model predictions. In this way, highly optimized computational models with robust predictive capacity can be generated. Such models open up a number of opportunities for cancer drug treatment and development, from enhancing the design of experimental studies, reducing costs, and improving animal welfare, to increasing the translational value of results generated.
Affiliation(s)
- Hans Lehrach
- Alacris Theranostics GmbH, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany
653
Xu T, Jha A, Nachev P. The dimensionalities of lesion-deficit mapping. Neuropsychologia 2017; 115:134-141. [PMID: 28935195 PMCID: PMC6018623 DOI: 10.1016/j.neuropsychologia.2017.09.007] [Received: 05/31/2017] [Revised: 08/09/2017] [Accepted: 09/07/2017]
Abstract
Lesion-deficit mapping remains the most powerful method for localising function in the human brain. As the highest court of appeal where competing theories of cerebral function conflict, it ought to be held to the most stringent inferential standards. Though at first sight elegantly transferable, the mass-univariate statistical framework popularized by functional imaging is demonstrably ill-suited to the task, both theoretically and empirically. The critical difficulty lies with the handling of the data's intrinsically high dimensionality. Conceptual opacity and computational complexity lead lesion-deficit mappers to neglect two distinct sets of anatomical interactions: those between areas unified by function, and those between areas unified by the natural pattern of pathological damage. Though both are soluble through high-dimensional multivariate analysis, the consequences of ignoring them are radically different. The former will bleach and coarsen a picture of the functional anatomy that is nonetheless broadly faithful to reality; the latter may alter it beyond all recognition. That the field continues to cling to mass-univariate methods suggests the latter problem is misidentified with the former, and that their distinction is in need of elaboration. We further argue that the vicious effects of lesion-driven interactions are not limited to anatomical localisation but will inevitably degrade purely predictive models of function such as those conceived for clinical prognostic use. Finally, we suggest there is a great deal to be learnt about lesion-mapping by simulation-based modelling of lesion data, for the fundamental problems lie upstream of the experimental data themselves.
Affiliation(s)
- Ashwani Jha
- Institute of Neurology, UCL, UK; National Hospital for Neurology and Neurosurgery, Queen Square, UK
- Parashkev Nachev
- Institute of Neurology, UCL, UK; National Hospital for Neurology and Neurosurgery, Queen Square, UK.
654
Greiff V, Weber CR, Palme J, Bodenhofer U, Miho E, Menzel U, Reddy ST. Learning the High-Dimensional Immunogenomic Features That Predict Public and Private Antibody Repertoires. J Immunol 2017; 199:2985-2997. [DOI: 10.4049/jimmunol.1700594] [Received: 04/25/2017] [Accepted: 08/16/2017]
655
Hugerth LW, Andersson AF. Analysing Microbial Community Composition through Amplicon Sequencing: From Sampling to Hypothesis Testing. Front Microbiol 2017; 8:1561. [PMID: 28928718 PMCID: PMC5591341 DOI: 10.3389/fmicb.2017.01561] [Received: 04/16/2017] [Accepted: 08/02/2017]
Abstract
Microbial ecology as a scientific field is fundamentally driven by technological advance. The past decade's revolution in DNA sequencing cost and throughput has made it possible for most research groups to map microbial community composition in environments of interest. However, the computational and statistical methodology required to analyse this kind of data is often not part of a biologist's training. In this review, we give a historical perspective on the use of sequencing data in microbial ecology and restate the current need for this method, but also highlight the major caveats with standard practices for handling these data, from sample collection and library preparation to statistical analysis. Further, we outline the main new analytical tools that have been developed in the past few years to bypass these caveats, as well as highlight the major requirements of common statistical practices and the extent to which they are applicable to microbial data. Besides delving into the meaning of select alpha- and beta-diversity measures, we give special consideration to techniques for finding the main drivers of community dissimilarity and for interaction network construction. While every project design has specific needs, this review should serve as a starting point for considering what options are available.
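The abstract refers to alpha- and beta-diversity measures but does not define them. The sketch below computes one common example of each from read counts per taxon: Shannon diversity, H' = -Σ p_i ln p_i, as an alpha measure, and Bray-Curtis dissimilarity as a beta measure. It assumes raw, unrarefied counts. The function names and count vectors are hypothetical illustrations, not code from the review. The sketch also ignores practical issues such as uneven sequencing depth.

```python
import math

def shannon_index(counts):
    """Alpha diversity: Shannon index H' = sum(p_i * ln(1/p_i)) over observed taxa.
    Assumption: `counts` are raw (unrarefied) read counts for one sample."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    # Taxa with zero reads are skipped (0 * ln 0 is taken as 0).
    proportions = [c / total for c in counts if c > 0]
    return sum(p * math.log(1 / p) for p in proportions)

def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two count vectors
    over the same taxa, 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa")
    denom = sum(a) + sum(b)
    if denom == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / denom

# Hypothetical read counts for the same four taxa in two samples.
sample_1 = [10, 10, 10, 10]  # perfectly even community
sample_2 = [40, 0, 0, 0]     # a single dominant taxon
print(round(shannon_index(sample_1), 4))  # 1.3863, i.e. ln(4)
print(shannon_index(sample_2))            # 0.0
print(bray_curtis(sample_1, sample_2))    # 0.75
```

An even community reaches the maximum Shannon value for its number of taxa. A single-taxon community scores zero. Bray-Curtis ranges from 0 (identical counts) to 1 (no shared taxa).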
Affiliation(s)
- Luisa W Hugerth
- Department of Molecular, Tumour and Cell Biology, Centre for Translational Microbiome Research, Karolinska Institutet, Solna, Sweden; Division of Gene Technology, Science for Life Laboratory, School of Biotechnology, KTH Royal Institute of Technology, Solna, Sweden
- Anders F Andersson
- Division of Gene Technology, Science for Life Laboratory, School of Biotechnology, KTH Royal Institute of Technology, Solna, Sweden
656
Ferrero E, Dunham I, Sanseau P. In silico prediction of novel therapeutic targets using gene-disease association data. J Transl Med 2017; 15:182. [PMID: 28851378 PMCID: PMC5576250 DOI: 10.1186/s12967-017-1285-6] [Received: 07/07/2017] [Accepted: 08/22/2017]
Abstract
BACKGROUND Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. METHODS To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets. RESULTS We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature. CONCLUSIONS Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. 
Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.
Affiliation(s)
- Enrico Ferrero
- Computational Biology and Stats, Target Sciences, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, SG1 2NY UK
- Ian Dunham
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Philippe Sanseau
- Computational Biology and Stats, Target Sciences, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, SG1 2NY UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
657
Long NP, Lim DK, Mo C, Kim G, Kwon SW. Development and assessment of a lysophospholipid-based deep learning model to discriminate geographical origins of white rice. Sci Rep 2017; 7:8552. [PMID: 28819110 PMCID: PMC5561257 DOI: 10.1038/s41598-017-08892-0] [Received: 05/05/2017] [Accepted: 07/14/2017]
Abstract
Geographical origin determination of white rice has become a major issue for the food industry. However, there is still a lack of a high-throughput method for rapidly and reproducibly differentiating the geographical origins of commercial white rice. In this study, we developed a method that employed lipidomics and deep learning to discriminate white rice from Korea and China. A total of 126 white rice samples of 30 cultivars from different regions were used for method development and validation. By using direct infusion-mass spectrometry-based targeted lipidomics, 17 lysoglycerophospholipids were simultaneously characterized within minutes per sample. Unsupervised data exploration showed a noticeable overlap between white rice from the two countries. In addition, lysophosphatidylcholines (lysoPCs) were prominent in white rice from Korea, while lysophosphatidylethanolamines (lysoPEs) were enriched in white rice from China. A deep learning prediction model was built using white rice from 2014 and validated using two different batches of white rice from 2015. The model accurately discriminated white rice from the two countries. Among 10 selected predictors, lysoPC(18:2), lysoPC(14:0), and lysoPE(16:0) were the three most important features. Random forest and gradient boosting machine models also worked well in this setting. In conclusion, this study provides an architecture for high-throughput classification of white rice from different geographical origins.
Affiliation(s)
- Nguyen Phuoc Long
- Research Institute of Pharmaceutical Sciences and College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
- Dong Kyu Lim
- Research Institute of Pharmaceutical Sciences and College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
- Changyeun Mo
- National Institute of Agricultural Sciences, Rural Development Administration, Jeonju, 54875, Republic of Korea
- Giyoung Kim
- National Institute of Agricultural Sciences, Rural Development Administration, Jeonju, 54875, Republic of Korea
- Sung Won Kwon
- Research Institute of Pharmaceutical Sciences and College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea.
- Plant Genomics and Breeding Institute, Seoul National University, Seoul, 08826, Republic of Korea.
658
Telonis AG, Magee R, Loher P, Chervoneva I, Londin E, Rigoutsos I. Knowledge about the presence or absence of miRNA isoforms (isomiRs) can successfully discriminate amongst 32 TCGA cancer types. Nucleic Acids Res 2017; 45:2973-2985. [PMID: 28206648 PMCID: PMC5389567 DOI: 10.1093/nar/gkx082] [Received: 11/10/2016] [Accepted: 02/07/2017]
Abstract
Isoforms of human miRNAs (isomiRs) are constitutively expressed with tissue- and disease-subtype-dependencies. We studied 10 271 tumor datasets from The Cancer Genome Atlas (TCGA) to evaluate whether isomiRs can distinguish amongst 32 TCGA cancers. Unlike previous approaches, we built a classifier that relied solely on ‘binarized’ isomiR profiles: each isomiR is simply labeled as ‘present’ or ‘absent’. The resulting classifier successfully labeled tumor datasets with an average sensitivity of 90% and a false discovery rate (FDR) of 3%, surpassing the performance of expression-based classification. The classifier maintained its power even after a 15× reduction in the number of isomiRs that were used for training. Notably, the classifier could correctly predict the cancer type in non-TCGA datasets from diverse platforms. Our analysis revealed that the most discriminatory isomiRs happen to also be differentially expressed between normal tissue and cancer. Even so, we find that these highly discriminating isomiRs have not been attracting the most research attention in the literature. Given their ability to successfully classify datasets from 32 cancers, isomiRs and our resulting ‘Pan-cancer Atlas’ of isomiR expression could serve as a suitable framework to explore novel cancer biomarkers.
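The key idea in this abstract is to reduce each isomiR to a present/absent flag before classification. The sketch below shows that binarization step, followed by a nearest-neighbour rule on Jaccard similarity between the sets of present isomiRs. Several details are hypothetical stand-ins chosen for illustration: the abundance threshold, the isomiR names and the cancer-type labels. The abstract does not describe the authors' classifier, so the nearest-neighbour rule should not be read as their method.

```python
def binarize(profile, threshold=1.0):
    """Label each isomiR present (1) if its abundance meets `threshold`, else absent (0).
    Assumption: the threshold value is hypothetical, not taken from the paper."""
    return {name: int(value >= threshold) for name, value in profile.items()}

def jaccard(a, b):
    """Jaccard similarity between the sets of isomiRs marked present in two binarized profiles."""
    present_a = {name for name, flag in a.items() if flag}
    present_b = {name for name, flag in b.items() if flag}
    union = present_a | present_b
    # Two profiles with nothing present are treated as identical.
    return len(present_a & present_b) / len(union) if union else 1.0

def predict(query, labelled):
    """Hypothetical stand-in classifier: label of the most similar reference profile."""
    best_profile, best_label = max(labelled, key=lambda item: jaccard(query, item[0]))
    return best_label

# Hypothetical abundances for three isomiRs in two reference tumours and one query.
reference = [
    (binarize({"iso_a": 5.0, "iso_b": 0.0, "iso_c": 3.2}), "cancer_type_X"),
    (binarize({"iso_a": 0.2, "iso_b": 7.5, "iso_c": 0.0}), "cancer_type_Y"),
]
query = binarize({"iso_a": 2.1, "iso_b": 0.4, "iso_c": 1.5})
print(predict(query, reference))  # cancer_type_X
```

After binarization, the exact abundance values no longer matter. Only which isomiRs cross the threshold affects the prediction.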
Affiliation(s)
- Aristeidis G Telonis
- Computational Medicine Center, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Rogan Magee
- Computational Medicine Center, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Phillipe Loher
- Computational Medicine Center, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Inna Chervoneva
- Division of Biostatistics, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Eric Londin
- Computational Medicine Center, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Isidore Rigoutsos
- Computational Medicine Center, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA
659
Monti R, Barozzi I, Osterwalder M, Lee E, Kato M, Garvin TH, Plajzer-Frick I, Pickle CS, Akiyama JA, Afzal V, Beerenwinkel N, Dickel DE, Visel A, Pennacchio LA. Limb-Enhancer Genie: An accessible resource of accurate enhancer predictions in the developing limb. PLoS Comput Biol 2017; 13:e1005720. [PMID: 28827824 PMCID: PMC5578682 DOI: 10.1371/journal.pcbi.1005720] [Received: 02/27/2017] [Revised: 08/31/2017] [Accepted: 08/03/2017]
Abstract
Epigenomic mapping of enhancer-associated chromatin modifications facilitates the genome-wide discovery of tissue-specific enhancers in vivo. However, reliance on single chromatin marks leads to high rates of false-positive predictions. More sophisticated, integrative methods have been described, but commonly suffer from limited accessibility to the resulting predictions and reduced biological interpretability. Here we present the Limb-Enhancer Genie (LEG), a collection of highly accurate, genome-wide predictions of enhancers in the developing limb, available through a user-friendly online interface. We predict limb enhancers using a combination of >50 published limb-specific datasets and clusters of evolutionarily conserved transcription factor binding sites, taking advantage of the patterns observed at previously in vivo validated elements. By combining different statistical models, our approach outperforms current state-of-the-art methods and provides interpretable measures of feature importance. Our results indicate that including a previously unappreciated score that quantifies tissue-specific nuclease accessibility significantly improves prediction performance. We demonstrate the utility of our approach through in vivo validation of newly predicted elements. Moreover, we describe general features that can guide the type of datasets to include when predicting tissue-specific enhancers genome-wide, while providing an accessible resource to the general biological community and facilitating the functional interpretation of genetic studies of limb malformations.
Affiliation(s)
- Remo Monti
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Joint Genome Institute, U.S. Department of Energy, Walnut Creek, California, United States of America
- Iros Barozzi
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Marco Osterwalder
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Elizabeth Lee
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Momoe Kato
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Tyler H. Garvin
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Ingrid Plajzer-Frick
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Catherine S. Pickle
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Jennifer A. Akiyama
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Veena Afzal
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Diane E. Dickel
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Axel Visel
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Joint Genome Institute, U.S. Department of Energy, Walnut Creek, California, United States of America
- School of Natural Sciences, University of California, Merced, California, United States of America
- Len A. Pennacchio
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Joint Genome Institute, U.S. Department of Energy, Walnut Creek, California, United States of America
660
Ghanat Bari M, Ung CY, Zhang C, Zhu S, Li H. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks. Sci Rep 2017; 7:6993. [PMID: 28765560 PMCID: PMC5539301 DOI: 10.1038/s41598-017-07481-5] [Received: 01/09/2017] [Accepted: 06/27/2017]
Abstract
Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10⁸ Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have recently been reported to be associated with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes coordinating functionality in cancer networks, and it will illuminate our understanding of how genes modulated in a tissue-specific network contribute to tumorigenesis and therapy development.
Affiliation(s)
- Mehrab Ghanat Bari
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
- Choong Yong Ung
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
- Cheng Zhang
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
- Shizhen Zhu
- Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
- Hu Li
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA.
661
Sun T, Zhou B, Lai L, Pei J. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 2017; 18:277. [PMID: 28545462 PMCID: PMC5445391 DOI: 10.1186/s12859-017-1700-2] [Received: 03/06/2017] [Accepted: 05/18/2017]
Abstract
BACKGROUND Protein-protein interactions (PPIs) are critical for many biological processes. It is therefore important to develop accurate high-throughput methods for identifying PPI to better understand protein function, disease occurrence, and therapy design. Though various computational methods for predicting PPI have been developed, their robustness for prediction with external datasets is unknown. Deep-learning algorithms have achieved successful results in diverse areas, but their effectiveness for PPI prediction has not been tested. RESULTS We used a stacked autoencoder, a type of deep-learning algorithm, to study the sequence-based PPI prediction. The best model achieved an average accuracy of 97.19% with 10-fold cross-validation. The prediction accuracies for various external datasets ranged from 87.99% to 99.21%, which are superior to those achieved with previous methods. CONCLUSIONS To our knowledge, this research is the first to apply a deep-learning algorithm to sequence-based PPI prediction, and the results demonstrate its potential in this field.
Affiliation(s)
- Tanlin Sun
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Bo Zhou
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China; Beijing National Laboratory for Molecular Science, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China; Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China.
662
Pärnamaa T, Parts L. Accurate Classification of Protein Subcellular Localization from High-Throughput Microscopy Images Using Deep Learning. G3 (Bethesda) 2017; 7:1385-1392. [PMID: 28391243 PMCID: PMC5427497 DOI: 10.1534/g3.116.033654] [Received: 07/18/2016] [Accepted: 11/22/2016]
Abstract
High-throughput microscopy of many single cells generates high-dimensional data that are far from straightforward to analyze. One important problem is automatically detecting the cellular compartment where a fluorescently-tagged protein resides, a task relatively simple for an experienced human, but difficult to automate on a computer. Here, we train an 11-layer neural network on data from mapping thousands of yeast proteins, achieving per cell localization classification accuracy of 91%, and per protein accuracy of 99% on held-out images. We confirm that low-level network features correspond to basic image characteristics, while deeper layers separate localization classes. Using this network as a feature calculator, we train standard classifiers that assign proteins to previously unseen compartments after observing only a small number of training examples. Our results are the most accurate subcellular localization classifications to date, and demonstrate the usefulness of deep learning for high-throughput microscopy.
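This abstract reports 91% per-cell accuracy but 99% per-protein accuracy, and a small example shows how those two figures can differ. Assume, purely for illustration, that the per-cell predictions for one protein are combined by majority vote. The abstract does not say how the authors aggregate them. Under that assumption, occasional per-cell errors are outvoted. The compartment names and cell counts below are hypothetical.

```python
from collections import Counter

def per_protein_call(cell_predictions):
    """Combine per-cell compartment predictions for one protein by majority vote.
    Assumption: majority voting is an illustrative choice, not the paper's method."""
    if not cell_predictions:
        raise ValueError("no cells for this protein")
    return Counter(cell_predictions).most_common(1)[0][0]

def accuracy(predicted, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical protein truly localized to the nucleus, imaged in five cells.
cells = ["nucleus", "nucleus", "cytoplasm", "nucleus", "nucleus"]
truth = "nucleus"
print(accuracy(cells, [truth] * len(cells)))  # per-cell accuracy: 0.8
print(per_protein_call(cells) == truth)       # per-protein call correct: True
```

With many cells per protein, the aggregated call is wrong only when most cells are misclassified. That is one plausible reason per-protein accuracy can exceed per-cell accuracy.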
Affiliation(s)
- Tanel Pärnamaa
- Institute of Computer Science, University of Tartu, 50409, Estonia
- Leopold Parts
- Institute of Computer Science, University of Tartu, 50409, Estonia
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
663
Chasman D, Roy S. Inference of cell type specific regulatory networks on mammalian lineages. Curr Opin Syst Biol 2017; 2:130-139. [PMID: 29082337 DOI: 10.1016/j.coisb.2017.04.001]
Abstract
Transcriptional regulatory networks are at the core of establishing cell type specific gene expression programs. In mammalian systems, such regulatory networks are determined by multiple levels of regulation, including by transcription factors, chromatin environment, and three-dimensional organization of the genome. Recent efforts to measure diverse regulatory genomic datasets across multiple cell types and tissues offer unprecedented opportunities to examine the context-specificity and dynamics of regulatory networks at a greater resolution and scale than before. In parallel, numerous computational approaches to analyze these data have emerged that serve as important tools for understanding mammalian cell type specific regulation. In this article, we review recent computational approaches to predict the expression and sequence-based regulators of a gene's expression level and examine long-range gene regulation. We highlight promising approaches, insights gained, and open challenges that need to be overcome to build a comprehensive picture of cell type specific transcriptional regulatory networks.
Affiliation(s)
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792
664
Angermueller C, Lee HJ, Reik W, Stegle O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol 2017; 18:67. [PMID: 28395661 PMCID: PMC5387360 DOI: 10.1186/s13059-017-1189-z] [Received: 01/24/2017] [Accepted: 03/07/2017]
Abstract
Recent technological advances have enabled DNA methylation to be assayed at single-cell resolution. However, current protocols are limited by incomplete CpG coverage and hence methods to predict missing methylation states are critical to enable genome-wide analyses. We report DeepCpG, a computational approach based on deep neural networks to predict methylation states in single cells. We evaluate DeepCpG on single-cell methylation data from five cell types generated using alternative sequencing protocols. DeepCpG yields substantially more accurate predictions than previous methods. Additionally, we show that the model parameters can be interpreted, thereby providing insights into how sequence composition affects methylation variability.
Affiliation(s)
- Christof Angermueller
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
- Heather J Lee
- Epigenetics Programme, Babraham Institute, Cambridge, UK; Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Wolf Reik
- Epigenetics Programme, Babraham Institute, Cambridge, UK; Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
665
Sensitive detection of rare disease-associated cell subsets via representation learning. Nat Commun 2017; 8:14825. [PMID: 28382969 PMCID: PMC5384229 DOI: 10.1038/ncomms14825] [Citation(s) in RCA: 86] [Impact Index Per Article: 12.3] [Received: 02/24/2016] [Accepted: 02/02/2017] [Indexed: 01/20/2023]
Abstract
Rare cell populations play a pivotal role in the initiation and progression of diseases such as cancer. However, the identification of such subpopulations remains a difficult task. This work describes CellCnn, a representation learning approach to detect rare cell subsets associated with disease using high-dimensional single-cell measurements. Using CellCnn, we identify paracrine signalling-, AIDS onset- and rare CMV infection-associated cell subsets in peripheral blood, and extremely rare leukaemic blast populations in minimal residual disease-like situations with frequencies as low as 0.01%.
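The abstract names representation learning but does not describe the model. A minimal sketch, assuming a design in which a single filter (weights plus bias, followed by a ReLU) scores every cell and the per-cell scores are pooled into one value per sample, shows why pooling over only the top-scoring cells can keep a very rare subset visible. The weights here are set by hand rather than learned, and `cell_response`, `sample_score`, `top_fraction`, and the two-marker toy data are all hypothetical.

```python
import math
from typing import Sequence


def relu(x: float) -> float:
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def cell_response(cell: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Score one cell with a single filter: ReLU(w . x + b)."""
    return relu(sum(w * x for w, x in zip(weights, cell)) + bias)


def sample_score(cells: Sequence[Sequence[float]], weights: Sequence[float],
                 bias: float, top_fraction: float = 0.01) -> float:
    """Pool per-cell responses into one sample-level value.

    Averages only the top `top_fraction` of responses (at least one cell),
    so a small high-scoring subset is not drowned out by the remaining cells.
    """
    responses = sorted((cell_response(c, weights, bias) for c in cells), reverse=True)
    k = max(1, math.ceil(top_fraction * len(responses)))
    return sum(responses[:k]) / k


if __name__ == "__main__":
    weights = [1.0, -1.0]               # hand-set filter: high marker A, low marker B
    background = [[0.0, 1.0]] * 9999
    rare = [[5.0, 0.0]]                 # one marker-positive cell (0.01% of 10,000)
    print(sample_score(background + [[0.0, 1.0]], weights, 0.0))  # healthy-like sample
    print(sample_score(background + rare, weights, 0.0, 0.0001))  # pools over 1 cell
    print(sample_score(background + rare, weights, 0.0))          # top 1%: diluted
```

With 9,999 background cells and one marker-positive cell (a frequency of 0.01%), a `top_fraction` of 0.0001 pools over that single cell and returns its full response of 5.0, whereas pooling over the top 1% dilutes it to 0.05, and a plain mean over all cells would dilute it further, to 0.0005.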
666
Kan A. Machine learning applications in cell image analysis. Immunol Cell Biol 2017; 95:525-530. [PMID: 28294138 DOI: 10.1038/icb.2017.16] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 12/16/2016] [Revised: 02/28/2017] [Accepted: 03/08/2017] [Indexed: 02/06/2023]
Abstract
Machine learning (ML) refers to a set of automatic pattern recognition methods that have been successfully applied across various problem domains, including biomedical image analysis. This review focuses on ML applications for image analysis in light microscopy experiments, with the typical tasks of segmenting and tracking individual cells and modelling reconstructed lineage trees. After describing a typical image analysis pipeline and highlighting challenges of automatic analysis (for example, variability in cell morphology and tracking in the presence of clutter), this review gives a brief historical overview of ML, followed by the basic concepts and definitions required for understanding the examples. It then presents several example applications at various image processing stages, including the use of supervised learning methods for improving cell segmentation and the application of active learning for tracking. The review concludes with remarks on parameter setting and future directions.
Affiliation(s)
- Andrey Kan
- Division of Immunology, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
667
Nielsen J. Systems Biology of Metabolism: A Driver for Developing Personalized and Precision Medicine. Cell Metab 2017; 25:572-579. [PMID: 28273479 DOI: 10.1016/j.cmet.2017.02.002] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Received: 10/19/2016] [Revised: 01/20/2017] [Accepted: 01/31/2017] [Indexed: 01/21/2023]
Abstract
Systems biology uses mathematical models to analyze large datasets and simulate system behavior. It enables integrative analysis of different types of data and can thereby provide new insight into complex biological systems. This article discusses the challenges of using systems medicine to advance the development of personalized and precision medicine for treating metabolic diseases such as insulin resistance, obesity, NAFLD, NASH, and cancer. It illustrates how genome-scale metabolic models can be used for integrative analysis of big data, with the objective of identifying novel biomarkers that are foundational for personalized and precision medicine.
Affiliation(s)
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE41128 Gothenburg, Sweden; Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK2800 Lyngby, Denmark; Science for Life Laboratory, Royal Institute of Technology, SE17121 Stockholm, Sweden.
668
Wunderling A, Ben Targem M, Barbier de Reuille P, Ragni L. Novel tools for quantifying secondary growth. J Exp Bot 2017; 68:89-95. [PMID: 27965365 DOI: 10.1093/jxb/erw450] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 05/21/2023]
Abstract
Secondary growth occurs in dicotyledons and gymnosperms and results in an increased girth of plant organs. It is driven primarily by the vascular cambium, which produces thousands of cells throughout the life of several plant species. Even in the small herbaceous model plant Arabidopsis, manual quantification of this massive process is impractical. Here, we provide a comprehensive overview of current methods used to measure radial growth. We discuss the issues related to its quantification, and we highlight recent advances and tools developed for automated cellular phenotyping and their future applications.
Affiliation(s)
- Anna Wunderling
- ZMBP, University of Tübingen, Auf der Morgenstelle 32, D-72076 Tübingen, Germany
- Mehdi Ben Targem
- ZMBP, University of Tübingen, Auf der Morgenstelle 32, D-72076 Tübingen, Germany
- Laura Ragni
- ZMBP, University of Tübingen, Auf der Morgenstelle 32, D-72076 Tübingen, Germany
669
Ravi D, Wong C, Deligianni F, Berthelot M, Andreu-Perez J, Lo B, Yang GZ. Deep Learning for Health Informatics. IEEE J Biomed Health Inform 2016; 21:4-21. [PMID: 28055930 DOI: 10.1109/jbhi.2016.2636665] [Citation(s) in RCA: 598] [Impact Index Per Article: 74.8] [Indexed: 12/16/2022]
Abstract
With a massive influx of multimodality data, the role of data analytics in health informatics has grown rapidly in the last decade. This has also prompted increasing interest in the generation of analytical, data-driven models based on machine learning in health informatics. Deep learning, a technique with its foundation in artificial neural networks, has emerged in recent years as a powerful tool for machine learning, promising to reshape the future of artificial intelligence. In addition to its predictive power and its ability to generate automatically optimized high-level features and semantic interpretation from the input data, rapid improvements in computational power, fast data storage, and parallelization have contributed to the rapid uptake of the technology. This article presents a comprehensive, up-to-date review of research employing deep learning in health informatics, providing a critical analysis of the relative merits and potential pitfalls of the technique as well as its future outlook. The paper mainly focuses on key applications of deep learning in the fields of translational bioinformatics, medical imaging, pervasive sensing, medical informatics, and public health.
670
Gong JQX, Shim JV, Núñez-Acosta E, Sobie EA. I love it when a plan comes together: Insight gained through convergence of competing mathematical models. J Mol Cell Cardiol 2016; 102:31-33. [PMID: 27913283 DOI: 10.1016/j.yjmcc.2016.10.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/19/2016] [Accepted: 10/26/2016] [Indexed: 01/01/2023]
Affiliation(s)
- Jingqi Q X Gong
- Department of Pharmacological Sciences, Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jaehee V Shim
- Department of Pharmacological Sciences, Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Elisa Núñez-Acosta
- Department of Pharmacological Sciences, Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eric A Sobie
- Department of Pharmacological Sciences, Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
671
Abstract
Combinations of therapies are being actively pursued to expand therapeutic options and deal with cancer’s pervasive resistance to treatment. Research efforts to discover effective combination treatments have focused on drugs targeting intracellular processes of the cancer cells and in particular on small molecules that target aberrant kinases. Accordingly, most of the computational methods used to study, predict, and develop drug combinations concentrate on these modes of action and signaling processes within the cancer cell. This focus on the cancer cell overlooks significant opportunities to tackle other components of tumor biology that may offer greater potential for improving patient survival. Many alternative strategies have been developed to combat cancer; for example, targeting different cancer cellular processes such as epigenetic control; modulating stromal cells that interact with the tumor; strengthening physical barriers that confine tumor growth; boosting the immune system to attack tumor cells; and even regulating the microbiome to support antitumor responses. We suggest that to fully exploit these treatment modalities using effective drug combinations it is necessary to develop multiscale computational approaches that take into account the full complexity underlying the biology of a tumor, its microenvironment, and a patient’s response to the drugs. In this Opinion article, we discuss preliminary work in this area and the needs, in terms of both computational and data requirements, that will truly empower such combinations.
Affiliation(s)
- Jonathan R Dry
- Oncology Innovative Medicines and Early Development, AstraZeneca, R&D Boston, Waltham, MA, 02451, USA.
- Mi Yang
- Rheinisch-Westfälische Technische Hochschule Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, 52057, Germany
- Julio Saez-Rodriguez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, CB10 1SD, UK; Rheinisch-Westfälische Technische Hochschule Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, 52057, Germany
672
Winkler DA, Le TC. Performance of Deep and Shallow Neural Networks, the Universal Approximation Theorem, Activity Cliffs, and QSAR. Mol Inform 2016; 36. [DOI: 10.1002/minf.201600118] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 09/01/2016] [Accepted: 10/04/2016] [Indexed: 12/17/2022]
Affiliation(s)
- David A. Winkler
- CSIRO Manufacturing; Clayton 3168 Australia
- Monash Institute of Pharmaceutical Sciences; Monash University; Parkville 3052 Australia
- Latrobe Institute for Molecular Science; Latrobe University; Bundoora 3082 Australia
- School of Chemical and Physical Science; Flinders University; Bedford Park 5042 Australia
- Tu C. Le
- CSIRO Manufacturing; Clayton 3168 Australia
673
Ekins S. The Next Era: Deep Learning in Pharmaceutical Research. Pharm Res 2016; 33:2594-2603. [PMID: 27599991 DOI: 10.1007/s11095-016-2029-7] [Citation(s) in RCA: 99] [Impact Index Per Article: 12.4] [Received: 07/10/2016] [Accepted: 08/23/2016] [Indexed: 01/22/2023]
Abstract
Over the past decade we have witnessed the increasing sophistication of machine learning algorithms in everyday applications, from internet searches, voice recognition, and social network software to machine vision software in cameras, phones, robots, and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets created in drug discovery not only enables us to learn from the past but also to predict a molecule's properties and behavior in the future. The latest machine learning algorithm garnering significant attention is deep learning, an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernible edge in predictive performance. The time has come both for a balanced review of this technique and for applying machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which datasets are growing, such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction, and skin permeation. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has rarely been carried out to date) in order to convince skeptics that investing in this technique will bring benefits.
Affiliation(s)
- Sean Ekins
- Collaborations Pharmaceuticals, Inc, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina, 27526, USA; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California, 94010, USA
674
Angermueller C, Pärnamaa T, Parts L, Stegle O. Deep learning for computational biology. Mol Syst Biol 2016; 12:878. [PMID: 27474269 PMCID: PMC4965871 DOI: 10.15252/msb.20156651] [Citation(s) in RCA: 669] [Impact Index Per Article: 83.6] [Received: 04/11/2016] [Revised: 06/02/2016] [Accepted: 06/06/2016] [Indexed: 12/11/2022]
Abstract
Technological advances in genomics and imaging have led to an explosion of molecular and cellular profiling data from large numbers of samples. This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies. Modern machine learning methods, such as deep learning, promise to leverage very large data sets for finding hidden structure within them and for making accurate predictions. In this review, we discuss applications of this new breed of analysis approaches in regulatory genomics and cellular imaging. We provide background on what deep learning is and the settings in which it can be successfully applied to derive biological insights. In addition to presenting specific applications and providing tips for practical use, we also highlight possible pitfalls and limitations to guide computational biologists on when and how to make the most of this new technology.
Affiliation(s)
- Christof Angermueller
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge, UK
- Tanel Pärnamaa
- Department of Computer Science, University of Tartu, Tartu, Estonia; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Cambridge, UK
- Leopold Parts
- Department of Computer Science, University of Tartu, Tartu, Estonia; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Cambridge, UK
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton Cambridge, UK