1
Bernardini G, van Iersel L, Julien E, Stougie L. Inferring phylogenetic networks from multifurcating trees via cherry picking and machine learning. Mol Phylogenet Evol 2024; 199:108137. [PMID: 39029549 DOI: 10.1016/j.ympev.2024.108137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 02/19/2024] [Accepted: 06/29/2024] [Indexed: 07/21/2024]
Abstract
The Hybridization problem asks to reconcile a set of conflicting phylogenetic trees into a single phylogenetic network with the smallest possible number of reticulation nodes. This problem is computationally hard and previous solutions are limited to small and/or severely restricted data sets, for example, a set of binary trees with the same taxon set or only two non-binary trees with non-equal taxon sets. Building on our previous work on binary trees, we present FHyNCH, the first algorithmic framework to heuristically solve the Hybridization problem for large sets of multifurcating trees whose sets of taxa may differ. Our heuristics combine the cherry-picking technique, recently proposed to solve the same problem for binary trees, with two carefully designed machine-learning models. We demonstrate that our methods are practical and produce qualitatively good solutions through experiments on both synthetic and real data sets.
Affiliation(s)
- Leo van Iersel
  - Delft Institute of Applied Mathematics, Delft, The Netherlands
- Esther Julien
  - Delft Institute of Applied Mathematics, Delft, The Netherlands
- Leen Stougie
  - CWI, Amsterdam, The Netherlands; Vrije Universiteit, Amsterdam, The Netherlands; INRIA-Erable, France
2
Mo YK, Hahn MW, Smith ML. Applications of machine learning in phylogenetics. Mol Phylogenet Evol 2024; 196:108066. [PMID: 38565358 DOI: 10.1016/j.ympev.2024.108066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 02/16/2024] [Accepted: 03/21/2024] [Indexed: 04/04/2024]
Abstract
Machine learning has increasingly been applied to a wide range of questions in phylogenetic inference. Supervised machine learning approaches that rely on simulated training data have been used to infer tree topologies and branch lengths, to select substitution models, and to perform downstream inferences of introgression and diversification. Here, we review how researchers have used several promising machine learning approaches to make phylogenetic inferences. Despite the promise of these methods, several barriers prevent supervised machine learning from reaching its full potential in phylogenetics. We discuss these barriers and potential paths forward. In the future, we expect that the application of careful network designs and data encodings will allow supervised machine learning to accommodate the complex processes that continue to confound traditional phylogenetic methods.
Affiliation(s)
- Yu K Mo
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Matthew W Hahn
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA; Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Megan L Smith
  - Department of Biological Sciences, Mississippi State University, Starkville, MS 39762, USA
3
Jamialahmadi H, Khalili-Tanha G, Nazari E, Rezaei-Tavirani M. Artificial intelligence and bioinformatics: a journey from traditional techniques to smart approaches. Gastroenterology and Hepatology from Bed to Bench 2024; 17:241-252. [PMID: 39308539 PMCID: PMC11413381 DOI: 10.22037/ghfbb.v17i3.2977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 05/11/2024] [Indexed: 09/25/2024]
Abstract
The incorporation of AI models into bioinformatics has brought about a revolutionary era in the analysis and interpretation of biological data. This mini-review offers a succinct overview of the indispensable role AI plays in the convergence of computational techniques and biological research. The search strategy followed PRISMA guidelines, encompassing databases such as PubMed, Embase, and Google Scholar to include studies published between 2018 and 2024, utilizing specific keywords. We explored the diverse applications of AI methodologies, including machine learning (ML), deep learning (DL), and natural language processing (NLP), across various domains of bioinformatics. These domains encompass genome sequencing, protein structure prediction, drug discovery, systems biology, personalized medicine, imaging, signal processing, and text mining. AI algorithms have exhibited remarkable efficacy in tackling intricate biological challenges, spanning from genome sequencing to protein structure prediction, and from drug discovery to personalized medicine. In conclusion, this study scrutinizes the evolving landscape of AI-driven tools and algorithms, emphasizing their pivotal role in expediting research, facilitating data interpretation, and catalyzing innovations in biomedical sciences.
Affiliation(s)
- Hamid Jamialahmadi
  - Department of Medical Genetics and Molecular Medicine, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - These authors equally contributed to this study as the first authors.
- Ghazaleh Khalili-Tanha
  - Department of Medical Genetics and Molecular Medicine, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - These authors equally contributed to this study as the first authors.
- Elham Nazari
  - Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mostafa Rezaei-Tavirani
  - Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
4
Yao B, Niu G, Wang Z, Mu H, Ren X, Jiao Y, Cai C, Li J. Kaistella polysaccharea sp. nov., isolated from Antarctic intertidal sediment produces a novel extracellular polymeric substance. Int J Syst Evol Microbiol 2023; 73. [PMID: 37725075 DOI: 10.1099/ijsem.0.006037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/21/2023]
Abstract
An exopolysaccharide-producing bacterial strain GW4-15T, belonging to the genus Kaistella, was isolated from intertidal sediment from King George Island, Antarctica. The strain was Gram-stain-negative, aerobic, rod-shaped, non-motile and yellow-pigmented. The strain was able to grow in the presence of 0-2 % (w/v) NaCl (optimum, 0 %), at 4-30 °C (optimum, 20-28 °C) and at pH 5.0-10.0 (optimum, pH 8.0). A phylogenetic tree based on 16S rRNA gene sequences showed that strain GW4-15T formed a lineage within the genus Kaistella with the closest phylogenetic neighbours Kaistella carnis NCTC 13525T (98.3 %), Kaistella gelatinilytica G5-32T (97.7 %), Kaistella antarctica LMG 24720T (97.4 %) and Kaistella yonginensis HMD1043T (96.9 %). Digital DNA-DNA hybridization values of strain GW4-15T with K. carnis NCTC 13525T, K. antarctica LMG 24720T, K. gelatinilytica G5-32T and K. yonginensis HMD1043T were 22.8, 22.0, 21.7 and 21.6 %, respectively. The average nucleotide identity values between strain GW4-15T and K. carnis NCTC 13525T, K. antarctica LMG 24720T, K. gelatinilytica G5-32T and K. yonginensis HMD1043T were 79.3, 78.6, 77.5 and 77.2 %, respectively. The G+C content of the genome was 36.2 mol%. The major phospholipids were phosphatidylethanolamine and aminophospholipid. The predominant menaquinone was MK-6. The major fatty acids were anteiso-C15 : 0 (28.7 %), iso-C16 : 0 3-OH (15.7 %), iso-C16 : 0 H (10.0 %), iso-C16 : 0 (5.4 %), summed feature 9 (comprising iso-C17 : 1 ω9c and/or 10-methyl C16 : 0; 5.2 %) and iso-C15 : 0 (5.1 %). The monosaccharide composition of the new type of extracellular polymeric substance of GW4-15T was Glc, GalN, GlcN, Rha, Man and Gal with a molar ratio of 3.14 : 3.83 : 8.38 : 5.16 : 1 : 2.82. Based on phenotypic, phylogenetic and genotypic data, a novel species, Kaistella polysaccharea sp. nov., is proposed with the type strain GW4-15T (=CGMCC 1.19368T=KCTC 92753T).
Affiliation(s)
- Boqing Yao
  - College of Marine Life Sciences, Ocean University of China, Qingdao, Shandong, 266003, PR China
- Guojiang Niu
  - College of Marine Life Sciences, Ocean University of China, Qingdao, Shandong, 266003, PR China
- Zhe Wang
  - Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Shandong Provincial Key Laboratory of Glycoscience and Glycotechnology, Ocean University of China, Qingdao, 266003, PR China
- Hongmei Mu
  - College of Marine Life Sciences, Ocean University of China, Qingdao, Shandong, 266003, PR China
- Xingtao Ren
  - College of Marine Life Sciences, Ocean University of China, Qingdao, Shandong, 266003, PR China
- Yabin Jiao
  - College of Marine Life Sciences, Ocean University of China, Qingdao, Shandong, 266003, PR China
- Chao Cai
  - Key Laboratory of Marine Drugs, Ministry of Education, School of Medicine and Pharmacy, Shandong Provincial Key Laboratory of Glycoscience and Glycotechnology, Ocean University of China, Qingdao, 266003, PR China
- Jing Li
  - College of Marine Life Sciences, Ocean University of China, Qingdao, Shandong, 266003, PR China
5
Liu Z, Jiang W, Kim C, Peng X, Fan C, Wu Y, Xie Z, Peng F. A Pseudomonas Lysogenic Bacteriophage Crossing the Antarctic and Arctic, Representing a New Genus of Autographiviridae. Int J Mol Sci 2023; 24:7662. [PMID: 37108829 PMCID: PMC10142737 DOI: 10.3390/ijms24087662] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/25/2023] [Revised: 04/17/2023] [Accepted: 04/19/2023] [Indexed: 04/29/2023]
Abstract
Polar regions tend to support simple food webs, which are vulnerable to phage-induced gene transfer or microbial death. To further investigate phage-host interactions in polar regions and the potential linkage of phage communities between the two poles, we induced the release of a lysogenic phage, vB_PaeM-G11, from Pseudomonas sp. D3 isolated from the Antarctic, which formed clear phage plaques on the lawn of Pseudomonas sp. G11 isolated from the Arctic. In permafrost metagenomic data from the Arctic tundra, we found a genome with high similarity to that of vB_PaeM-G11, suggesting that vB_PaeM-G11 may be distributed in both the Antarctic and Arctic. Phylogenetic analysis indicated that vB_PaeM-G11 is homologous to five uncultured viruses, and that they may represent a new genus in the Autographiviridae family, named Fildesvirus here. vB_PaeM-G11 was stable across a range of temperatures (4-40 °C) and pH values (4-11), with latent and rise periods of about 40 and 10 min, respectively. This study is the first isolation and characterization of a Pseudomonas phage distributed in both the Antarctic and Arctic, identifying its lysogenic host and lysis host, and thus provides essential information for further understanding the interaction between polar phages and their hosts and the ecological functions of phages in polar regions.
Affiliation(s)
- Zhenyu Liu
  - College of Life Sciences, Wuhan University, Wuhan 430072, China
- Wenhui Jiang
  - College of Life Sciences, Wuhan University, Wuhan 430072, China
- Cholsong Kim
  - College of Life Sciences, Wuhan University, Wuhan 430072, China
- Xiaoya Peng
  - College of Life Sciences, Wuhan University, Wuhan 430072, China
- Cong Fan
  - College of Life Sciences, Wuhan University, Wuhan 430072, China
- Yingliang Wu
  - College of Life Sciences, Wuhan University, Wuhan 430072, China
- Zhixiong Xie
  - College of Life Sciences, Wuhan University, Wuhan 430072, China
- Fang Peng
  - College of Life Sciences, Wuhan University, Wuhan 430072, China
6
Balaban M, Bristy NA, Faisal A, Bayzid MS, Mirarab S. Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model. Bioinformatics Advances 2022; 2:vbac055. [PMID: 35992043 PMCID: PMC9383262 DOI: 10.1093/bioadv/vbac055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Accepted: 08/09/2022] [Indexed: 01/27/2023]
Abstract
While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, such as genome skims, which do not permit assembly. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is their reliance on simplified models of sequence evolution such as Jukes-Cantor. If we can estimate frequencies of base substitutions in an alignment-free setting, we can compute pairwise distances under more complex models. However, since the strand of DNA sequences is unknown for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the most complex models that one can use are the so-called no strand-bias models. We show how to calculate distances under a four-parameter no strand-bias model called TK4 without relying on alignments or assemblies. The main idea is to replace letters in the input sequences and recompute Jaccard indices between k-mer sets. However, on larger genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance as opposed to homology. We show in simulation that alignment-free distances can be highly accurate when genomes evolve under the assumed models and study the accuracy on assembled and unassembled biological data.
Availability and implementation: Our software is available open source at https://github.com/nishatbristy007/NSB.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
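The "replace letters, then recompute Jaccard indices between k-mer sets" idea in this abstract can be illustrated with a toy sketch. This is not the authors' NSB implementation: the function names, the choice of k, and the example letter replacement are all assumptions made for illustration, and strand handling (for example canonical k-mers) and the random-match correction mentioned in the abstract are omitted.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def replace_letters(seq, mapping):
    """Rewrite seq over a reduced alphabet, e.g. {"G": "A"} maps G onto A.

    The mapping is a hypothetical example; which letters to merge is a
    modelling decision that this sketch does not take from the paper.
    """
    return seq.translate(str.maketrans(mapping))


if __name__ == "__main__":
    x, y = "ACGTACGT", "ACGAACGT"
    k = 3  # toy value; real analyses use much larger k
    print(jaccard(kmers(x, k), kmers(y, k)))
    # After merging T onto A, the single mismatch between x and y disappears.
    merged = {"T": "A"}
    print(jaccard(kmers(replace_letters(x, merged), k),
                  kmers(replace_letters(y, merged), k)))
```

Comparing the Jaccard index before and after a replacement shows how much of the k-mer disagreement is attributable to substitutions between the merged letters, which is the kind of signal a richer substitution model needs.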
Affiliation(s)
- Ahnaf Faisal
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
- Md Shamsuzzoha Bayzid
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
7
Using an Unsupervised Clustering Model to Detect the Early Spread of SARS-CoV-2 Worldwide. Genes (Basel) 2022; 13:648. [PMID: 35456454 PMCID: PMC9030792 DOI: 10.3390/genes13040648] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/27/2022] [Revised: 03/29/2022] [Accepted: 04/05/2022] [Indexed: 02/04/2023]
Abstract
Deciphering the population structure of SARS-CoV-2 is critical to inform public health management and reduce the risk of future dissemination. With the continuous accruing of SARS-CoV-2 genomes worldwide, discovering an effective way to group these genomes is critical for organizing the landscape of the population structure of the virus. Taking advantage of recently published state-of-the-art machine learning algorithms, we used an unsupervised deep learning clustering algorithm to group a total of 16,873 SARS-CoV-2 genomes. Using single nucleotide polymorphisms as input features, we identified six major subtypes of SARS-CoV-2. The proportions of the clusters across the continents revealed distinct geographical distributions. Comprehensive analysis indicated that both genetic factors and human migration factors shaped the specific geographical distribution of the population structure. This study provides a different approach using clustering methods to study the population structure of a never-seen-before and fast-growing species such as SARS-CoV-2. Moreover, clustering techniques can be used for further studies of local population structures of the proliferating virus.
8
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, Elworth RAL, Kille B, Kyrillidis A, Nakhleh L, Wolfe CR, Yan Z, Yao V, Treangen TJ. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13:1728. [PMID: 35365602 PMCID: PMC8976012 DOI: 10.1038/s41467-022-29268-7] [Citation(s) in RCA: 77] [Impact Index Per Article: 38.5] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 11/19/2022]
Abstract
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.
Affiliation(s)
- Nicolae Sapoval
  - Department of Computer Science, Rice University, Houston, TX, USA
- Amirali Aghazadeh
  - Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA
- Michael G Nute
  - Department of Computer Science, Rice University, Houston, TX, USA
- Dinler A Antunes
  - Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Advait Balaji
  - Department of Computer Science, Rice University, Houston, TX, USA
- Richard Baraniuk
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- C J Barberan
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Chen Dun
  - Department of Computer Science, Rice University, Houston, TX, USA
- R A Leo Elworth
  - Department of Computer Science, Rice University, Houston, TX, USA
- Bryce Kille
  - Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, TX, USA
- Cameron R Wolfe
  - Department of Computer Science, Rice University, Houston, TX, USA
- Zhi Yan
  - Department of Computer Science, Rice University, Houston, TX, USA
- Vicky Yao
  - Department of Computer Science, Rice University, Houston, TX, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Bioengineering, Rice University, Houston, TX, USA
9
Zaharias P, Grosshauser M, Warnow T. Re-evaluating Deep Neural Networks for Phylogeny Estimation: The Issue of Taxon Sampling. J Comput Biol 2022; 29:74-89. [PMID: 34986031 DOI: 10.1089/cmb.2021.0383] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
Deep neural networks (DNNs) have been recently proposed for quartet tree phylogeny estimation. Here, we present a study evaluating recently trained DNNs in comparison to a collection of standard phylogeny estimation methods on a heterogeneous collection of datasets simulated under the same models that were used to train the DNNs, and also under similar conditions but with higher rates of evolution. Our study shows that using DNNs with quartet amalgamation is less accurate than several standard phylogeny estimation methods we explore (e.g., maximum likelihood and maximum parsimony). We further find that simple standard phylogeny estimation methods match or improve on DNNs for quartet accuracy, especially, but not exclusively, when used in a global manner (i.e., the tree on the full dataset is computed and then the induced quartet trees are extracted from the full tree). Thus, our study provides evidence that a major challenge impacting the utility of current DNNs for phylogeny estimation is their restriction to estimating quartet trees that must subsequently be combined into a tree on the full dataset. In contrast, global methods (i.e., those that estimate trees from the full set of sequences) are able to benefit from taxon sampling, and hence have higher accuracy on large datasets.
Affiliation(s)
- Paul Zaharias
  - Department of Computer Science, University of Illinois, Urbana, Illinois, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois, Urbana, Illinois, USA
10
Matsumoto H, Mimori T, Fukunaga T. Novel metric for hyperbolic phylogenetic tree embeddings. Biol Methods Protoc 2021; 6:bpab006. [PMID: 33928190 PMCID: PMC8058397 DOI: 10.1093/biomethods/bpab006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/11/2020] [Revised: 03/19/2021] [Accepted: 03/23/2021] [Indexed: 01/09/2023]
Abstract
Advances in experimental technologies, such as DNA sequencing, have opened up new avenues for the applications of phylogenetic methods to various fields beyond their traditional application in evolutionary investigations, extending to the fields of development, differentiation, cancer genomics, and immunogenomics. Thus, the importance of phylogenetic methods is increasingly being recognized, and the development of a novel phylogenetic approach can contribute to several areas of research. Recently, the use of hyperbolic geometry has attracted attention in artificial intelligence research. Hyperbolic space can better represent a hierarchical structure compared to Euclidean space, and can therefore be useful for describing and analyzing a phylogenetic tree. In this study, we developed a novel metric that considers the characteristics of a phylogenetic tree for representation in hyperbolic space. We compared the performance of the proposed hyperbolic embeddings, general hyperbolic embeddings, and Euclidean embeddings, and confirmed that our method could be used to more precisely reconstruct evolutionary distance. We also demonstrate that our approach is useful for predicting the nearest-neighbor node in a partial phylogenetic tree with missing nodes. Furthermore, we proposed a novel approach based on our metric to integrate multiple trees for analyzing tree nodes or imputing missing distances. This study highlights the utility of adopting a geometric approach for further advancing the applications of phylogenetic methods.
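For readers unfamiliar with hyperbolic embeddings, the sketch below computes the standard geodesic distance in the Poincaré ball model, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). This is the textbook formula commonly used for hyperbolic embeddings, not the novel metric proposed in this paper; the function name and the example points are assumptions for illustration.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(
        1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    )


if __name__ == "__main__":
    root = (0.0, 0.0)
    # Two pairs with the same Euclidean gap (0.09) but very different
    # hyperbolic distances: distances grow rapidly toward the boundary.
    print(poincare_distance(root, (0.09, 0.0)))
    print(poincare_distance((0.90, 0.0), (0.99, 0.0)))
```

Because distances blow up near the boundary, the volume available to an embedding grows exponentially with radius, which is why tree-like hierarchies tend to fit hyperbolic space with less distortion than Euclidean space.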
Affiliation(s)
- Hirotaka Matsumoto
  - School of Information and Data Sciences, Nagasaki University, Nagasaki, Japan; Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Saitama, Japan
- Takahiro Mimori
  - Medical Image Analysis Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Tsukasa Fukunaga
  - Department of Computer Science, Graduate School of Information Science and Engineering, The University of Tokyo, Tokyo, Japan
11
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022]
Abstract
The exponential growth of biomedical data in recent years has urged the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling the automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
12
Pollock LJ, O'Connor LMJ, Mokany K, Rosauer DF, Talluto L, Thuiller W. Protecting Biodiversity (in All Its Complexity): New Models and Methods. Trends Ecol Evol 2020; 35:1119-1128. [PMID: 32977981 DOI: 10.1016/j.tree.2020.08.015] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 12/09/2019] [Revised: 08/24/2020] [Accepted: 08/25/2020] [Indexed: 11/21/2022]
Abstract
We are facing a biodiversity crisis at the same time as we are acquiring an unprecedented view of the world's biodiversity. Vast new datasets (e.g., species distributions, traits, phylogenies, and interaction networks) hold knowledge to better comprehend the depths of biodiversity change, reliably anticipate these changes, and inform conservation actions. To harness this information for conservation, we need to integrate the largely independent fields of biodiversity modeling and conservation. We highlight new developments in each respective field, early examples of how they are being brought together, and ideas for a future synthesis such that conservation decisions can be made with fuller awareness of the biodiversity at stake.
Affiliation(s)
- Laura J Pollock
  - Department of Biology, McGill University, 1205 Dr. Penfield Avenue, Montréal, Québec H3A 1B1, Canada; Université Grenoble Alpes and Université Savoie Mont Blanc, Centre National de la Recherche Scientifique (CNRS), Laboratoire d'Écologie Alpine (LECA), F-38000 Grenoble, France
- Louise M J O'Connor
  - Université Grenoble Alpes and Université Savoie Mont Blanc, Centre National de la Recherche Scientifique (CNRS), Laboratoire d'Écologie Alpine (LECA), F-38000 Grenoble, France
- Karel Mokany
  - Commonwealth Scientific and Industrial Research Organisation (CSIRO), PO Box 1700, Canberra, ACT 2601, Australia
- Dan F Rosauer
  - Research School of Biology, Australian National University, Acton, Canberra, ACT 2601, Australia
- Lauren Talluto
  - Department of Ecohydrology, Leibniz Institute for Freshwater Ecology and Inland Fisheries, Müggelseedamm 310, 12587 Berlin, Germany; Department of Ecology, University of Innsbruck, Innrain 52, AT-6020 Innsbruck, Austria
- Wilfried Thuiller
  - Université Grenoble Alpes and Université Savoie Mont Blanc, Centre National de la Recherche Scientifique (CNRS), Laboratoire d'Écologie Alpine (LECA), F-38000 Grenoble, France