101
Fountain-Jones NM, Smith ML, Austerlitz F. Machine learning in molecular ecology. Mol Ecol Resour 2021; 21:2589-2597. [PMID: 34738721 DOI: 10.1111/1755-0998.13532] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/13/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 12/26/2022]
Affiliation(s)
- Megan L Smith
- Department of Biology, Indiana University, Bloomington, Indiana, USA
102
Overcast I, Ruffley M, Rosindell J, Harmon L, Borges PAV, Emerson BC, Etienne RS, Gillespie R, Krehenwinkel H, Mahler DL, Massol F, Parent CE, Patiño J, Peter B, Week B, Wagner C, Hickerson MJ, Rominger A. A unified model of species abundance, genetic diversity, and functional diversity reveals the mechanisms structuring ecological communities. Mol Ecol Resour 2021; 21:2782-2800. [PMID: 34569715 PMCID: PMC9297962 DOI: 10.1111/1755-0998.13514] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/15/2020] [Revised: 09/01/2021] [Accepted: 09/07/2021] [Indexed: 11/30/2022]
Abstract
Biodiversity accumulates hierarchically by means of ecological and evolutionary processes and feedbacks. Within ecological communities drift, dispersal, speciation, and selection operate simultaneously to shape patterns of biodiversity. Reconciling the relative importance of these is hindered by current models and inference methods, which tend to focus on a subset of processes and their resulting predictions. Here we introduce massive ecoevolutionary synthesis simulations (MESS), a unified mechanistic model of community assembly, rooted in classic island biogeography theory, which makes temporally explicit joint predictions across three biodiversity data axes: (i) species richness and abundances, (ii) population genetic diversities, and (iii) trait variation in a phylogenetic context. Using simulations we demonstrate that each data axis captures information at different timescales, and that integrating these axes enables discriminating among previously unidentifiable community assembly models. MESS is unique in generating predictions of community-scale genetic diversity, and in characterizing joint patterns of genetic diversity, abundance, and trait values. MESS unlocks the full potential for investigation of biodiversity processes using multidimensional community data including a genetic component, such as might be produced by contemporary eDNA or metabarcoding studies. We combine MESS with supervised machine learning to fit the parameters of the model to real data and infer processes underlying how biodiversity accumulates, using communities of tropical trees, arthropods, and gastropods as case studies that span a range of data availability scenarios, and spatial and taxonomic scales.
Affiliation(s)
- Isaac Overcast
- Biology Department, Graduate Center of the City University of New York, New York, New York, USA
- Biology Department, City College of New York, New York, New York, USA
- Division of Vertebrate Zoology, American Museum of Natural History, New York, USA
- Megan Ruffley
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, Idaho, USA
- James Rosindell
- Department of Life Sciences, Imperial College London, Ascot, Berkshire, UK
- Luke Harmon
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA
- Paulo A. V. Borges
- Centre for Ecology, Evolution and Environmental Changes/Azorean Biodiversity Group, Faculdade de Ciências Agrárias e do Ambiente, Universidade dos Açores, Açores, Portugal
- Brent C. Emerson
- Island Ecology and Evolution Research Group, Institute of Natural Products and Agrobiology (IPNA‐CSIC), La Laguna, Tenerife, Canary Islands, Spain
- Rampal S. Etienne
- Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Rosemary Gillespie
- Department of Environmental Science, Policy, and Management, University of California, Berkeley, California, USA
- D. Luke Mahler
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Francois Massol
- CNRS, Inserm, CHU Lille, University of Lille, Lille, France
- Center for Infection and Immunity of Lille, Institut Pasteur de Lille, Lille, France
- CNRS, Evo‐Eco‐Paleo, SPICI Group, University of Lille, Lille, France
- Christine E. Parent
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, Idaho, USA
- Jairo Patiño
- Island Ecology and Evolution Research Group, Institute of Natural Products and Agrobiology (IPNA‐CSIC), La Laguna, Tenerife, Canary Islands, Spain
- Plant Conservation and Biogeography Group, Departamento de Botánica, Ecología y Fisiología Vegetal, Facultad de Ciencias, Universidad de La Laguna, Tenerife, Islas Canarias, Spain
- Ben Peter
- Group of Genetic Diversity through Space and Time, Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bob Week
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA
- Catherine Wagner
- Department of Botany and Biodiversity Institute, University of Wyoming, Laramie, Wyoming, USA
- Michael J. Hickerson
- Biology Department, Graduate Center of the City University of New York, New York, New York, USA
- Biology Department, City College of New York, New York, New York, USA
- Division of Invertebrate Zoology, American Museum of Natural History, New York, New York, USA
- Andrew Rominger
- School of Biology and Ecology, University of Maine, Orono, Maine, USA
- Maine Center for Genetics in the Environment, University of Maine, Orono, Maine, USA
103
Blischak PD, Barker MS, Gutenkunst RN. Chromosome-scale inference of hybrid speciation and admixture with convolutional neural networks. Mol Ecol Resour 2021; 21:2676-2688. [PMID: 33682305 PMCID: PMC8675098 DOI: 10.1111/1755-0998.13355] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/30/2020] [Revised: 01/26/2021] [Accepted: 02/05/2021] [Indexed: 11/30/2022]
Abstract
Inferring the frequency and mode of hybridization among closely related organisms is an important step for understanding the process of speciation and can help to uncover reticulated patterns of phylogeny more generally. Phylogenomic methods to test for the presence of hybridization come in many varieties and typically operate by leveraging expected patterns of genealogical discordance in the absence of hybridization. An important assumption made by these tests is that the data (genes or SNPs) are independent given the species tree. However, when the data are closely linked, it is especially important to consider their nonindependence. Recently, deep learning techniques such as convolutional neural networks (CNNs) have been used to perform population genetic inferences with linked SNPs coded as binary images. Here, we use CNNs for selecting among candidate hybridization scenarios using the tree topology (((P1, P2), P3), Out) and a matrix of pairwise nucleotide divergence (dXY) calculated in windows across the genome. Training and independently testing a neural network with coalescent simulations showed that our method, HyDe-CNN, accurately performed model selection for hybridization scenarios across a wide breadth of parameter space. We then used HyDe-CNN to test models of admixture in Heliconius butterflies and compared it to phylogeny-based introgression statistics. Given the flexibility of our approach, the dropping cost of long-read sequencing and the continued improvement of CNN architectures, we anticipate that inferences of hybridization using deep learning methods like ours will help researchers to better understand patterns of admixture in their study organisms.
Affiliation(s)
- Paul D. Blischak
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA
- Michael S. Barker
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
- Ryan N. Gutenkunst
- Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA
104
Nadachowska‐Brzyska K, Konczal M, Babik W. Navigating the temporal continuum of effective population size. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13740] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]
Affiliation(s)
- Wieslaw Babik
- Jagiellonian University in Kraków Faculty of Biology Institute of Environmental Sciences Kraków Poland
105
Ma EZ, Hoegler KM, Zhou AE. Bioinformatic and Machine Learning Applications in Melanoma Risk Assessment and Prognosis: A Literature Review. Genes (Basel) 2021; 12:1751. [PMID: 34828357 PMCID: PMC8621295 DOI: 10.3390/genes12111751] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2021] [Revised: 10/19/2021] [Accepted: 10/28/2021] [Indexed: 12/20/2022] [Open Access]
Abstract
Over 100,000 people are diagnosed with cutaneous melanoma each year in the United States. Despite recent advancements in metastatic melanoma treatment, such as immunotherapy, there are still over 7000 melanoma-related deaths each year. Melanoma is a highly heterogenous disease, and many underlying genetic drivers have been identified since the introduction of next-generation sequencing. Despite clinical staging guidelines, the prognosis of metastatic melanoma is variable and difficult to predict. Bioinformatic and machine learning analyses relying on genetic, clinical, and histopathologic inputs have been increasingly used to risk stratify melanoma patients with high accuracy. This literature review summarizes the key genetic drivers of melanoma and recent applications of bioinformatic and machine learning models in the risk stratification of melanoma patients. A robustly validated risk stratification tool can potentially guide the physician management of melanoma patients and ultimately improve patient outcomes.
Affiliation(s)
- Albert E. Zhou
- Department of Dermatology, University of Maryland School of Medicine, Baltimore, MD 21230, USA; (E.Z.M.); (K.M.H.)
106
Perez MF, Bonatelli IAS, Romeiro-Brito M, Franco FF, Taylor NP, Zappi DC, Moraes EM. Coalescent-based species delimitation meets deep learning: Insights from a highly fragmented cactus system. Mol Ecol Resour 2021; 22:1016-1028. [PMID: 34669256 DOI: 10.1111/1755-0998.13534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2021] [Revised: 09/16/2021] [Accepted: 10/12/2021] [Indexed: 11/26/2022]
Abstract
Delimiting species boundaries is a major goal in evolutionary biology. An increasing volume of literature has focused on the challenges of investigating cryptic diversity within complex evolutionary scenarios of speciation, including gene flow and demographic fluctuations. New methods based on model selection, such as approximate Bayesian computation (ABC), approximate likelihoods, and machine learning, are promising tools arising in this field. Here, we introduce a framework for species delimitation using the multispecies coalescent model coupled with a deep learning algorithm based on convolutional neural networks (CNNs). We compared this strategy with a similar ABC approach. We applied both methods to test species boundary hypotheses based on current and previous taxonomic delimitations as well as genetic data (sequences from 41 loci) in Pilosocereus aurisetus, a cactus species complex with a sky-island distribution and taxonomic uncertainty. To validate our method, we also applied the same strategy to data from widely accepted species from the genus Drosophila. The results show that our CNN approach has a high capacity to distinguish among the simulated species delimitation scenarios, with higher accuracy than ABC. For the cactus data set, a splitter hypothesis without gene flow showed the highest probability in both CNN and ABC approaches, a result agreeing with previous taxonomic classifications and in line with the sky-island distribution and low dispersal of P. aurisetus. Our results highlight the cryptic diversity within the P. aurisetus complex and show that CNNs are a promising approach for distinguishing complex evolutionary histories, even outperforming the accuracy of other model-based approaches such as ABC.
Affiliation(s)
- Manolo F Perez
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil; Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil
- Isabel A S Bonatelli
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil; Departamento de Ecologia e Biologia Evolutiva, Universidade Federal de São Paulo, Diadema, Brazil
- Fernando F Franco
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil
- Daniela C Zappi
- Programa de Pós Graduação em Botânica, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, Brazil
- Evandro M Moraes
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil
107
Tan MS, Cheah PL, Chin AV, Looi LM, Chang SW. A review on omics-based biomarkers discovery for Alzheimer's disease from the bioinformatics perspectives: Statistical approach vs machine learning approach. Comput Biol Med 2021; 139:104947. [PMID: 34678481 DOI: 10.1016/j.compbiomed.2021.104947] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/21/2021] [Revised: 10/12/2021] [Accepted: 10/12/2021] [Indexed: 12/26/2022]
Abstract
Alzheimer's Disease (AD) is a neurodegenerative disease that affects cognition and is the most common cause of dementia in the elderly. As the number of elderly individuals increases globally, the incidence and prevalence of AD are expected to increase. At present, AD is diagnosed clinically, according to accepted criteria. The essential elements in the diagnosis of AD include a patient's history, a physical examination and neuropsychological testing, in addition to appropriate investigations such as neuroimaging. The omics-based approach is an emerging field of study that may not only aid in the diagnosis of AD but also facilitate the exploration of factors that influence the development of the disease. Omics techniques, including genomics, transcriptomics, proteomics and metabolomics, may reveal the pathways that lead to neuronal death and identify biomolecular markers associated with AD. This will further facilitate an understanding of AD neuropathology. In this review, omics-based approaches that were implemented in studies on AD were assessed from a bioinformatics perspective. Current state-of-the-art statistical and machine learning approaches used in the single omics analysis of AD were compared based on correlations of variants, differential expression, functional analysis and network analysis. This was followed by a review of the approaches used in the integration and analysis of multi-omics of AD. The strengths and limitations of multi-omics analysis methods were explored and the issues and challenges associated with omics studies of AD were highlighted. Lastly, future studies in this area of research were justified.
Affiliation(s)
- Mei Sze Tan
- Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Phaik-Leng Cheah
- Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Ai-Vyrn Chin
- Division of Geriatric Medicine, Department of Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Lai-Meng Looi
- Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Siow-Wee Chang
- Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
108
Singh D, Chaudhary P, Taunk J, Singh CK, Singh D, Tomar RSS, Aski M, Konjengbam NS, Raje RS, Singh S, Sengar RS, Yadav RK, Pal M. Fab Advances in Fabaceae for Abiotic Stress Resilience: From 'Omics' to Artificial Intelligence. Int J Mol Sci 2021; 22:10535. [PMID: 34638885 PMCID: PMC8509049 DOI: 10.3390/ijms221910535] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/04/2021] [Revised: 09/17/2021] [Accepted: 09/23/2021] [Indexed: 11/16/2022] [Open Access]
Abstract
Legumes are a better source of proteins and are richer in diverse micronutrients than widely consumed cereals. However, when exposed to a diverse range of abiotic stresses, their overall productivity and quality are hugely impacted. Our limited understanding of genetic determinants and novel variants associated with the abiotic stress response in food legume crops restricts its amelioration. Therefore, it is imperative to understand different molecular approaches in food legume crops that can be utilized in crop improvement programs to minimize the economic loss. 'Omics'-based molecular breeding provides better opportunities than conventional breeding for diversifying the natural germplasm together with improving yield and quality parameters. Due to molecular advancements, the technique is now equipped with novel 'omics' approaches such as ionomics, epigenomics, fluxomics, RNomics, glycomics, glycoproteomics, phosphoproteomics, lipidomics, regulomics, and secretomics. Pan-omics, which utilizes the molecular bases of the stress response to identify genes (genomics), mRNAs (transcriptomics), proteins (proteomics), and biomolecules (metabolomics) associated with stress regulation, has been widely used for abiotic stress amelioration in food legume crops. Integration of pan-omics with novel omics approaches will fast-track legume breeding programs. Moreover, artificial intelligence (AI)-based algorithms can be utilized for simulating crop yield under changing environments, which can help in predicting the genetic gain beforehand. Application of machine learning (ML) in quantitative trait loci (QTL) mining will further help in determining the genetic determinants of abiotic stress tolerance in pulses.
Affiliation(s)
- Dharmendra Singh
- Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
- Priya Chaudhary
- Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
- Jyoti Taunk
- Division of Plant Physiology, ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
- Chandan Kumar Singh
- Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
- Deepti Singh
- Department of Botany, Meerut College, Meerut 250001, India
- Ram Sewak Singh Tomar
- College of Horticulture and Forestry, Rani Lakshmi Bai Central Agricultural University, Jhansi 284003, India
- Muraleedhar Aski
- Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
- Noren Singh Konjengbam
- College of Post Graduate Studies in Agricultural Sciences, Central Agricultural University, Imphal 793103, India
- Ranjeet Sharan Raje
- Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
- Sanjay Singh
- ICAR-National Institute of Plant Biotechnology, LBS Centre, Pusa Campus, New Delhi 110012, India
- Rakesh Singh Sengar
- College of Biotechnology, Sardar Vallabh Bhai Patel Agricultural University, Meerut 250001, India
- Rajendra Kumar Yadav
- Department of Genetics and Plant Breeding, Chandra Shekhar Azad University of Agriculture and Technology, Kanpur 208002, India
- Madan Pal
- Division of Plant Physiology, ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
109
Passamonti MM, Somenzi E, Barbato M, Chillemi G, Colli L, Joost S, Milanesi M, Negrini R, Santini M, Vajana E, Williams JL, Ajmone-Marsan P. The Quest for Genes Involved in Adaptation to Climate Change in Ruminant Livestock. Animals (Basel) 2021; 11:2833. [PMID: 34679854 PMCID: PMC8532622 DOI: 10.3390/ani11102833] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/19/2021] [Revised: 09/21/2021] [Accepted: 09/23/2021] [Indexed: 12/14/2022] [Open Access]
Abstract
Livestock radiated out from domestication centres to most regions of the world, gradually adapting to diverse environments, from very hot to sub-zero temperatures and from wet and humid conditions to deserts. The climate is changing; generally global temperature is increasing, although there are also more extreme cold periods, storms, and higher solar radiation. These changes impact livestock welfare and productivity. This review describes advances in the methodology for studying livestock genomes and the impact of the environment on animal production, giving examples of discoveries made. Sequencing livestock genomes has facilitated genome-wide association studies to localize genes controlling many traits, and population genetics has identified genomic regions under selection or introgressed from one breed into another to improve production or facilitate adaptation. Landscape genomics, which combines global positioning and genomics, has identified genomic features that enable animals to adapt to local environments. Combining the advances in genomics and methods for predicting changes in climate is generating an explosion of data which calls for innovations in the way big data sets are treated. Artificial intelligence and machine learning are now being used to study the interactions between the genome and the environment to identify historic effects on the genome and to model future scenarios.
Affiliation(s)
- Matilde Maria Passamonti
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Elisa Somenzi
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Mario Barbato
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Giovanni Chillemi
- Department for Innovation in Biological, Agro-Food and Forest Systems–DIBAF, Università Della Tuscia, Via S. Camillo de Lellis snc, 01100 Viterbo, Italy; (G.C.); (M.M.)
- Licia Colli
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Research Center on Biodiversity and Ancient DNA—BioDNA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Stéphane Joost
- Laboratory of Geographic Information Systems (LASIG), School of Architecture, Civil and Environmental Engineering (ENAC), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; (S.J.); (E.V.)
- Marco Milanesi
- Department for Innovation in Biological, Agro-Food and Forest Systems–DIBAF, Università Della Tuscia, Via S. Camillo de Lellis snc, 01100 Viterbo, Italy; (G.C.); (M.M.)
- Riccardo Negrini
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Monia Santini
- Impacts on Agriculture, Forests and Ecosystem Services (IAFES) Division, Fondazione Centro Euro-Mediterraneo Sui Cambiamenti Climatici (CMCC), Viale Trieste 127, 01100 Viterbo, Italy
- Elia Vajana
- Laboratory of Geographic Information Systems (LASIG), School of Architecture, Civil and Environmental Engineering (ENAC), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; (S.J.); (E.V.)
- John Lewis Williams
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Paolo Ajmone-Marsan
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Nutrigenomics and Proteomics Research Center—PRONUTRIGEN, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
110
Yang YH, Wang JS, Yuan SS, Liu ML, Su W, Lin H, Zhang ZY. A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods. Curr Med Chem 2021; 29:789-806. [PMID: 34514982 DOI: 10.2174/0929867328666210910125802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Revised: 06/29/2021] [Accepted: 07/04/2021] [Indexed: 11/22/2022]
Abstract
Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5'-triphosphate (ATP) is one such ligand that plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions, and signaling. Knowing the ATP binding residues of proteins is helpful for the annotation of protein function and for drug design. However, due to the huge influx of protein sequences into databases in the post-genome era, experimentally identifying ATP binding residues is cost-ineffective and time-consuming. To address this problem, computational methods have been developed to predict ATP binding residues. In this review, we briefly summarize the application of machine learning methods in detecting ATP binding residues of proteins. We expect this review will be helpful for further research.
Affiliation(s)
- Yu-He Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jia-Shu Wang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Meng-Lu Liu
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Su
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Yue Zhang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
111
Charoenkwan P, Chiangjong W, Hasan MM, Nantasenamat C, Shoombuatong W. Review and comparative analysis of machine learning-based predictors for predicting and analyzing of anti-angiogenic peptides. Curr Med Chem 2021; 29:849-864. [PMID: 34375178 DOI: 10.2174/0929867328666210810145806] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/22/2021] [Revised: 06/17/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
Cancer is one of the leading causes of death worldwide, and underlying it is angiogenesis, one of the hallmarks of cancer. Efforts are under way to discover anti-angiogenic peptides (AAPs) as a promising therapeutic route that tackles the formation of new blood vessels. As such, the identification of AAPs constitutes a viable path for understanding their mechanistic properties pertinent to the discovery of new anti-cancer drugs. In spite of the abundance of peptide sequences in public databases, experimental efforts in the identification of anti-angiogenic peptides have progressed very slowly owing to their high cost and laborious nature. Owing to its inherent ability to make sense of large volumes of data, machine learning (ML) represents a lucrative technique that can be harnessed for peptide-based drug discovery. In this review, we conducted a comprehensive and comparative analysis of ML-based AAP predictors in terms of their employed feature descriptors, ML algorithms, cross-validation methods and prediction performance. Moreover, the common framework of these AAP predictors and their inherent weaknesses are also discussed. In particular, we explore future perspectives for improving the prediction accuracy and model interpretability, which represent an interesting avenue for overcoming some of the inherent weaknesses of existing AAP predictors. We anticipate that this review will assist researchers in the rapid screening and identification of promising AAPs for clinical use.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand
- Wararat Chiangjong
- Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, United States
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
112
The genetic architecture of primary biliary cholangitis. Eur J Med Genet 2021; 64:104292. [PMID: 34303876 DOI: 10.1016/j.ejmg.2021.104292] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/10/2021] [Revised: 07/03/2021] [Accepted: 07/21/2021] [Indexed: 12/12/2022]
Abstract
Primary biliary cholangitis (PBC) is a rare autoimmune disease of the liver affecting the small bile ducts. From a genetic point of view, PBC is a complex trait and several genetic and environmental factors have been called in action to explain its etiopathogenesis. Similarly to other complex traits, PBC has benefited from the introduction of genome-wide association studies (GWAS), which identified many variants predisposing or protecting toward the development of the disease. While a progressive endeavour toward the characterization of candidate loci and downstream pathways is currently ongoing, there is still a relatively large portion of heritability of PBC to be revealed. In addition, genetic variation behind progression of the disease and therapeutic response are mostly to be investigated yet. This review outlines the state-of-the-art regarding the genetic architecture of PBC and provides some hints for future investigations, focusing on the study of gene-gene interactions, the application of whole-genome sequencing techniques, and the investigation of X chromosome that can be helpful to cover the missing heritability gap in PBC.
|
113
|
Roberts Kingman GA, Vyas DN, Jones FC, Brady SD, Chen HI, Reid K, Milhaven M, Bertino TS, Aguirre WE, Heins DC, von Hippel FA, Park PJ, Kirch M, Absher DM, Myers RM, Di Palma F, Bell MA, Kingsley DM, Veeramah KR. Predicting future from past: The genomic basis of recurrent and rapid stickleback evolution. Sci Adv 2021; 7:eabg5285. [PMID: 34144992 PMCID: PMC8213234 DOI: 10.1126/sciadv.abg5285] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 01/11/2021] [Accepted: 05/05/2021] [Indexed: 05/30/2023]
Abstract
Similar forms often evolve repeatedly in nature, raising long-standing questions about the underlying mechanisms. Here, we use repeated evolution in stickleback to identify a large set of genomic loci that change recurrently during colonization of freshwater habitats by marine fish. The same loci used repeatedly in extant populations also show rapid allele frequency changes when new freshwater populations are experimentally established from marine ancestors. Marked genotypic and phenotypic changes arise within 5 years, facilitated by standing genetic variation and linkage between adaptive regions. Both the speed and location of changes can be predicted using empirical observations of recurrence in natural populations or fundamental genomic features like allelic age, recombination rates, density of divergent loci, and overlap with mapped traits. A composite model trained on these stickleback features can also predict the location of key evolutionary loci in Darwin's finches, suggesting that similar features are important for evolution across diverse taxa.
Affiliation(s)
- Garrett A Roberts Kingman
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
| | - Deven N Vyas
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
| | - Felicity C Jones
- Friedrich Miescher Laboratory of the Max Planck Society, Max-Planck-Ring, Tübingen, Germany
| | - Shannon D Brady
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
| | - Heidi I Chen
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
| | - Kerry Reid
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
| | - Mark Milhaven
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
| | - Thomas S Bertino
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA
| | - Windsor E Aguirre
- Department of Biological Sciences, DePaul University, Chicago, IL 60614-3207, USA
| | - David C Heins
- Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, LA 70118, USA
| | - Frank A von Hippel
- Department of Community, Environment and Policy, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ 85724, USA
| | - Peter J Park
- Department of Biology, Farmingdale State College, Farmingdale, NY 11735-1021, USA
| | - Melanie Kirch
- Friedrich Miescher Laboratory of the Max Planck Society, Max-Planck-Ring, Tübingen, Germany
| | - Devin M Absher
- HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, AL 35806, USA
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, AL 35806, USA
| | - Federica Di Palma
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
| | - Michael A Bell
- University of California Museum of Paleontology, University of California, Berkeley, Berkeley, CA 94720, USA.
| | - David M Kingsley
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA.
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
| | - Krishna R Veeramah
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY 11794-5245, USA.
|
114
|
North HL, McGaughran A, Jiggins CD. Insights into invasive species from whole-genome resequencing. Mol Ecol 2021; 30:6289-6308. [PMID: 34041794 DOI: 10.1111/mec.15999] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 12/18/2020] [Revised: 03/12/2021] [Accepted: 04/30/2021] [Indexed: 12/12/2022]
Abstract
Studies of invasive species can simultaneously inform management strategies and quantify rapid evolution in the wild. The role of genomics in invasion science is increasingly recognised, and the growing availability of reference genomes for invasive species is paving the way for whole-genome resequencing studies in a wide range of systems. Here, we survey the literature to assess the application of whole-genome resequencing data in invasion biology. For some applications, such as the reconstruction of invasion routes in time and space, sequencing the whole genome of many individuals can increase the accuracy of existing methods. In other cases, population genomic approaches such as haplotype analysis can permit entirely new questions to be addressed and new technologies applied. To date whole-genome resequencing has only been used in a handful of invasive systems, but these studies have confirmed the importance of processes such as balancing selection and hybridization in allowing invasive species to reuse existing adaptations and rapidly overcome the challenges of a foreign ecosystem. The use of genomic data does not constitute a paradigm shift per se, but by leveraging new theory, tools, and technologies, population genomics can provide unprecedented insight into basic and applied aspects of invasion science.
Affiliation(s)
- Henry L North
- Department of Zoology, University of Cambridge, Cambridge, UK
| | - Angela McGaughran
- Te Aka Mātuatua/School of Science, University of Waikato, Hamilton, New Zealand
| | - Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, UK
|
115
|
Gower G, Picazo PI, Fumagalli M, Racimo F. Detecting adaptive introgression in human evolution using convolutional neural networks. eLife 2021; 10:64669. [PMID: 34032215 PMCID: PMC8192126 DOI: 10.7554/elife.64669] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 11/06/2020] [Accepted: 05/24/2021] [Indexed: 01/10/2023] Open
Abstract
Studies in a variety of species have shown evidence for positively selected variants introduced into a population via introgression from another, distantly related population—a process known as adaptive introgression. However, there are few explicit frameworks for jointly modelling introgression and positive selection, in order to detect these variants using genomic sequence data. Here, we develop an approach based on convolutional neural networks (CNNs). CNNs do not require the specification of an analytical model of allele frequency dynamics and have outperformed alternative methods for classification and parameter estimation tasks in various areas of population genetics. Thus, they are potentially well suited to the identification of adaptive introgression. Using simulations, we trained CNNs on genotype matrices derived from genomes sampled from the donor population, the recipient population and a related non-introgressed population, in order to distinguish regions of the genome evolving under adaptive introgression from those evolving neutrally or experiencing selective sweeps. Our CNN architecture exhibits 95% accuracy on simulated data, even when the genomes are unphased, and accuracy decreases only moderately in the presence of heterosis. As a proof of concept, we applied our trained CNNs to human genomic datasets—both phased and unphased—to detect candidates for adaptive introgression that shaped our evolutionary history.
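As a rough illustration of the kind of input this abstract describes, the sketch below stacks toy haplotype matrices from a donor, a recipient, and a non-introgressed population into one matrix, and collapses pairs of haplotypes into genotype counts for the unphased case. The function names, the toy data, and the assumption that consecutive rows belong to the same diploid individual are illustrative only and are not taken from the authors' software.

```python
from typing import List

Matrix = List[List[int]]

def to_unphased(haplotypes: Matrix) -> Matrix:
    """Collapse consecutive haplotype pairs into per-individual genotype counts (0, 1, 2)."""
    if len(haplotypes) % 2:
        raise ValueError("expected an even number of haplotypes")
    return [
        [a + b for a, b in zip(haplotypes[i], haplotypes[i + 1])]
        for i in range(0, len(haplotypes), 2)
    ]

def stack_populations(*populations: Matrix) -> Matrix:
    """Concatenate population matrices row-wise; every row must cover the same sites."""
    n_sites = {len(row) for pop in populations for row in pop}
    if len(n_sites) != 1:
        raise ValueError("all rows must cover the same number of sites")
    return [row for pop in populations for row in pop]

# Toy data (hypothetical): rows are haplotypes, columns are sites, 1 = derived allele.
donor = [[1, 1, 0, 1], [1, 1, 0, 1]]
recipient = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 1, 0, 1], [0, 0, 1, 0]]
outgroup = [[0, 0, 0, 0], [0, 0, 1, 0]]

phased = stack_populations(donor, recipient, outgroup)
unphased = stack_populations(*(to_unphased(p) for p in (donor, recipient, outgroup)))
```

A classifier such as a CNN could then treat a matrix like `phased` or `unphased` as a single-channel image; any sorting or resizing of rows that a real pipeline applies is not shown here.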
Affiliation(s)
- Graham Gower
- Lundbeck GeoGenetics Centre, Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Pablo Iáñez Picazo
- Lundbeck GeoGenetics Centre, Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Matteo Fumagalli
- Department of Life Sciences, Silwood Park Campus, Imperial College London, London, United Kingdom
| | - Fernando Racimo
- Lundbeck GeoGenetics Centre, Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
|
116
|
Pavlovikj N, Gomes-Neto JC, Deogun JS, Benson AK. ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses. PeerJ 2021; 9:e11376. [PMID: 34055480 PMCID: PMC8142932 DOI: 10.7717/peerj.11376] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2020] [Accepted: 04/08/2021] [Indexed: 12/28/2022] Open
Abstract
Whole Genome Sequence (WGS) data from bacterial species is used for a variety of applications ranging from basic microbiological research to diagnostics and epidemiological surveillance. The availability of WGS data from hundreds of thousands of individual isolates of individual microbial species poses a tremendous opportunity for discovery and hypothesis-generating research into ecology and evolution of these microorganisms. Flexibility, scalability, and user-friendliness of existing pipelines for population-scale inquiry, however, limit applications of systematic, population-scale approaches. Here, we present ProkEvo, an automated, scalable, reproducible, and open-source framework for bacterial population genomics analyses using WGS data. ProkEvo was specifically developed to achieve the following goals: (1) Automation and scaling of complex combinations of computational analyses for many thousands of bacterial genomes from inputs of raw Illumina paired-end sequence reads; (2) Use of workflow management systems (WMS) such as Pegasus WMS to ensure reproducibility, scalability, modularity, fault-tolerance, and robust file management throughout the process; (3) Use of high-performance and high-throughput computational platforms; (4) Generation of hierarchical-based population structure analysis based on combinations of multi-locus and Bayesian statistical approaches for classification for ecological and epidemiological inquiries; (5) Association of antimicrobial resistance (AMR) genes, putative virulence factors, and plasmids from curated databases with the hierarchically-related genotypic classifications; and (6) Production of pan-genome annotations and data compilation that can be utilized for downstream analysis such as identification of population-specific genomic signatures. The scalability of ProkEvo was measured with two datasets comprising significantly different numbers of input genomes (one with ~2,400 genomes, and the second with ~23,000 genomes). Depending on the dataset and the computational platform used, the running time of ProkEvo varied from ~3 to 26 days. ProkEvo can be used with virtually any bacterial species, and the Pegasus WMS uniquely facilitates addition or removal of programs from the workflow or modification of options within them. To demonstrate the versatility of the ProkEvo platform, we performed hierarchical-based population structure analyses of available genomes from three distinct pathogenic bacterial species as individual case studies. The specific case studies illustrate how hierarchical analyses of population structures, genotype frequencies, and distribution of specific gene functions can be integrated into an analysis. Collectively, our study shows that ProkEvo presents a practical, viable option for scalable, automated analyses of bacterial populations with direct applications for basic microbiology research, clinical microbiological diagnostics, and epidemiological surveillance.
Affiliation(s)
- Natasha Pavlovikj
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Joao Carlos Gomes-Neto
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Jitender S Deogun
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Andrew K Benson
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
|
117
|
Cortés AJ, López-Hernández F. Harnessing Crop Wild Diversity for Climate Change Adaptation. Genes (Basel) 2021; 12:783. [PMID: 34065368 PMCID: PMC8161384 DOI: 10.3390/genes12050783] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 03/29/2021] [Revised: 04/28/2021] [Accepted: 05/19/2021] [Indexed: 12/20/2022] Open
Abstract
Warming and drought are reducing global crop production with a potential to substantially worsen global malnutrition. As with the green revolution in the last century, plant genetics may offer concrete opportunities to increase yield and crop adaptability. However, the speed at which the threat is unfolding calls for new strategies to meet the global food demand. In this review, we highlight major recent 'big data' developments from both empirical and theoretical genomics that may speed up the identification, conservation, and breeding of exotic and elite crop varieties with the potential to feed humans. We first emphasize the major bottlenecks to capturing and utilizing novel sources of variation in abiotic stress (i.e., heat and drought) tolerance. We argue that adaptation of crop wild relatives to dry environments could be informative on how plant phenotypes may react to a drier climate because natural selection has already tested more options than humans ever will. Because isolated pockets of cryptic diversity may still persist in remote semi-arid regions, we encourage new habitat-based population-guided collections for genebanks. We then discuss how to systematically study abiotic stress tolerance in these collections of crop wild relatives and landraces using geo-referencing and extensive environmental data. By uncovering the genes that underlie adaptive tolerance traits, natural variation has the potential to be introgressed into elite cultivars. However, unlocking adaptive genetic variation hidden in related wild species and early landraces remains a major challenge for complex traits that, like abiotic stress tolerance, are polygenic (i.e., regulated by many low-effect genes). Therefore, we close by surveying modern analytical approaches that may help overcome this issue. Concretely, genomic prediction, machine learning, and multi-trait gene editing all offer innovative alternatives to speed up more accurate pre-breeding and breeding efforts toward the increase in crop adaptability and yield, while matching future global food demands in the face of increased heat and drought. In order for these 'big data' approaches to succeed, we advocate for a trans-disciplinary approach with open-source data and long-term funding. The recent developments and perspectives discussed throughout this review ultimately aim to contribute to increased crop adaptability and yield in the face of heat waves and drought events.
Affiliation(s)
- Andrés J. Cortés
- Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia
- Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Sede Medellín, Medellín 050034, Colombia
| | - Felipe López-Hernández
- Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia
|
118
|
Bourgeois YXC, Warren BH. An overview of current population genomics methods for the analysis of whole-genome resequencing data in eukaryotes. Mol Ecol 2021; 30:6036-6071. [PMID: 34009688 DOI: 10.1111/mec.15989] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 12/30/2020] [Revised: 04/26/2021] [Accepted: 05/11/2021] [Indexed: 01/01/2023]
Abstract
Characterizing the population history of a species and identifying loci underlying local adaptation is crucial in functional ecology, evolutionary biology, conservation and agronomy. The constant improvement of high-throughput sequencing techniques has facilitated the production of whole genome data in a wide range of species. Population genomics now provides tools to better integrate selection into a historical framework, and take into account selection when reconstructing demographic history. However, this improvement has come with a profusion of analytical tools that can confuse and discourage users. Such confusion limits the amount of information effectively retrieved from complex genomic data sets, and impairs the diffusion of the most recent analytical tools into fields such as conservation biology. It may also lead to redundancy among methods. To address these issues, we propose an overview of more than 100 state-of-the-art methods that can deal with whole genome data. We summarize the strategies they use to infer demographic history and selection, and discuss some of their limitations. A website listing these methods is available at www.methodspopgen.com.
Affiliation(s)
| | - Ben H Warren
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, UA, CP 51, Paris, France
|
119
|
Fonseca EM, Colli GR, Werneck FP, Carstens BC. Phylogeographic model selection using convolutional neural networks. Mol Ecol Resour 2021; 21:2661-2675. [PMID: 33973350 DOI: 10.1111/1755-0998.13427] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/07/2020] [Revised: 04/02/2021] [Accepted: 04/28/2021] [Indexed: 11/26/2022]
Abstract
The discipline of phylogeography has evolved rapidly in terms of the analytical toolkit used to analyse large genomic data sets. Despite substantial advances, analytical tools that could potentially address the challenges posed by increased model complexity have not been fully explored. For example, deep learning techniques are underutilized for phylogeographic model selection. In non-model organisms, the lack of information about their ecology and evolution can lead to uncertainty about which demographic models are appropriate. Here, we assess the utility of convolutional neural networks (CNNs) for assessing demographic models in South American lizards in the genus Norops. Three demographic scenarios (constant, expansion, and bottleneck) were considered for each of four inferred population-level lineages, and we found that the overall model accuracy was higher than 98% for all lineages. We then evaluated a set of 26 models that accounted for evolutionary relationships, gene flow, and changes in effective population size among the four lineages, identifying a single model with an estimated overall accuracy of 87% when using CNNs. The inferred demography of the lizard system suggests that gene flow between non-sister populations and changes in effective population sizes through time, probably in response to Pleistocene climatic oscillations, have shaped genetic diversity in this system. Approximate Bayesian computation (ABC) was applied to provide a comparison to the performance of CNNs. ABC was unable to identify a single model among the larger set of 26 models in the subsequent analysis. Our results demonstrate that CNNs can be easily and usefully incorporated into the phylogeographer's toolkit.
Affiliation(s)
- Emanuel M Fonseca
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, USA
| | - Guarino R Colli
- Departamento de Zoologia, Universidade de Brasília, Brasília, Brazil
| | - Fernanda P Werneck
- Coordenação de Biodiversidade, Programa de Coleções Científicas Biológicas, Instituto Nacional de Pesquisas da Amazônia (INPA), Manaus, Brazil
| | - Bryan C Carstens
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, USA
|
120
|
Collin FD, Durif G, Raynal L, Lombaert E, Gautier M, Vitalis R, Marin JM, Estoup A. Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest. Mol Ecol Resour 2021; 21:2598-2613. [PMID: 33950563 PMCID: PMC8596733 DOI: 10.1111/1755-0998.13413] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 07/10/2020] [Revised: 03/29/2021] [Accepted: 04/28/2021] [Indexed: 01/07/2023]
Abstract
Simulation-based methods such as approximate Bayesian computation (ABC) are well-adapted to the analysis of complex scenarios of populations and species genetic history. In this context, supervised machine learning (SML) methods provide attractive statistical solutions to conduct efficient inferences about scenario choice and parameter estimation. The Random Forest methodology (RF) is a powerful ensemble of SML algorithms used for classification or regression problems. Random Forest allows conducting inferences at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics, and bypassing the derivation of ABC tolerance levels. We have implemented a set of RF algorithms to process inferences using simulated data sets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user-friendly interface: the simulation under custom evolutionary scenarios of different types of molecular data (microsatellites, DNA sequences or SNPs) and RF treatments including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo-observed and real data sets corresponding to pool-sequencing and individual-sequencing SNP data sets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP data sets to make inferences about complex population genetic histories.
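The general idea of scenario choice with a classifier trained on simulations can be sketched in a few lines. The example below is not DIYABC Random Forest: it uses bagged one-split decision stumps as a deliberately simplified stand-in for a random forest, and the simulator, scenario names, and summary statistics are all hypothetical. It shows only the workflow: simulate labelled summary statistics under each scenario, train an ensemble on bootstrap samples, and let the ensemble vote on an observed data set, with no tolerance threshold to tune.

```python
import random
from statistics import mean

def simulate(scenario, rng):
    """Toy simulator (hypothetical): returns two summary statistics for a scenario."""
    shift = 0.0 if scenario == "constant" else 10.0
    return [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]

def train_stump(rows, labels, rng):
    """Fit one stump on a bootstrap sample, splitting on one randomly chosen statistic."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    feat = rng.randrange(len(rows[0]))
    by_label = {}
    for i in idx:
        by_label.setdefault(labels[i], []).append(rows[i][feat])
    # Order the two scenarios by their mean value on the chosen statistic.
    (low, low_vals), (high, high_vals) = sorted(
        by_label.items(), key=lambda kv: mean(kv[1])
    )
    threshold = (mean(low_vals) + mean(high_vals)) / 2
    return feat, threshold, low, high

def vote_fractions(stumps, observed):
    """Fraction of stumps voting for each scenario given observed summary statistics."""
    votes = {}
    for feat, threshold, low, high in stumps:
        label = low if observed[feat] <= threshold else high
        votes[label] = votes.get(label, 0) + 1
    return {label: n / len(stumps) for label, n in votes.items()}

rng = random.Random(0)
labels = [s for s in ("constant", "expansion") for _ in range(100)]
rows = [simulate(s, rng) for s in labels]           # reference table of simulations
stumps = [train_stump(rows, labels, rng) for _ in range(50)]

observed = [9.5, 10.5]                              # pseudo-observed summary statistics
fractions = vote_fractions(stumps, observed)
best = max(fractions, key=fractions.get)
print(best, fractions)
```

The vote fraction is a classification score, not a posterior probability of the chosen scenario, so it should not be reported as one.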
Affiliation(s)
| | - Ghislain Durif
- IMAG, Univ Montpellier, CNRS, UMR 5149, Montpellier, France
| | - Louis Raynal
- IMAG, Univ Montpellier, CNRS, UMR 5149, Montpellier, France
| | - Eric Lombaert
- ISA, INRAE, CNRS, Univ Côte d'Azur, Sophia Antipolis, France
| | - Mathieu Gautier
- CBGP, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
| | - Renaud Vitalis
- CBGP, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
| | | | - Arnaud Estoup
- CBGP, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
|
121
|
Yu GE, Shin Y, Subramaniyam S, Kang SH, Lee SM, Cho C, Lee SS, Kim CK. Machine learning, transcriptome, and genotyping chip analyses provide insights into SNP markers identifying flower color in Platycodon grandiflorus. Sci Rep 2021; 11:8019. [PMID: 33850210 PMCID: PMC8044237 DOI: 10.1038/s41598-021-87281-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/10/2020] [Accepted: 03/24/2021] [Indexed: 11/27/2022] Open
Abstract
Bellflower is an edible ornamental gardening plant in Asia. For predicting the flower color in bellflower plants, a transcriptome-wide approach based on machine learning, transcriptome, and genotyping chip analyses was used to identify SNP markers. Six machine learning methods were deployed to explore the classification potential of the selected SNPs as features in two datasets, namely training (60 RNA-Seq samples) and validation (480 Fluidigm chip samples). SNP selection was performed in sequential order. First, 96 SNPs were selected from the transcriptome-wide SNPs using principal component analysis (PCA). Then, 9 of the 96 SNPs were identified using the random forest based feature selection method on the Fluidigm chip dataset. Among the six methods, the random forest (RF) model produced higher classification performance than the other models. The 9 SNP marker candidates selected for flower color classification were verified using genomic DNA PCR with Sanger sequencing. Our results suggest that this methodology could be used for future selection of breeding traits even though the plant accessions are highly heterogeneous.
Affiliation(s)
- Go-Eun Yu
- Genomics Division, National Institute of Agricultural Sciences, Jeonju, 54874, Korea
| | - Younhee Shin
- Research and Development Center, Insilicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea
| | | | - Sang-Ho Kang
- Genomics Division, National Institute of Agricultural Sciences, Jeonju, 54874, Korea
| | - Si-Myung Lee
- Genomics Division, National Institute of Agricultural Sciences, Jeonju, 54874, Korea
| | - Chuloh Cho
- Crop Foundation Research Division, National Institute of Crop Science, RDA, Wanju, 55365, Korea
| | - Seung-Sik Lee
- Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, 29 Geumgu-gil, Jeongeup, 56212, Korea
- Department of Radiation Science and Technology, University of Science and Technology, Daejeon, 34113, Korea
| | - Chang-Kug Kim
- Genomics Division, National Institute of Agricultural Sciences, Jeonju, 54874, Korea.
|
122
|
Elhaik E, Graur D. On the Unfounded Enthusiasm for Soft Selective Sweeps III: The Supervised Machine Learning Algorithm That Isn't. Genes (Basel) 2021; 12:527. [PMID: 33916341 PMCID: PMC8066263 DOI: 10.3390/genes12040527] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Revised: 03/22/2021] [Accepted: 03/29/2021] [Indexed: 12/12/2022] Open
Abstract
In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.
Affiliation(s)
- Eran Elhaik
- Department of Biology, Lund University, Sölvegatan 35, 22362 Lund, Sweden
| | - Dan Graur
- Department of Biology & Biochemistry, University of Houston, Science & Research Building 2, Suite #342, 3455 Cullen Blvd., Houston, TX 77204-5001, USA
|
123
|
Neyra JA, Kashani K. Improving the quality of care for patients requiring continuous renal replacement therapy. Semin Dial 2021; 34:501-509. [PMID: 33811790 DOI: 10.1111/sdi.12968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2020] [Revised: 02/12/2021] [Accepted: 03/08/2021] [Indexed: 12/16/2022]
Abstract
Continuous renal replacement therapy (CRRT) is the preferred extracorporeal kidney support therapy employed to support critically ill patients with acute or chronic kidney dysfunction in intensive care units. Significant heterogeneity in CRRT practice exists in part due to variable logistics, resources, and scarcity of evidence-based CRRT practices. Importantly, homogenization of practice patterns by developing substantial evidence and effective dissemination among providers is essential for optimizing CRRT practices. The emphasis on quality of CRRT delivery has prompted identification of potential quality indicators, development of multifaceted quality improvement initiatives, effective computer science utilization, and a surge of multidisciplinary quality assurance teams that advocate for "best" CRRT practices. This manuscript provides an overview of quality improvement methodologies and reviews candidate quality indicators of CRRT and the impact of quality improvement on enhancing CRRT delivery practices.
Affiliation(s)
- Javier A Neyra
- Department of Internal Medicine, Division of Nephrology, Bone and Mineral Metabolism, University of Kentucky, Lexington, KY, USA
| | - Kianoush Kashani
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Mayo Clinic, Rochester, MN, USA
|
124
|
Qi X, An H, Hall TE, Di C, Blischak PD, McKibben MTW, Hao Y, Conant GC, Pires JC, Barker MS. Genes derived from ancient polyploidy have higher genetic diversity and are associated with domestication in Brassica rapa. New Phytol 2021; 230:372-386. [PMID: 33452818 DOI: 10.1111/nph.17194] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/04/2020] [Accepted: 11/30/2020] [Indexed: 06/12/2023]
Abstract
Many crops are polyploid or have a polyploid ancestry. Recent phylogenetic analyses have found that polyploidy often preceded the domestication of crop plants. One explanation for this observation is that increased genetic diversity following polyploidy may have been important during the strong artificial selection that occurs during domestication. In order to test the connection between domestication and polyploidy, we identified and examined candidate genes associated with the domestication of the diverse crop varieties of Brassica rapa. Like all 'diploid' flowering plants, B. rapa has a diploidized paleopolyploid genome and experienced many rounds of whole genome duplication (WGD). We analyzed transcriptome data of more than 100 cultivated B. rapa accessions. Using a combination of approaches, we identified >3000 candidate genes associated with the domestication of four major B. rapa crop varieties. Consistent with our expectation, we found that the candidate genes were significantly enriched with genes derived from the Brassiceae mesohexaploidy. We also observed that paleologs were significantly more diverse than non-paleologs. Our analyses find evidence that genetic diversity derived from ancient polyploidy played a key role in the domestication of B. rapa and provide support for its importance in the success of modern agriculture.
Affiliation(s)
- Xinshuai Qi
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
| | - Hong An
- Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - Tara E Hall
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
| | - Chenlu Di
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
| | - Paul D Blischak
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
| | - Michael T W McKibben
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
| | - Yue Hao
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
| | - Gavin C Conant
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, 27695, USA
| | - J Chris Pires
- Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - Michael S Barker
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
|
125
|
Matthey-Doret R. SimBit: A high performance, flexible and easy-to-use population genetic simulator. Mol Ecol Resour 2021; 21:1745-1754. [PMID: 33713044 DOI: 10.1111/1755-0998.13372] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/10/2020] [Revised: 02/11/2021] [Accepted: 02/17/2021] [Indexed: 11/28/2022]
Abstract
SimBit is a general purpose, high performance forward-in-time population genetics simulator. SimBit can simulate a wide variety of selection scenarios (any selection and dominance coefficients variation, any epistatic interaction, any spatial and temporal changes of selection scenario, etc.), demographic scenarios (any changes in patch sizes, migration rates, realistic demography dependent on fecundity, hard vs. soft selection, exponential vs. logistic growth, gametic or zygotic dispersion, etc.) and mating systems (cloning and selfing rates, hermaphrodites or males and females). SimBit can also track QTLs (with hyperdimensional phenotypes, explicit fitness landscape, plasticity, developmental noise, etc.). Finally, SimBit can simulate multiple species with their ecological relationships. SimBit comes with a R wrapper that simplifies the management of an entire research project from the creation of a grid of parameters and corresponding inputs, running simulations and gathering outputs for analysis. SimBit's performance was extensively benchmarked in comparison to SLiM, Nemo and SFS_CODE, varying population size, recombination rate, mutation rate, and the number of loci. I also reproduced simulations from previous studies, benchmarked QTLs and coalescent tree recording techniques. SimBit was most often the highest performing program with the only notable exception of SLiM outperforming SimBit in scenarios with few loci and low genetic diversity.
Affiliation(s)
- Remi Matthey-Doret
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada.,Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
|
126
|
Isildak U, Stella A, Fumagalli M. Distinguishing between recent balancing selection and incomplete sweep using deep neural networks. Mol Ecol Resour 2021; 21:2706-2718. [PMID: 33749134 DOI: 10.1111/1755-0998.13379] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/22/2020] [Revised: 03/01/2021] [Accepted: 03/05/2021] [Indexed: 12/12/2022]
Abstract
Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging as its signatures are qualitatively similar to those left by ongoing positive selection. In this study, we developed and implemented two deep neural networks and tested their performance to predict loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). ANN received as input multiple summary statistics calculated on the locus of interest, while CNN was applied directly on the matrix of haplotypes. We found that both architectures have high accuracy to identify loci under recent selection. CNN generally outperformed ANN to distinguish between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false-positive rate for CNN than ANN. We finally deployed CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to familial Mediterranean fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterize signals of selection on intermediate frequency variants, an analysis currently inaccessible by commonly used strategies.
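The abstract above contrasts a network fed summary statistics with one fed the raw haplotype matrix. As a rough illustration of the first kind of input, the sketch below computes two classic summary statistics from a small 0/1 haplotype matrix. Which statistics the authors actually used is not stated here, so the choice of segregating sites and mean pairwise differences, the function names, and the toy matrix are all assumptions made for illustration.

```python
from itertools import combinations


def segregating_sites(haplotypes):
    """Count columns (sites) at which more than one allele is observed."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)


def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


# Hypothetical toy locus: 4 haplotypes (rows) by 4 sites (columns),
# 0 = ancestral allele, 1 = derived allele.
haplotypes = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]

# A statistics-based network would see a short vector like this one,
# whereas a convolutional network would see the matrix itself.
summary_vector = [segregating_sites(haplotypes), mean_pairwise_differences(haplotypes)]
print(summary_vector)
```

The summary vector discards the arrangement of alleles across haplotypes, which is one plausible reason a model given the full matrix could separate signals that look alike in the statistics.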
Affiliation(s)
- Ulas Isildak
- Department of Biological Sciences, Middle East Technical University, Ankara, Turkey
| | - Alessandro Stella
- Laboratory of Medical Genetics, Department of Biomedical Sciences and Human Oncology, Università degli Studi di Bari Aldo Moro, Bari, Italy
| | - Matteo Fumagalli
- Department of Life Sciences, Silwood Park Campus, Imperial College London, London, UK
|
127
|
Abstract
A key challenge in understanding how organisms adapt to their environments is to identify the mutations and genes that make it possible. By comparing patterns of sequence variation to neutral predictions across genomes, the targets of positive selection can be located. We applied this logic to house mice that invaded Gough Island (GI), an unusual population that shows phenotypic and ecological hallmarks of selection. We used massively parallel short-read sequencing to survey the genomes of 14 GI mice. We computed a set of summary statistics to capture diverse aspects of variation across these genome sequences, used approximate Bayesian computation to reconstruct a null demographic model, and then applied machine learning to estimate the posterior probability of positive selection in each region of the genome. Using a conservative threshold, 1,463 5-kb windows show strong evidence for positive selection in GI mice but not in a mainland reference population of German mice. Disproportionate shares of these selection windows contain genes that harbor derived nonsynonymous mutations with large frequency differences. Over-represented gene ontologies in selection windows emphasize neurological themes. Inspection of genomic regions harboring many selection windows with high posterior probabilities pointed to genes with known effects on exploratory behavior and body size as potential targets. Some genes in these regions contain candidate adaptive variants, including missense mutations and/or putative regulatory mutations. Our results provide a genomic portrait of adaptation to island conditions and position GI mice as a powerful system for understanding the genetic component of natural selection.
Affiliation(s)
- Bret A Payseur
- Laboratory of Genetics, University of Wisconsin – Madison, Madison, WI
| | - Peicheng Jing
- Laboratory of Genetics, University of Wisconsin – Madison, Madison, WI
|
128
|
Xue AT, Schrider DR, Kern AD. Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning. Mol Biol Evol 2021; 38:1168-1183. [PMID: 33022051 PMCID: PMC7947845 DOI: 10.1093/molbev/msaa259] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/19/2023] Open
Abstract
Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC's performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.
Affiliation(s)
- Alexander T Xue
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
| | - Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC
| | - Andrew D Kern
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR
|
129
|
Genome Informatics and Machine Learning-Based Identification of Antimicrobial Resistance-Encoding Features and Virulence Attributes in Escherichia coli Genomes Representing Globally Prevalent Lineages, Including High-Risk Clonal Complexes. mBio 2021; 13:e0379621. [PMID: 35164570 PMCID: PMC8844930 DOI: 10.1128/mbio.03796-21] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022] Open
Abstract
Escherichia coli, a ubiquitous commensal/pathogenic member from the Enterobacteriaceae family, accounts for high infection burden, morbidity, and mortality throughout the world. With emerging multidrug resistance (MDR) on a massive scale, E. coli has been listed as one of the Global Antimicrobial Resistance and Use Surveillance System (GLASS) priority pathogens. Understanding the resistance mechanisms and underlying genomic features appears to be of utmost importance to tackle further spread of these multidrug-resistant superbugs. While a few of the globally prevalent sequence types (STs) of E. coli, such as ST131, ST69, ST405, and ST648, have been previously reported to be highly virulent and harboring MDR, there is no clarity if certain ST lineages have a greater propensity to acquire MDR. In this study, large-scale comparative genomics of a total of 5,653 E. coli genomes from 19 ST lineages revealed ST-wide prevalence patterns of genomic features, such as antimicrobial resistance (AMR)-encoding genes/mutations, virulence genes, integrons, and transposons. Interpretation of the importance of these features using a Random Forest Classifier trained with 11,988 genomic features from whole-genome sequence data identified ST-specific or phylogroup-specific signature proteins mostly belonging to different protein superfamilies, including the toxin-antitoxin systems. Our study provides a comprehensive understanding of a myriad of genomic features, ST-specific proteins, and resistance mechanisms entailing different lineages of E. coli at the level of genomes; this could be of significant downstream importance in understanding the mechanisms of AMR, in clinical discovery, in epidemiology, and in devising control strategies.
IMPORTANCE With the leap in whole-genome data being generated, the application of relevant methods to mine biologically significant information from microbial genomes is of utmost importance to public health genomics. Machine-learning methods have been used not only to mine, curate, or classify the data but also to identify the relevant features that could be linked to a particular class/target. This is perhaps one of the pioneering studies that has attempted to classify a large repertoire of E. coli genome data sets (5,653 genomes) belonging to 19 different STs (including well-studied as well as understudied STs) using machine learning approaches. Important features identified by these approaches have revealed ST-specific signature proteins, which could be further studied to predict possible associations with the phenotypic profiles, thereby providing a better understanding of virulence and the resistance mechanisms among different clonal lineages of E. coli.
|
130
|
Yang FC, Tseng B, Lin CY, Yu YJ, Linacre A, Lee JCI. Population inference based on mitochondrial DNA control region data by the nearest neighbors algorithm. Int J Legal Med 2021; 135:1191-1199. [PMID: 33586030 DOI: 10.1007/s00414-021-02520-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2020] [Accepted: 01/27/2021] [Indexed: 11/24/2022]
Abstract
Population and geographic assignment are frequently undertaken using DNA sequences on the mitochondrial genome. Assignment to broad continental populations is common, although finer resolution to subpopulations can be less accurate due to shared genetic ancestry at a local level and members of different ancestral subpopulations cohabiting the same geographic area. This study reports on the accuracy of population and subpopulation assignment by using the sequence data obtained from the 3070 mitochondrial genomes and applying the K-nearest neighbors (KNN) algorithm. These data also included training samples used for continental and population assignment comprised of 1105 Europeans (including Austria, France, Germany, Spain, and England and Caucasian countries), 374 Africans (including North and East Africa and non-specific area (Pan-Africa)), and 1591 Asians (including Japan, Philippines, and Taiwan). Subpopulations included in this study were 1153 mitochondrial DNA (mtDNA) control region sequences from 12 subpopulations in Taiwan (including Han, Hakka, Ami, Atayal, Bunun, Paiwan, Puyuma, Rukai, Saisiyat, Tsou, Tao, and Pingpu). Additionally, control region sequence data from a further 50 samples, obtained from the Sigma Company, were included after they were amplified and sequenced. These additional 50 samples acted as the "testing samples" to verify the accuracy of the population. In this study, based on genetic distances as genetic metric, we used the KNN algorithm and the K-weighted-nearest neighbors (KWNN) algorithm weighted by genetic distance to classify individuals into continental populations, and subpopulations within the same continent. Accuracy results of ethnic inferences at the level of continental populations and of subpopulations among KNN and KWNN algorithms were obtained. The training sample set achieved an overall accuracy of 99 to 82% for assignment to their continental populations with K values from 1 to 101. 
Population assignment for subpopulations with K assignments from 1 to 5 reached an accuracy of 77 to 54%. Four out of 12 Taiwanese populations returned an accuracy of assignment of over 60%, Ami (66%), Atayal (67%), Saisiyat (66%), and Tao (80%). For the testing sample set, results of ethnic prediction for continental populations with recommended K values as 5, 10, and 35, based on results of the training sample set, achieved overall an accuracy of 100 to 94%. This study provided an accurate method in population assignment for not only continental populations but also subpopulations, which can be useful in forensic and anthropological studies.
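The difference between the plain KNN vote and the distance-weighted (KWNN) vote described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the proportion of mismatched sites stands in for whatever genetic distance the study used, the inverse-distance weighting is one common choice among several, and the function names, sequences, and labels are hypothetical.

```python
from collections import defaultdict


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_assign(query, training, k=5, weighted=False):
    """Assign `query` to a population by voting among its k nearest training sequences.

    training: list of (sequence, population_label) pairs.
    weighted: if True, each neighbour votes with weight 1/distance (KWNN-style);
              otherwise every neighbour has one vote (plain KNN).
    """
    neighbours = sorted(
        ((p_distance(query, seq), label) for seq, label in training),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        # Small constant avoids division by zero for identical sequences.
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical toy reference set: one exact match from population X,
# two slightly more distant sequences from population Y.
training = [
    ("AAAAAAAA", "X"),
    ("AAAAAAAT", "Y"),
    ("AAAAAATT", "Y"),
]
query = "AAAAAAAA"

print(knn_assign(query, training, k=3, weighted=False))  # majority vote
print(knn_assign(query, training, k=3, weighted=True))   # distance-weighted vote
```

In this toy case the majority vote follows the two more distant Y sequences, while the weighted vote follows the single identical X sequence, which shows why the two schemes can disagree as K grows.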
Affiliation(s)
- Fu-Chi Yang
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan
| | - Bill Tseng
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan
| | - Chun-Yen Lin
- Institute of Forensic Medicine, Ministry of Justice, New Taipei City, 23016, Taiwan
| | - Yu-Jen Yu
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan
| | - Adrian Linacre
- College of Science & Engineering, Flinders University, Adelaide, 5001, Australia
| | - James Chun-I Lee
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan.
|
131
|
Momeni J, Parejo M, Nielsen RO, Langa J, Montes I, Papoutsis L, Farajzadeh L, Bendixen C, Căuia E, Charrière JD, Coffey MF, Costa C, Dall'Olio R, De la Rúa P, Drazic MM, Filipi J, Galea T, Golubovski M, Gregorc A, Grigoryan K, Hatjina F, Ilyasov R, Ivanova E, Janashia I, Kandemir I, Karatasou A, Kekecoglu M, Kezic N, Matray ES, Mifsud D, Moosbeckhofer R, Nikolenko AG, Papachristoforou A, Petrov P, Pinto MA, Poskryakov AV, Sharipov AY, Siceanu A, Soysal MI, Uzunov A, Zammit-Mangion M, Vingborg R, Bouga M, Kryger P, Meixner MD, Estonba A. Authoritative subspecies diagnosis tool for European honey bees based on ancestry informative SNPs. BMC Genomics 2021; 22:101. [PMID: 33535965 PMCID: PMC7860026 DOI: 10.1186/s12864-021-07379-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/29/2020] [Accepted: 01/08/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND With numerous endemic subspecies representing four of its five evolutionary lineages, Europe holds a large fraction of Apis mellifera genetic diversity. This diversity and the natural distribution range have been altered by anthropogenic factors. The conservation of this natural heritage relies on the availability of accurate tools for subspecies diagnosis. Based on pool-sequence data from 2145 worker bees representing 22 populations sampled across Europe, we employed two highly discriminative approaches (PCA and FST) to select the most informative SNPs for ancestry inference. RESULTS Using a supervised machine learning (ML) approach and a set of 3896 genotyped individuals, we could show that the 4094 selected single nucleotide polymorphisms (SNPs) provide an accurate prediction of ancestry inference in European honey bees. The best ML model was Linear Support Vector Classifier (Linear SVC) which correctly assigned most individuals to one of the 14 subspecies or different genetic origins with a mean accuracy of 96.2% ± 0.8 SD. A total of 3.8% of test individuals were misclassified, most probably due to limited differentiation between the subspecies caused by close geographical proximity, or human interference of genetic integrity of reference subspecies, or a combination thereof. CONCLUSIONS The diagnostic tool presented here will contribute to a sustainable conservation and support breeding activities in order to preserve the genetic heritage of European honey bees.
Affiliation(s)
- Jamal Momeni
- Eurofins Genomics Europe Genotyping A/S (EFEG), (Former GenoSkan A/S), Aarhus, Denmark.
| | - Melanie Parejo
- Laboratory Genetics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain.,Swiss Bee Research Center, Agroscope, Bern, Switzerland
| | - Rasmus O Nielsen
- Eurofins Genomics Europe Genotyping A/S (EFEG), (Former GenoSkan A/S), Aarhus, Denmark
| | - Jorge Langa
- Laboratory Genetics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain
| | - Iratxe Montes
- Laboratory Genetics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain
| | - Laetitia Papoutsis
- Laboratory of Agricultural Zoology and Entomology, Agricultural University of Athens, Athens, Greece
| | - Leila Farajzadeh
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
| | - Christian Bendixen
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
| | - Eliza Căuia
- Institutul de Cercetare Dezvoltare pentru Apicultura SA, Bucharest, Romania
| | - Cecilia Costa
- CREA Research Centre for Agriculture and Environment, Bologna, Italy
| | - Janja Filipi
- Department of Ecology, Agronomy and Aquaculture, University of Zadar, Zadar, Croatia
| | - Ales Gregorc
- Faculty of Agriculture and Life Sciences, University of Maribor, Maribor, Slovenia
| | - Fani Hatjina
- Department of Apiculture, Agricultural Organization 'DEMETER', Thessaloniki, Greece
| | - Rustem Ilyasov
- Division of Life Sciences, Major of Biological Sciences, and Convergence Research Center for Insect Vectors, Incheon National University, Incheon, Korea.,Institute of Biochemistry and Genetics, Ufa Federal Research Centre of the Russian Academy of Sciences, Ufa, Russia
| | - David Mifsud
- Division of Rural Sciences and Food Systems, Institute of Earth Systems, University of Malta, Msida, Malta
| | - Rudolf Moosbeckhofer
- Österreichische Agentur für Gesundheit und Ernährungssicherheit GmbH, Wien, Austria
| | - Alexei G Nikolenko
- Institute of Biochemistry and Genetics, Ufa Federal Research Centre of the Russian Academy of Sciences, Ufa, Russia
| | - Plamen Petrov
- Agricultural University of Plovdiv, Plovdiv, Bulgaria
| | - M Alice Pinto
- Centro de Investigação de Montanha (CIMO), Instituto Politécnico de Bragança, Bragança, Portugal
| | - Aleksandr V Poskryakov
- Institute of Biochemistry and Genetics, Ufa Federal Research Centre of the Russian Academy of Sciences, Ufa, Russia
| | - Adrian Siceanu
- Institutul de Cercetare Dezvoltare pentru Apicultura SA, Bucharest, Romania
| | - Aleksandar Uzunov
- Landesbetrieb Landwirtschaft Hessen, Bee Institute Kirchhain, Kirchhain, Germany.,Faculty of Agricultural Sciences and Food, University Ss. Cyril and Methodius, Skopje, Republic of Macedonia
| | - Rikke Vingborg
- Eurofins Genomics Europe Genotyping A/S (EFEG), (Former GenoSkan A/S), Aarhus, Denmark
| | - Maria Bouga
- Laboratory of Agricultural Zoology and Entomology, Agricultural University of Athens, Athens, Greece
| | - Per Kryger
- Department of Agroecology, Aarhus University, Slagelse, Denmark
| | - Marina D Meixner
- Landesbetrieb Landwirtschaft Hessen, Bee Institute Kirchhain, Kirchhain, Germany
| | - Andone Estonba
- Laboratory Genetics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain.
|
132
|
Smith BT, Gehara M, Harvey MG. The demography of extinction in eastern North American birds. Proc Biol Sci 2021; 288:20201945. [PMID: 33529556 DOI: 10.1098/rspb.2020.1945] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
Species are being lost at an unprecedented rate during the Anthropocene. Progress has been made in clarifying how species traits influence their propensity to go extinct, but the role historical demography plays in species loss or persistence is unclear. In eastern North America, five charismatic landbirds went extinct last century, and the causes of their extinctions have been heavily debated. Although these extinctions are most often attributed to post-colonial human activity, other factors such as declining ancestral populations prior to European colonization could have made these species particularly susceptible. We used population genomic data from these extinct birds and compared them with those from four codistributed extant species. We found extinct species harboured lower genetic diversity and effective population sizes than extant species, but both extinct and non-extinct birds had similar demographic histories of population expansion. These demographic patterns are consistent with population size changes associated with glacial-interglacial cycles. The lack of support for overall population declines during the Pleistocene corroborates the view that, although species that went extinct may have been vulnerable due to low diversity or small population size, their disappearance was driven by human activities in the Anthropocene.
Affiliation(s)
- Brian Tilston Smith
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
| | - Marcelo Gehara
- Department of Earth and Environmental Sciences, Rutgers University Newark, 195 University Avenue, Newark, NJ 07102, USA
| | - Michael G Harvey
- Department of Biological Sciences, The University of Texas at El Paso, 500 W University Avenue, El Paso, TX 79968, USA
|
133
|
Abstract
Technological developments have revolutionized measurements on plant genotypes and phenotypes, leading to routine production of large, complex data sets. This has led to increased efforts to extract meaning from these measurements and to integrate various data sets. Concurrently, machine learning has rapidly evolved and is now widely applied in science in general and in plant genotyping and phenotyping in particular. Here, we review the application of machine learning in the context of plant science and plant breeding. We focus on analyses at different phenotype levels, from biochemical to yield, and in connecting genotypes to these. In this way, we illustrate how machine learning offers a suite of methods that enable researchers to find meaningful patterns in relevant plant data.
Affiliation(s)
- Aalt Dirk Jan van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
- Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
| | - Gert Kootstra
- Farm Technology, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
| | - Willem Kruijer
- Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
|
134
|
Fraïsse C, Popovic I, Mazoyer C, Spataro B, Delmotte S, Romiguier J, Loire É, Simon A, Galtier N, Duret L, Bierne N, Vekemans X, Roux C. DILS: Demographic inferences with linked selection by using ABC. Mol Ecol Resour 2021; 21:2629-2644. [PMID: 33448666 DOI: 10.1111/1755-0998.13323] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/18/2020] [Revised: 12/09/2020] [Accepted: 12/21/2020] [Indexed: 01/21/2023]
Abstract
We present DILS, a deployable statistical analysis platform for conducting demographic inferences with linked selection from population genomic data using an Approximate Bayesian Computation framework. DILS takes as input single-population or two-population data sets (multilocus fasta sequences) and performs three types of analyses in a hierarchical manner, identifying: (a) the best demographic model to study the importance of gene flow and population size change on the genetic patterns of polymorphism and divergence, (b) the best genomic model to determine whether the effective size Ne and migration rate N.m are heterogeneously distributed along the genome (implying linked selection) and (c) loci in genomic regions most associated with barriers to gene flow. Also available via a Web interface, an objective of DILS is to facilitate collaborative research in speciation genomics. Here, we show the performance and limitations of DILS by using simulations and finally apply the method to published data on a divergence continuum composed of 28 pairs of Mytilus mussel populations/species.
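For readers unfamiliar with Approximate Bayesian Computation, the core idea can be sketched with the simplest rejection variant: draw a parameter from the prior, simulate data, and keep the draw only if a summary of the simulated data lands close to the observed summary. The code below applies this to a toy Gaussian model. It is not the procedure implemented in DILS, whose simulator, summary statistics, and inference machinery are not described in this abstract; every name, prior bound, and tolerance here is a hypothetical choice for illustration.

```python
import random
import statistics


def simulate_summary(theta, n, rng):
    """Toy simulator: mean of n draws from Normal(theta, 1), used as the summary statistic."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])


def abc_rejection(observed_summary, n_obs, n_sims=20000, tolerance=0.05, seed=1):
    """Return prior draws of theta whose simulated summary is within `tolerance` of the data."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)  # assumed flat prior on the parameter
        if abs(simulate_summary(theta, n_obs, rng) - observed_summary) <= tolerance:
            accepted.append(theta)
    return accepted


# Hypothetical observation: a sample mean of 2.0 computed from 20 data points.
posterior = abc_rejection(observed_summary=2.0, n_obs=20)
print(len(posterior), statistics.fmean(posterior))
```

The accepted draws approximate the posterior distribution of the parameter. Model comparison, as in steps (a) and (b) above, follows the same logic with the model index treated as one more quantity to be drawn and accepted or rejected.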
Affiliation(s)
- Christelle Fraïsse
- Institute of Science and Technology Austria, Klosterneuburg, Austria.,Univ. Lille, CNRS, UMR 8198 - Evo-Eco-Paleo, Lille, France
| | - Iva Popovic
- School of Biological Sciences, University of Queensland, St Lucia, Qld, Australia
| | - Bruno Spataro
- Laboratoire de Biologie et Biométrie Évolutive CNRS UMR 5558, Université Claude Bernard, Lyon, France
| | - Stéphane Delmotte
- Laboratoire de Biologie et Biométrie Évolutive CNRS UMR 5558, Université Claude Bernard, Lyon, France
| | - Étienne Loire
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR, ASTRE, Montpellier, France
| | - Alexis Simon
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - Nicolas Galtier
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - Laurent Duret
- Laboratoire de Biologie et Biométrie Évolutive CNRS UMR 5558, Université Claude Bernard, Lyon, France
| | - Nicolas Bierne
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - Camille Roux
- Univ. Lille, CNRS, UMR 8198 - Evo-Eco-Paleo, Lille, France
|
135
|
Pond AJR, Hwang S, Verd B, Steventon B. A deep learning approach for staging embryonic tissue isolates with small data. PLoS One 2021; 16:e0244151. [PMID: 33417603 PMCID: PMC7793293 DOI: 10.1371/journal.pone.0244151] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/14/2020] [Accepted: 12/03/2020] [Indexed: 12/12/2022] Open
Abstract
Machine learning approaches are becoming increasingly widespread and are now present in most areas of research. Their recent surge can be explained in part due to our ability to generate and store enormous amounts of data with which to train these models. The requirement for large training sets is also responsible for limiting further potential applications of machine learning, particularly in fields where data tend to be scarce such as developmental biology. However, recent research seems to indicate that machine learning and Big Data can sometimes be decoupled to train models with modest amounts of data. In this work we set out to train a CNN-based classifier to stage zebrafish tail buds at four different stages of development using small information-rich data sets. Our results show that two and three dimensional convolutional neural networks can be trained to stage developing zebrafish tail buds based on both morphological and gene expression confocal microscopy images, achieving in each case up to 100% test accuracy scores. Importantly, we show that high accuracy can be achieved with data set sizes of under 100 images, much smaller than the typical training set size for a convolutional neural net. Furthermore, our classifier shows that it is possible to stage isolated embryonic structures without the need to refer to classic developmental landmarks in the whole embryo, which will be particularly useful to stage 3D culture in vitro systems such as organoids. We hope that this work will provide a proof of principle that will help dispel the myth that large data set sizes are always required to train CNNs, and encourage researchers in fields where data are scarce to also apply ML approaches.
Affiliation(s)
- Seongwon Hwang
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Berta Verd
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
- Benjamin Steventon
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
136
Hübner S, Kantar MB. Tapping Diversity From the Wild: From Sampling to Implementation. Front Plant Sci 2021; 12:626565. [PMID: 33584776 PMCID: PMC7873362 DOI: 10.3389/fpls.2021.626565] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/06/2020] [Accepted: 01/07/2021] [Indexed: 05/05/2023]
Abstract
The diversity observed among crop wild relatives (CWRs) and their ability to flourish in unfavorable and harsh environments have drawn the attention of plant scientists and breeders for many decades. However, it is also recognized that the benefit gained from using CWRs in breeding is a potential rose between thorns of detrimental genetic variation that is linked to the trait of interest. Despite the increased interest in CWRs, little attention has so far been given to the statistical, analytical, and technical considerations that should guide the sampling design, the germplasm characterization, and later its implementation in breeding. Here, we review the entire process of sampling and identifying beneficial genetic variation in CWRs and the challenge of using it in breeding. The ability to detect beneficial genetic variation in CWRs is strongly affected by the sampling design, which should be adjusted to the spatial and temporal variation of the target species, the trait of interest, and the analytical approach used. Moreover, linkage disequilibrium is a key factor that constrains the resolution of searching for beneficial alleles along the genome, and later, the ability to deplete linked deleterious genetic variation as a consequence of genetic drag. We also discuss how technological advances in genomics, phenomics, biotechnology, and data science can improve the ability to identify beneficial genetic variation in CWRs and to exploit it in striving for higher-yielding and sustainable crops.
Affiliation(s)
- Sariel Hübner
  - Galilee Research Institute (MIGAL), Tel-Hai College, Qiryat Shemona, Israel
  - *Correspondence: Sariel Hübner
- Michael B. Kantar
  - Department of Tropical Plant and Soil Sciences, University of Hawai’i at Mānoa, Honolulu, HI, United States
137
Lozano R, Gazave E, Dos Santos JPR, Stetter MG, Valluru R, Bandillo N, Fernandes SB, Brown PJ, Shakoor N, Mockler TC, Cooper EA, Taylor Perkins M, Buckler ES, Ross-Ibarra J, Gore MA. Comparative evolutionary genetics of deleterious load in sorghum and maize. Nat Plants 2021; 7:17-24. [PMID: 33452486 DOI: 10.1038/s41477-020-00834-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 10/10/2019] [Accepted: 12/09/2020] [Indexed: 06/12/2023]
Abstract
Sorghum and maize share a close evolutionary history that can be explored through comparative genomics. To perform a large-scale comparison of the genomic variation between these two species, we analysed ~13 million variants identified from whole-genome resequencing of 499 sorghum lines together with 25 million variants previously identified in 1,218 maize lines. Deleterious mutations in both species were prevalent in pericentromeric regions, enriched in non-syntenic genes and present at low allele frequencies. A comparison of deleterious burden between sorghum and maize revealed that sorghum, in contrast to maize, departed from the domestication-cost hypothesis that predicts a higher deleterious burden among domesticates compared with wild lines. Additionally, sorghum and maize population genetic summary statistics were used to predict a gene deleterious index with an accuracy greater than 0.5. This research represents a key step towards understanding the evolutionary dynamics of deleterious variants in sorghum and provides a comparative genomics framework to start prioritizing these variants for removal through genome editing and breeding.
Affiliation(s)
- Roberto Lozano
  - Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Elodie Gazave
  - Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
  - Institute of Biotechnology, Cornell University, Ithaca, NY, USA
- Jhonathan P R Dos Santos
  - Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Markus G Stetter
  - Botanical Institute, Biozentrum, University of Cologne, Cologne, Germany
- Ravi Valluru
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA
  - University of Lincoln, Lincoln, UK
- Nonoy Bandillo
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA
  - Department of Plant Sciences, North Dakota State University, Fargo, ND, USA
- Samuel B Fernandes
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Patrick J Brown
  - Department of Plant Sciences, University of California Davis, Davis, CA, USA
- Nadia Shakoor
  - Donald Danforth Plant Science Center, St. Louis, MO, USA
- Todd C Mockler
  - Donald Danforth Plant Science Center, St. Louis, MO, USA
- Elizabeth A Cooper
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- M Taylor Perkins
  - Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
- Edward S Buckler
  - Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA
  - United States Department of Agriculture, Agricultural Research Service (USDA-ARS) R. W. Holley Center for Agriculture and Health, Ithaca, NY, USA
- Jeffrey Ross-Ibarra
  - Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
  - Center for Population Biology and Genome Center, University of California Davis, Davis, CA, USA
- Michael A Gore
  - Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
138
Bracher-Smith M, Crawford K, Escott-Price V. Machine learning for genetic prediction of psychiatric disorders: a systematic review. Mol Psychiatry 2021; 26:70-79. [PMID: 32591634 PMCID: PMC7610853 DOI: 10.1038/s41380-020-0825-2] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 05/15/2020] [Revised: 06/09/2020] [Accepted: 06/16/2020] [Indexed: 12/25/2022]
Abstract
Machine learning methods have been employed to make predictions in psychiatry from genotypes, with the potential to bring improved prediction of outcomes in psychiatric genetics; however, their current performance is unclear. We aim to systematically review machine learning methods for predicting psychiatric disorders from genetics alone and evaluate their discrimination, bias and implementation. Medline, PsycInfo, Web of Science and Scopus were searched for terms relating to genetics, psychiatric disorders and machine learning, including neural networks, random forests, support vector machines and boosting, on 10 September 2019. Following PRISMA guidelines, articles were screened for inclusion independently by two authors, extracted, and assessed for risk of bias. Overall, 63 full texts were assessed from a pool of 652 abstracts. Data were extracted for 77 models of schizophrenia, bipolar, autism or anorexia across 13 studies. Performance of machine learning methods was highly varied (0.48-0.95 AUC) and differed between schizophrenia (0.54-0.95 AUC), bipolar (0.48-0.65 AUC), autism (0.52-0.81 AUC) and anorexia (0.62-0.69 AUC). This is likely due to the high risk of bias identified in the study designs and analysis for reported results. Choices for predictor selection, hyperparameter search and validation methodology, and viewing of the test set during training were common causes of high risk of bias in analysis. Key steps in model development and validation were frequently not performed or unreported. Comparison of discrimination across studies was constrained by heterogeneity of predictors, outcome and measurement, in addition to sample overlap within and across studies. Given widespread high risk of bias and the small number of studies identified, it is important to ensure established analysis methods are adopted. We emphasise best practices in methodology and reporting for improving future studies.
Affiliation(s)
- Matthew Bracher-Smith
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Karen Crawford
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
  - Dementia Research Institute, School of Medicine, Cardiff University, Cardiff, UK
- Valentina Escott-Price
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
  - Dementia Research Institute, School of Medicine, Cardiff University, Cardiff, UK
139
Wang MWH, Goodman JM, Allen TEH. Machine Learning in Predictive Toxicology: Recent Applications and Future Directions for Classification Models. Chem Res Toxicol 2020; 34:217-239. [PMID: 33356168 DOI: 10.1021/acs.chemrestox.0c00316] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 02/07/2023]
Abstract
In recent times, machine learning has become increasingly prominent in predictive toxicology as it has shifted from in vivo studies toward in silico studies. Currently, in vitro methods together with other computational methods such as quantitative structure-activity relationship modeling and absorption, distribution, metabolism, and excretion calculations are being used. An overview of machine learning and its applications in predictive toxicology is presented here, including support vector machines (SVMs), random forest (RF) and decision trees (DTs), neural networks, regression models, naïve Bayes, k-nearest neighbors, and ensemble learning. The recent successes of these machine learning methods in predictive toxicology are summarized, and a comparison of some models used in predictive toxicology is presented. In predictive toxicology, SVMs, RF, and DTs are the dominant machine learning methods due to the characteristics of the data available. Lastly, this review describes the current challenges facing the use of machine learning in predictive toxicology and offers insights into the possible areas of improvement in the field.
Affiliation(s)
- Marcus W H Wang
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Jonathan M Goodman
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Timothy E H Allen
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
  - MRC Toxicology Unit, University of Cambridge, Hodgkin Building, Lancaster Road, Leicester LE1 7HB, United Kingdom
140
Reyes-Herrera PH, Muñoz-Baena L, Velásquez-Zapata V, Patiño L, Delgado-Paz OA, Díaz-Diez CA, Navas-Arboleda AA, Cortés AJ. Inheritance of Rootstock Effects in Avocado (Persea americana Mill.) cv. Hass. Front Plant Sci 2020; 11:555071. [PMID: 33424874 PMCID: PMC7785968 DOI: 10.3389/fpls.2020.555071] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/23/2020] [Accepted: 11/17/2020] [Indexed: 05/16/2023]
Abstract
Grafting is typically utilized to merge adapted seedling rootstocks with highly productive clonal scions. This process implies the interaction of multiple genomes to produce a unique tree phenotype. However, the interconnection of both genotypes obscures individual contributions to phenotypic variation (rootstock-mediated heritability), hampering tree breeding. Therefore, our goal was to quantify the inheritance of seedling rootstock effects on scion traits using avocado (Persea americana Mill.) cv. Hass as a model fruit tree. We characterized 240 diverse rootstocks from 8 avocado cv. Hass orchards with similar management in three regions of the province of Antioquia, northwest Andes of Colombia, using 13 microsatellite markers (simple sequence repeats, SSRs). Parallel to this, we recorded 20 phenotypic traits (including morphological, biomass/reproductive, and fruit yield and quality traits) in the scions for 3 years (2015-2017). Relatedness among rootstocks was inferred through the genetic markers and entered into a "genetic prediction" model to calculate narrow-sense heritabilities (h²) on scion traits. We used three different randomization tests to highlight traits with consistently significant heritability estimates. This strategy allowed us to capture five traits with significant heritability values that ranged from 0.33 to 0.45 and model fits (r) that oscillated between 0.58 and 0.73 across orchards. The results showed significance in the rootstock effects for four complex harvest and quality traits (i.e., total number of fruits, number of fruits with exportation quality, and number of fruits discarded because of low weight or thrips damage), whereas the only morphological trait that had a significant heritability value was overall trunk height (an emergent property of the rootstock-scion interaction). These findings suggest the inheritance of rootstock effects, beyond root phenotype, on a surprisingly wide spectrum of scion traits in "Hass" avocado. They also reinforce the utility of polymorphic SSRs for relatedness reconstruction and genetic prediction of complex traits. To date, this research is the most cohesive evidence of narrow-sense inheritance of rootstock effects in a tropical fruit tree crop. Ultimately, our work highlights the importance of considering the rootstock-scion interaction to broaden the genetic basis of fruit tree breeding programs while enhancing our understanding of the consequences of grafting.
Affiliation(s)
- Paula H. Reyes-Herrera
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA)—CI Tibaitatá, Mosquera, Colombia
- Laura Muñoz-Baena
  - Department of Microbiology and Immunology, Western University, London, ON, Canada
- Valeria Velásquez-Zapata
  - Department of Plant Pathology and Microbiology, Interdepartmental Bioinformatics and Computational Biology, Iowa State University, Ames, IA, United States
- Laura Patiño
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA)—CI La Selva, Rionegro, Colombia
- Oscar A. Delgado-Paz
  - Facultad de Ingenierías, Universidad Católica de Oriente—UCO, Rionegro, Antioquia
- Cipriano A. Díaz-Diez
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA)—CI La Selva, Rionegro, Colombia
- Andrés J. Cortés
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA)—CI La Selva, Rionegro, Colombia
141
Leuchtenberger AF, Crotty SM, Drucks T, Schmidt HA, Burgstaller-Muehlbacher S, von Haeseler A. Distinguishing Felsenstein Zone from Farris Zone Using Neural Networks. Mol Biol Evol 2020; 37:3632-3641. [PMID: 32637998 PMCID: PMC7743852 DOI: 10.1093/molbev/msaa164] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/24/2022]
Abstract
Maximum likelihood and maximum parsimony are two key methods for phylogenetic tree reconstruction. Under certain conditions, each of these two methods can perform more or less efficiently, resulting in unresolved or disputed phylogenies. We show that a neural network can distinguish between four-taxon alignments that were evolved under conditions susceptible to either long-branch attraction or long-branch repulsion. When likelihood and parsimony methods are discordant, the neural network can provide insight as to which tree reconstruction method is best suited to the alignment. When applied to the contentious case of Strepsiptera evolution, our method shows robust support for the current scientific view, that is, it places Strepsiptera with beetles, distant from flies.
Affiliation(s)
- Alina F Leuchtenberger
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
- Stephen M Crotty
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
  - School of Mathematical Sciences, University of Adelaide, Adelaide, SA, Australia
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, University of Adelaide, Adelaide, SA, Australia
- Tamara Drucks
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
- Heiko A Schmidt
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
- Sebastian Burgstaller-Muehlbacher
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
- Arndt von Haeseler
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
  - Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
142
Muhammad LJ, Algehyne EA, Usman SS, Ahmad A, Chakraborty C, Mohammed IA. Supervised Machine Learning Models for Prediction of COVID-19 Infection using Epidemiology Dataset. SN Comput Sci 2020; 2:11. [PMID: 33263111 PMCID: PMC7694891 DOI: 10.1007/s42979-020-00394-7] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 10/26/2020] [Accepted: 11/05/2020] [Indexed: 12/15/2022]
Abstract
COVID-19 or 2019-nCoV is no longer pandemic but rather endemic, with more than 651,247 people around the world having lost their lives after contracting the disease. Currently, there is no specific treatment or cure for COVID-19, and thus living with the disease and its symptoms is inevitable. This reality has placed a massive burden on limited healthcare systems worldwide, especially in the developing nations. Although neither an effective, clinically proven antiviral strategy nor an approved vaccine exists to eradicate the COVID-19 pandemic, there are alternatives that may reduce the huge burden on not only limited healthcare systems but also the economic sector; the most promising include harnessing non-clinical techniques such as machine learning, data mining, deep learning and other artificial intelligence. These alternatives would facilitate diagnosis and prognosis for 2019-nCoV pandemic patients. Supervised machine learning models for COVID-19 infection were developed in this work with learning algorithms that include logistic regression, decision tree, support vector machine, naive Bayes, and artificial neural network, using a labeled epidemiology dataset of positive and negative COVID-19 cases from Mexico. Correlation coefficient analysis between the dependent and independent features was carried out to determine the strength of the relationship between each dependent feature and each independent feature of the dataset prior to developing the models. 80% of the dataset was used for training the models while the remaining 20% was used for testing them. The performance evaluation of the models showed that the decision tree model has the highest accuracy of 94.99%, while the support vector machine model has the highest sensitivity of 93.34% and the naive Bayes model has the highest specificity of 94.30%.
Affiliation(s)
- L J Muhammad
  - Department of Mathematics and Computer Science, Faculty of Science, Federal University of Kashere, P.M.B. 0182, Gombe, Nigeria
- Ebrahem A Algehyne
  - Department of Mathematics, University of Tabuk, Tabuk, 71491 Saudi Arabia
- Sani Sharif Usman
  - Department of Biological Sciences, Faculty of Science, Federal University of Kashere, P.M.B. 0182, Gombe, Nigeria
- Abdulkadir Ahmad
  - Department of Computer Science, Kano University of Science and Technology, Wudil, Kano, Nigeria
- Chinmay Chakraborty
  - Department of Electronics and Communication Engineering, Birla Institute of Technology, Ranchi, Jharkhand, India
- I A Mohammed
  - Computer Science Department, Yobe State University, Damaturu, Yobe State, Nigeria
143
Genomic islands of differentiation in a rapid avian radiation have been driven by recent selective sweeps. Proc Natl Acad Sci U S A 2020; 117:30554-30565. [PMID: 33199636 DOI: 10.1073/pnas.2015987117] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/18/2022]
Abstract
Numerous studies of emerging species have identified genomic "islands" of elevated differentiation against a background of relative homogeneity. The causes of these islands remain unclear, however, with some signs pointing toward "speciation genes" that locally restrict gene flow and others suggesting selective sweeps that have occurred within nascent species after speciation. Here, we examine this question through the lens of genome sequence data for five species of southern capuchino seedeaters, finch-like birds from South America that have undergone a species radiation during the last ∼50,000 generations. By applying newly developed statistical methods for ancestral recombination graph inference and machine-learning methods for the prediction of selective sweeps, we show that previously identified islands of differentiation in these birds appear to be generally associated with relatively recent, species-specific selective sweeps, most of which are predicted to be soft sweeps acting on standing genetic variation. Many of these sweeps coincide with genes associated with melanin-based variation in plumage, suggesting a prominent role for sexual selection. At the same time, a few loci also exhibit indications of possible selection against gene flow. These observations shed light on the complex manner in which natural selection shapes genome sequences during speciation.
144
Kalyakulina A, Iannuzzi V, Sazzini M, Garagnani P, Jalan S, Franceschi C, Ivanchenko M, Giuliani C. Investigating Mitonuclear Genetic Interactions Through Machine Learning: A Case Study on Cold Adaptation Genes in Human Populations From Different European Climate Regions. Front Physiol 2020; 11:575968. [PMID: 33262703 PMCID: PMC7686538 DOI: 10.3389/fphys.2020.575968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2020] [Accepted: 10/14/2020] [Indexed: 01/18/2023]
Abstract
Cold climates represent one of the major environmental challenges that anatomically modern humans faced during their dispersal out of Africa. The related adaptive traits have been achieved by modulation of thermogenesis and thermoregulation processes where nuclear (nuc) and mitochondrial (mt) genes play a major role. In human populations, mitonuclear genetic interactions are the result of both the peculiar genetic history of each human group and the different environments they have long occupied. This study aims to investigate mitonuclear genetic interactions by considering all the mitochondrial genes and 28 nuclear genes involved in brown adipose tissue metabolism, which have been previously hypothesized to be crucial for cold adaptation. For this purpose, we focused on three human populations (i.e., Finnish, British, and Central Italian people) of European ancestry from different biogeographical and climatic areas, and we used a machine learning approach to identify relevant nucDNA–mtDNA interactions that characterized each population. The obtained results are twofold: (i) at the methodological level, we demonstrated that a machine learning approach is able to detect patterns of genetic structure among human groups from different latitudes both at single genes and by considering combinations of mtDNA and nucDNA loci; (ii) at the biological level, the analysis identified population-specific nuclear genes and variants that likely play a relevant biological role in association with a mitochondrial gene (such as the “obesity gene” FTO in Finnish people). Further studies are needed to fully elucidate the evolutionary dynamics (e.g., migration, admixture, and/or local adaptation) that shaped these nucDNA–mtDNA interactions and their functional role.
Affiliation(s)
- Alena Kalyakulina
  - Department of Applied Mathematics, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
- Vincenzo Iannuzzi
  - Alma Mater Research Institute on Global Challenges and Climate Change (Alma Climate), University of Bologna, Bologna, Italy
  - Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Marco Sazzini
  - Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Paolo Garagnani
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna, Bologna, Italy
- Sarika Jalan
  - Complex Systems Laboratory, Discipline of Physics, Indian Institute of Technology Indore, Indore, India
  - Center for Theoretical Physics of Complex Systems, Institute for Basic Science (IBS), Daejeon, South Korea
- Claudio Franceschi
  - Laboratory of Systems Medicine of Healthy Aging, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
- Mikhail Ivanchenko
  - Department of Applied Mathematics, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
  - Laboratory of Systems Medicine of Healthy Aging, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
- Cristina Giuliani
  - Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
  - School of Anthropology and Museum Ethnography, University of Oxford, Oxford, United Kingdom
145
Njage PMK, Leekitcharoenphon P, Hansen LT, Hendriksen RS, Faes C, Aerts M, Hald T. Quantitative Microbial Risk Assessment Based on Whole Genome Sequencing Data: Case of Listeria monocytogenes. Microorganisms 2020; 8:1772. [PMID: 33187247 PMCID: PMC7698238 DOI: 10.3390/microorganisms8111772] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/09/2020] [Revised: 11/04/2020] [Accepted: 11/05/2020] [Indexed: 01/02/2023]
Abstract
Data from high-throughput whole genome sequencing (WGS) technologies remain an increasingly discussed but vastly unexplored resource in the public health domain of quantitative microbial risk assessment (QMRA). This is due to challenges including the high dimensionality of WGS data and the heterogeneity of microbial growth phenotype data. This study provides an innovative approach for modeling the impact of population heterogeneity in microbial phenotypic stress response and integrates this into predictive models that take high-dimensional WGS data as input for more precise exposure assessment, using Listeria monocytogenes as an example. Finite mixture models were used to distinguish the number of sub-populations for each of the stress phenotypes: acid, cold, salt and desiccation. Machine learning predictive models were selected from six algorithms by inputting WGS data to predict the sub-population membership of new strains with unknown stress response data. An example QMRA was conducted for cultured milk products using the strains of unknown stress phenotype to illustrate the significance of the findings of this study. Increased resistance to stress conditions leads to increased growth, a likelihood of higher exposure and a higher probability of illness. Neglecting within-species genetic and phenotypic heterogeneity in microbial stress response may over- or underestimate microbial exposure and eventual risk during QMRA.
Collapse
Affiliation(s)
- Patrick Murigu Kamau Njage
- Research Group for Genomic Epidemiology, Division for Global Surveillance, National Food Institute, Technical University of Denmark, 2800 Lyngby, Denmark; (P.L.); (R.S.H.); (T.H.)
- Correspondence: Tel.: +45-35-88-75-31
- Pimlapas Leekitcharoenphon
- Research Group for Genomic Epidemiology, Division for Global Surveillance, National Food Institute, Technical University of Denmark, 2800 Lyngby, Denmark; (P.L.); (R.S.H.); (T.H.)
- Lisbeth Truelstrup Hansen
- Research Group for Microbiology and Hygiene, National Food Institute, Technical University of Denmark, 2800 Lyngby, Denmark;
- Rene S. Hendriksen
- Research Group for Genomic Epidemiology, Division for Global Surveillance, National Food Institute, Technical University of Denmark, 2800 Lyngby, Denmark; (P.L.); (R.S.H.); (T.H.)
- Christel Faes
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University Katholieke Universiteit Leuven, 3590 Diepenbeek, Belgium; (C.F.); (M.A.)
- Marc Aerts
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University Katholieke Universiteit Leuven, 3590 Diepenbeek, Belgium; (C.F.); (M.A.)
- Tine Hald
- Research Group for Genomic Epidemiology, Division for Global Surveillance, National Food Institute, Technical University of Denmark, 2800 Lyngby, Denmark; (P.L.); (R.S.H.); (T.H.)
146
Tobias JA, Ottenburghs J, Pigot AL. Avian Diversity: Speciation, Macroevolution, and Ecological Function. Annual Review of Ecology, Evolution, and Systematics 2020. [DOI: 10.1146/annurev-ecolsys-110218-025023] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Indexed: 11/09/2022]
Abstract
The origin, distribution, and function of biological diversity are fundamental themes of ecology and evolutionary biology. Research on birds has played a major role in the history and development of these ideas, yet progress was for many decades limited by a focus on patterns of current diversity, often restricted to particular clades or regions. Deeper insight is now emerging from a recent wave of integrative studies combining comprehensive phylogenetic, environmental, and functional trait data at unprecedented scales. We review these empirical advances and describe how they are reshaping our understanding of global patterns of bird diversity and the processes by which it arises, with implications for avian biogeography and functional ecology. Further expansion and integration of data sets may help to resolve longstanding debates about the evolutionary origins of biodiversity and offer a framework for understanding and predicting the response of ecosystems to environmental change.
Affiliation(s)
- Joseph A. Tobias
- Department of Life Sciences, Imperial College London, Silwood Park, Ascot SL5 7PY, United Kingdom
- Jente Ottenburghs
- Department of Evolutionary Biology, Uppsala University, 752 36 Uppsala, Sweden
- Alex L. Pigot
- Centre for Biodiversity and Environment Research, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom
147
Cortés AJ, Restrepo-Montoya M, Bedoya-Canas LE. Modern Strategies to Assess and Breed Forest Tree Adaptation to Changing Climate. Frontiers in Plant Science 2020; 11:583323. [PMID: 33193532 PMCID: PMC7609427 DOI: 10.3389/fpls.2020.583323] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 07/14/2020] [Accepted: 09/29/2020] [Indexed: 05/02/2023]
Abstract
Studying the genetics of adaptation to new environments in ecologically and industrially important tree species is currently a major research line in the fields of plant science and genetic improvement for tolerance to abiotic stress. Specifically, exploring the genomic basis of local adaptation is imperative for assessing the conditions under which trees will successfully adapt in situ to global climate change. However, this knowledge has scarcely been used in conservation and forest tree improvement because woody perennials face major research limitations such as their outcrossing reproductive systems, long juvenile phase, and huge genome sizes. Therefore, in this review we discuss predictive genomic approaches that promise to increase adaptive selection accuracy and shorten generation intervals. They may also assist in the detection of novel allelic variants from tree germplasm, and disclose the genomic potential of adaptation to different environments. For instance, natural populations of tree species invite using tools from the population genomics field to study the signatures of local adaptation. Conventional genetic markers and whole genome sequencing both help identify genes and markers that diverge between local populations more than expected under neutrality, and that exhibit unique signatures of diversity indicative of "selective sweeps." Ultimately, these efforts inform the conservation and breeding status capable of pivoting forest health, ecosystem services, and sustainable production. Key long-term perspectives include understanding how trees' phylogeographic history may affect the adaptively relevant genetic variation available for adaptation to environmental change. Encouraging "big data" approaches (machine learning, ML) capable of comprehensively merging heterogeneous genomic and ecological datasets is becoming imperative, too.
Affiliation(s)
- Andrés J. Cortés
- Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, Rionegro, Colombia
- Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia – Sede Medellín, Medellín, Colombia
- Manuela Restrepo-Montoya
- Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia – Sede Medellín, Medellín, Colombia
- Larry E. Bedoya-Canas
- Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia – Sede Medellín, Medellín, Colombia
148
Stange M, Barrett RDH, Hendry AP. The importance of genomic variation for biodiversity, ecosystems and people. Nat Rev Genet 2020; 22:89-105. [PMID: 33067582 DOI: 10.1038/s41576-020-00288-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Accepted: 09/07/2020] [Indexed: 11/09/2022]
Abstract
The 2019 United Nations Global assessment report on biodiversity and ecosystem services estimated that approximately 1 million species are at risk of extinction. This primarily human-driven loss of biodiversity has unprecedented negative consequences for ecosystems and people. Classic and emerging approaches in genetics and genomics have the potential to dramatically improve these outcomes. In particular, the study of interactions among genetic loci within and between species will play a critical role in understanding the adaptive potential of species and communities, and hence their direct and indirect effects on biodiversity, ecosystems and people. We explore these population and community genomic contexts in the hope of finding solutions for maintaining and improving ecosystem services and nature's contributions to people.
Affiliation(s)
- Madlen Stange
- Redpath Museum, McGill University, Montreal, QC, Canada
149
Statistical and Machine-Learning Analyses in Nutritional Genomics Studies. Nutrients 2020; 12:nu12103140. [PMID: 33066636 PMCID: PMC7602401 DOI: 10.3390/nu12103140] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/11/2020] [Revised: 10/08/2020] [Accepted: 10/10/2020] [Indexed: 12/18/2022]
Abstract
Nutritional compounds may have an influence on different OMICs levels, including genomics, epigenomics, transcriptomics, proteomics, metabolomics, and metagenomics. The integration of OMICs data is challenging but may provide new knowledge to explain the mechanisms involved in the metabolism of nutrients and diseases. Traditional statistical analyses play an important role in description and data association; however, these statistical procedures are not sufficiently powered to interpret the large integrated multiple OMICs (multi-OMICs) datasets. Machine learning (ML) approaches can play a major role in the interpretation of multi-OMICs data in nutrition research. Specifically, ML can be used for data mining, sample clustering, and classification to produce predictive models and algorithms for integration of multi-OMICs data in response to dietary intake. The objective of this review was to investigate the strategies used for the analysis of multi-OMICs data in nutrition studies. Sixteen recent studies that aimed to understand the association between dietary intake and multi-OMICs data are summarized. Multivariate analysis is the approach more commonly used in multi-OMICs nutrition studies. Overall, as nutrition research incorporates multi-OMICs data, novel approaches of analysis such as ML need to complement the traditional statistical analyses to fully explain the impact of nutrition on health and disease.
150
Góralska M, Bińkowski J, Lenarczyk N, Bienias A, Grądzielewska A, Czyczyło-Mysza I, Kapłoniak K, Stojałowski S, Myśków B. How Machine Learning Methods Helped Find Putative Rye Wax Genes Among GBS Data. Int J Mol Sci 2020; 21:E7501. [PMID: 33053706 PMCID: PMC7593958 DOI: 10.3390/ijms21207501] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/28/2020] [Revised: 09/23/2020] [Accepted: 10/07/2020] [Indexed: 11/17/2022]
Abstract
The standard approach to genetic mapping was supplemented by machine learning (ML) to establish the location of the rye gene associated with epicuticular wax formation (glaucous phenotype). Over 180 plants of the biparental F2 population were genotyped with DArTseq (sequencing-based diversity array technology). A maximum likelihood (MLH) algorithm (JoinMap 5.0) and three ML algorithms, logistic regression (LR), random forest and extreme gradient boosted trees (XGBoost), were used to select markers closely linked to the gene encoding wax layer. The allele conditioning the nonglaucous appearance of plants, derived from the cultivar Karlikovaja Zelenostebelnaja, was mapped to chromosome 2R, which is the first report of this localization. The DNA sequence of DArT-Silico 3585843, closely linked to wax segregation and detected by using ML methods, was indicated as one of the candidates controlling the studied trait. The putative gene encodes the ABCG11 transporter.
Affiliation(s)
- Magdalena Góralska
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland; (M.G.); (J.B.); (N.L.); (A.B.); (S.S.)
- Jan Bińkowski
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland; (M.G.); (J.B.); (N.L.); (A.B.); (S.S.)
- Natalia Lenarczyk
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland; (M.G.); (J.B.); (N.L.); (A.B.); (S.S.)
- Anna Bienias
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland; (M.G.); (J.B.); (N.L.); (A.B.); (S.S.)
- Agnieszka Grądzielewska
- Institute of Plant Genetics, Breeding and Biotechnology, University of Life Sciences in Lublin, ul. Akademicka, 20–950 Lublin, Poland;
- Ilona Czyczyło-Mysza
- Polish Academy of Sciences, The Franciszek Górski Institute of Plant Physiology, Niezapominajek 21, 30–239 Kraków, Poland; (I.C.-M.); (K.K.)
- Kamila Kapłoniak
- Polish Academy of Sciences, The Franciszek Górski Institute of Plant Physiology, Niezapominajek 21, 30–239 Kraków, Poland; (I.C.-M.); (K.K.)
- Stefan Stojałowski
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland; (M.G.); (J.B.); (N.L.); (A.B.); (S.S.)
- Beata Myśków
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland; (M.G.); (J.B.); (N.L.); (A.B.); (S.S.)