1
Bi X, Qiu M, Li D, Zhang Y, Zhan W, Wang Z, Lv Z, Li H, Chen G. Transcriptomic and metabolomic analysis of the mechanisms underlying stress responses of the freshwater snail, Pomacea canaliculata, exposed to different levels of arsenic. Aquatic Toxicology (Amsterdam, Netherlands) 2024; 267:106835. [PMID: 38219501 DOI: 10.1016/j.aquatox.2024.106835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/12/2023] [Accepted: 01/09/2024] [Indexed: 01/16/2024]
Abstract
Arsenic (As) pollution poses an important problem, but limited information is available about the physiological effects of As on freshwater invertebrates. Here, we investigated the physiological effects of chronic As exposure on Pomacea canaliculata, a freshwater invertebrate. A high level of As(III) (5 mg/L) inhibited the growth of P. canaliculata, whereas a low level of As(III) (2 mg/L) promoted growth. Pathological changes in shell and cellular ultrastructure due to As accumulation likely explain the growth inhibition at the high As level. The low level of As stimulated the expression of genes related to DNA replication and chitosan biosynthesis, potentially accounting for the growth promotion observed. At the high As level, the enriched pathways primarily involved cytochrome P450, glutathione, and arachidonic acid-mediated metabolism of xenobiotics. ATP-binding cassette (ABC) transporters, specifically the ABCB and ABCC subfamilies, were involved in As transport. Differential metabolites were mainly associated with the metabolism and biosynthesis of amino acids. These findings elucidate the dose-dependent effects of As stress on P. canaliculata growth, with low levels promoting and high levels inhibiting growth. Our findings also provide insights into As metabolism and transport in P. canaliculata.
Affiliation(s)
- Xiaoyang Bi
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Provincial Key Laboratory of Agricultural & Rural Pollution Abatement and Environmental Safety, College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
- Mingxin Qiu
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Provincial Key Laboratory of Agricultural & Rural Pollution Abatement and Environmental Safety, College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
- Danni Li
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Provincial Key Laboratory of Agricultural & Rural Pollution Abatement and Environmental Safety, College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
- Yujing Zhang
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Provincial Key Laboratory of Agricultural & Rural Pollution Abatement and Environmental Safety, College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
- Wenhui Zhan
- Guangdong Testing Institute of Product Quality Supervision, Foshan 528300, China
- Zhixiong Wang
- Guangdong Testing Institute of Product Quality Supervision, Foshan 528300, China
- Zhaowei Lv
- Guangdong Testing Institute of Product Quality Supervision, Foshan 528300, China
- Huashou Li
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Provincial Key Laboratory of Agricultural & Rural Pollution Abatement and Environmental Safety, College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
- Guikui Chen
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Provincial Key Laboratory of Agricultural & Rural Pollution Abatement and Environmental Safety, College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China.
2
Feser M, König P, Fiebig A, Arend D, Lange M, Scholz U. On the way to plant data commons - a genotyping use case. J Integr Bioinform 2022; 19:jib-2022-0033. [PMID: 36065132 PMCID: PMC9800039 DOI: 10.1515/jib-2022-0033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 08/04/2022] [Accepted: 08/11/2022] [Indexed: 01/09/2023] [Open Access]
Abstract
In recent years, progress in data collection in the life sciences has created increasing demand and opportunities for advanced bioinformatics. This includes data management as well as individual data analysis, and often covers the entire data life cycle. A variety of tools have been developed to store, share, or reuse the data produced in different domains such as genotyping. Imputation in particular, as a subfield of genotyping, requires good Research Data Management (RDM) strategies to enable use and re-use of genotypic data. To achieve sustainable software, it is necessary to develop tools and surrounding ecosystems that are reusable and maintainable. Reusability in the context of streamlined tools can be achieved, for example, by standardizing the input and output of the different tools and adopting open and broadly used file formats. By using such established file formats, the tools can also be connected with others, improving the overall interoperability of the software. Finally, it is important to build strong communities that maintain the tools by developing and contributing new features and maintenance updates. This article presents concepts for achieving these goals for an imputation service.
Affiliation(s)
- Manuel Feser
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
- Patrick König
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
- Anne Fiebig
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
- Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
- Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
3
Diakou I, Papakonstantinou E, Papageorgiou L, Pierouli K, Dragoumani K, Spandidos DA, Bacopoulou F, Chrousos GP, Goulielmos GN, Eliopoulos E, Vlachakis D. Multiple sclerosis and computational biology (Review). Biomed Rep 2022; 17:96. [PMID: 36382258 PMCID: PMC9634047 DOI: 10.3892/br.2022.1579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 09/27/2022] [Indexed: 12/02/2022] [Open Access]
Abstract
Multiple sclerosis (MS) is an autoimmune neurodegenerative disease whose prevalence has increased worldwide. The resultant symptoms may be debilitating and can substantially reduce the quality of life of patients. Computational biology, which involves the use of computational tools to answer biomedical questions, may provide the basis for novel healthcare approaches in the context of MS. The rapid accumulation of health data, ever-increasing computational power and evolving technology have helped to modernize and refine MS research. From the discovery of novel biomarkers to the optimization of treatment and a number of quality-of-life enhancements for patients, computational biology methods and tools are shaping the field of MS diagnosis, management and treatment. The final goal in such a complex disease would be personalized medicine, i.e., providing healthcare services that are tailored to the individual patient, in accordance with the particular biology of their disease and the environmental factors to which they are subjected. The present review article summarizes the current knowledge on MS, modern computational biology and the impact of modern computational approaches on MS.
Affiliation(s)
- Io Diakou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Eleni Papakonstantinou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Louis Papageorgiou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Katerina Pierouli
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Konstantina Dragoumani
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Demetrios A. Spandidos
- Laboratory of Clinical Virology, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Flora Bacopoulou
- University Research Institute of Maternal and Child Health and Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, ‘Aghia Sophia’ Children's Hospital, 11527 Athens, Greece
- George P. Chrousos
- University Research Institute of Maternal and Child Health and Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, ‘Aghia Sophia’ Children's Hospital, 11527 Athens, Greece
- Georges N. Goulielmos
- Section of Molecular Pathology and Human Genetics, Department of Internal Medicine, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Elias Eliopoulos
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- Dimitrios Vlachakis
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- University Research Institute of Maternal and Child Health and Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, ‘Aghia Sophia’ Children's Hospital, 11527 Athens, Greece
- Division of Endocrinology and Metabolism, Center of Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation of The Academy of Athens, 11527 Athens, Greece
4
Yan M, Nie H, Wang Y, Wang X, Jarret R, Zhao J, Wang H, Yang J. Exploring and exploiting genetics and genomics for sweetpotato improvement: Status and perspectives. Plant Communications 2022; 3:100332. [PMID: 35643086 PMCID: PMC9482988 DOI: 10.1016/j.xplc.2022.100332] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 10/26/2021] [Revised: 04/17/2022] [Accepted: 05/02/2022] [Indexed: 05/14/2023]
Abstract
Sweetpotato (Ipomoea batatas (L.) Lam.) is one of the most important root crops cultivated worldwide. Because of its adaptability, high yield potential, and nutritional value, sweetpotato has become an important food crop, particularly in developing countries. To ensure adequate crop yields to meet increasing demand, it is essential to enhance the tolerance of sweetpotato to environmental stresses and other yield-limiting factors. The highly heterozygous hexaploid genome of I. batatas complicates genetic studies and limits improvement of sweetpotato through traditional breeding. However, application of next-generation sequencing and high-throughput genotyping and phenotyping technologies to sweetpotato genetics and genomics research has provided new tools and resources for crop improvement. In this review, we discuss the genomics resources that are available for sweetpotato, including the current reference genome, databases, and available bioinformatics tools. We systematically review the current state of knowledge on the polyploid genetics of sweetpotato, including studies of its origin and germplasm diversity and the associated mapping of important agricultural traits. We then outline the conventional and molecular breeding approaches that have been applied to sweetpotato. Finally, we discuss future goals for genetic studies of sweetpotato and crop improvement via breeding in combination with state-of-the-art multi-omics approaches such as genomic selection and gene editing. These approaches will advance and accelerate genetic improvement of this important root crop and facilitate its sustainable global production.
Affiliation(s)
- Mengxiao Yan
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Haozhen Nie
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Yunze Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Xinyi Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Jiamin Zhao
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Hongxia Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China.
- Jun Yang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China.
5
6
Combining metabolome and clinical indicators with machine learning provides some promising diagnostic markers to precisely detect smear-positive/negative pulmonary tuberculosis. BMC Infect Dis 2022; 22:707. [PMID: 36008772 PMCID: PMC9403968 DOI: 10.1186/s12879-022-07694-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/07/2022] [Accepted: 08/22/2022] [Indexed: 11/30/2022] [Open Access]
Abstract
Background: Tuberculosis (TB) was the leading lethal infectious disease worldwide for a long time (2014–2019) until the COVID-19 global pandemic, and it is still one of the top 10 causes of death worldwide. One important reason for the large number of TB patients and deaths is the difficulty of precisely diagnosing TB with common detection methods, especially in smear-negative pulmonary tuberculosis (SNPT) cases. The rapid development of metabolomics and machine learning offers a great opportunity for precision diagnosis of TB. However, the metabolite biomarkers for the precision diagnosis of smear-positive and smear-negative pulmonary tuberculosis (SPPT/SNPT) remain to be uncovered. In this study, we combined metabolomics and clinical indicators with machine learning to screen for novel diagnostic biomarkers for the precise identification of SPPT and SNPT patients.
Methods: Untargeted plasma metabolomic profiling was performed for 27 SPPT patients, 37 SNPT patients and controls. Orthogonal partial least squares-discriminant analysis (OPLS-DA) was then conducted to screen differential metabolites among the three groups. Metabolite pathway enrichment, random forest (RF), support vector machine (SVM) and multilayer perceptron neural network (MLP) analyses were performed using MetaboAnalyst 5.0, the “caret” R package, the “e1071” R package and the “TensorFlow” Python package, respectively.
Results: Metabolomic analysis revealed significant enrichment of fatty acid and amino acid metabolites in the plasma of SPPT and SNPT patients, with SPPT samples showing more severe dysfunction in fatty acid and amino acid metabolism. Further RF analysis revealed four optimized diagnostic biomarker combinations comprising ten features (two lipid/lipid-like molecules, seven organic acids/derivatives, and one clinical indicator) for the identification of SPPT patients, SNPT patients and controls with high accuracy (83–93%), which were further verified by SVM and MLP. Among them, MLP displayed the best classification performance in simultaneously identifying the three groups (94.74%), suggesting some advantage of MLP over RF and SVM.
Conclusions: Our findings reveal plasma metabolomic characteristics of SPPT and SNPT patients, provide some novel promising diagnostic markers for precision diagnosis of various types of TB, and show the potential of machine learning in screening biomarkers from big data.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12879-022-07694-8.
7
Adams DC, Collyer ML. Consilience of methods for phylogenetic analysis of variance. Evolution 2022; 76:1406-1419. [PMID: 35522593 PMCID: PMC9544334 DOI: 10.1111/evo.14512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Accepted: 03/22/2022] [Indexed: 01/21/2023]
Abstract
Simulation-based and permutation-based inferential methods are commonplace in phylogenetic comparative methods, especially as evolutionary data have become more complex and parametric methods more limited for their analysis. Both approaches simulate many random outcomes from a null model to empirically generate sampling distributions of statistics. Although simulation-based and permutation-based methods seem commensurate in purpose, results from analysis of variance (ANOVA) based on the distributions of random F-statistics produced by these methods can be quite different in practice. Differences could be from either the null-model process that generates variation across many simulations or random permutations of the data, or different estimation methods for linear model coefficients and statistics. Unfortunately, because the null-model process and coefficient estimation are intrinsically linked in phylogenetic ANOVA methods, the precise reason for methodological differences has not been fully considered. Here we show that the null-model processes of phylogenetic simulation and randomizing residuals in a permutation procedure are indeed commensurate, and that both also produce results consistent with parametric ANOVA, for cases where parametric ANOVA is possible. We also provide results that caution against using ordinary least-squares estimation along with phylogenetic simulation; a typical phylogenetic ANOVA implementation.
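The permutation idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' phylogenetic procedure: it assumes an ordinary one-way ANOVA with no phylogenetic covariance, and it randomizes the response values across observations, which corresponds to randomizing residuals only under an intercept-only null model. The function names `f_statistic` and `permutation_anova` are hypothetical and chosen for illustration.

```python
import random


def f_statistic(values, labels):
    """One-way ANOVA F-statistic for values grouped by labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_anova(values, labels, n_perm=999, seed=0):
    """Return the observed F and an empirical p-value from random permutations."""
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(values)
    hits = 1  # the observed statistic counts as one member of the distribution
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between values and groups
        if f_statistic(shuffled, labels) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)


values = [1, 2, 3, 10, 11, 12, 20, 21, 22]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
f_obs, p_value = permutation_anova(values, labels)
print(f_obs, p_value)
```

The empirical sampling distribution of F replaces the parametric F distribution; a phylogenetic version would additionally transform the data and design matrix by the phylogenetic covariance before computing each statistic.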
Affiliation(s)
- Dean C. Adams
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA
8
Arend D, Psaroudakis D, Memon JA, Rey-Mazón E, Schüler D, Szymanski JJ, Scholz U, Junker A, Lange M. From data to knowledge - big data needs stewardship, a plant phenomics perspective. The Plant Journal: for Cell and Molecular Biology 2022; 111:335-347. [PMID: 35535481 DOI: 10.1111/tpj.15804] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/24/2022] [Revised: 05/02/2022] [Accepted: 05/06/2022] [Indexed: 06/14/2023]
Abstract
The research data life cycle from project planning to data publishing is an integral part of current research. Until the last decade, researchers were responsible for all associated phases in addition to the actual research and were assisted only at certain points by IT or bioinformaticians. Starting with advances in sequencing, the automation of analytical methods in all life science fields, including in plant phenotyping, has led to ever-increasing amounts of ever more complex data. The tasks associated with these challenges now often exceed the expertise of and infrastructure available to scientists, leading to an increased risk of data loss over time. The IPK Gatersleben has one of the world's largest germplasm collections and two decades of experience in crop plant research data management. In this article we show how challenges in modern, data-driven research can be addressed by data stewards. Based on concrete use cases, data management processes and best practices from plant phenotyping, we describe which expertise and skills are required and how data stewards as an integral actor can enhance the quality of a necessary digital transformation in progressive research.
Affiliation(s)
- Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Dennis Psaroudakis
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Junaid Altaf Memon
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Elena Rey-Mazón
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Jedrzej Jakub Szymanski
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
9
Youn J, Rai N, Tagkopoulos I. Knowledge integration and decision support for accelerated discovery of antibiotic resistance genes. Nat Commun 2022; 13:2360. [PMID: 35487919 PMCID: PMC9055065 DOI: 10.1038/s41467-022-29993-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2021] [Accepted: 03/04/2022] [Indexed: 11/09/2022] [Open Access]
Abstract
We present a machine learning framework to automate knowledge discovery through knowledge graph construction, inconsistency resolution, and iterative link prediction. By incorporating knowledge from 10 publicly available sources, we construct an Escherichia coli antibiotic resistance knowledge graph with 651,758 triples from 23 triple types after resolving 236 sets of inconsistencies. Iteratively applying link prediction to this graph and wet-lab validation of the generated hypotheses reveal 15 antibiotic-resistant E. coli genes, with 6 of them never associated with antibiotic resistance for any microbe. Iterative link prediction leads to a performance improvement and more findings. The probability of positive findings highly correlates with experimentally validated findings (R² = 0.94). We also identify 5 homologs in Salmonella enterica that are all validated to confer resistance to antibiotics. This work demonstrates how evidence-driven decisions are a step toward automating knowledge discovery with high confidence and accelerated pace, thereby substituting traditional time-consuming and expensive methods.
Affiliation(s)
- Jason Youn
- Department of Computer Science, University of California, Davis, CA, 95616, USA
- Genome Center, University of California, Davis, CA, 95616, USA
- USDA/NSF AI Institute for Next Generation Food Systems (AIFS), University of California, Davis, CA, 95616, USA
- Navneet Rai
- Department of Computer Science, University of California, Davis, CA, 95616, USA
- Genome Center, University of California, Davis, CA, 95616, USA
- USDA/NSF AI Institute for Next Generation Food Systems (AIFS), University of California, Davis, CA, 95616, USA
- Ilias Tagkopoulos
- Department of Computer Science, University of California, Davis, CA, 95616, USA.
- Genome Center, University of California, Davis, CA, 95616, USA.
- USDA/NSF AI Institute for Next Generation Food Systems (AIFS), University of California, Davis, CA, 95616, USA.
10
Monteiro HS, Leifer I, Reis SDS, Andrade JS, Makse HA. Fast algorithm to identify minimal patterns of synchrony through fibration symmetries in large directed networks. Chaos (Woodbury, N.Y.) 2022; 32:033120. [PMID: 35364841 PMCID: PMC8933057 DOI: 10.1063/5.0066741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2021] [Accepted: 02/24/2022] [Indexed: 06/14/2023]
Abstract
Recent studies have revealed the interplay between the structure of network circuits with fibration symmetries and the functionality of the biological networks within which they have been identified. The presence of these symmetries in complex networks predicts the phenomenon of cluster synchronization, which produces patterns of synchronized groups of nodes. Here, we present a fast and memory-efficient algorithm to identify fibration symmetries in networks. The algorithm is particularly suitable for large networks since it has a runtime complexity of O(M log N) and requires O(M + N) memory, where N and M are the number of nodes and edges in the network, respectively. The algorithm is a modification of the so-called refinement paradigm to identify circuits that are symmetrical to information flow (i.e., fibers) by finding the coarsest refinement partition over the network. Finally, we show that the algorithm provides an optimal procedure for identifying fibers, overcoming current approaches used in the literature.
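The notion of a coarsest refinement partition in this abstract can be sketched with a naive iterative refinement. This is not the fast algorithm the paper presents: it recomputes every node's signature on each pass, so it is only an illustration of what the partition is. Nodes are assumed to be numbered from 0, edges are assumed to be (source, target) pairs, and the function name `coarsest_input_partition` is hypothetical.

```python
from collections import defaultdict


def coarsest_input_partition(num_nodes, edges):
    """Group nodes that receive the same multiset of input classes.

    Naive refinement: every pass splits classes whose members see
    different multisets of in-neighbor classes, until nothing changes.
    """
    in_neighbors = defaultdict(list)
    for source, target in edges:
        in_neighbors[target].append(source)

    color = [0] * num_nodes  # start with all nodes in a single class
    num_colors = 1
    while True:
        # A node's signature is its own class plus the sorted classes of its inputs.
        signatures = [
            (color[v], tuple(sorted(color[u] for u in in_neighbors[v])))
            for v in range(num_nodes)
        ]
        relabel = {}
        new_color = [relabel.setdefault(sig, len(relabel)) for sig in signatures]
        if len(relabel) == num_colors:
            break  # no class was split, so the partition is stable
        color, num_colors = new_color, len(relabel)

    classes = defaultdict(list)
    for node, c in enumerate(color):
        classes[c].append(node)
    return sorted(classes.values())


# Node 0 feeds nodes 1 and 2, which both feed node 3.
print(coarsest_input_partition(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))
```

In this example nodes 1 and 2 end up in the same class because each receives a single input from the class containing node 0. Because each signature includes the node's current class, classes can only split, never merge, so an unchanged class count means the partition has stabilized.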
Affiliation(s)
- Higor S. Monteiro
- Departamento de Física, Universidade Federal do Ceará, Fortaleza, Ceará 60451-970, Brazil
- Ian Leifer
- Levich Institute and Physics Department, The City College of New York, New York, New York 10031, USA
- Saulo D. S. Reis
- Departamento de Física, Universidade Federal do Ceará, Fortaleza, Ceará 60451-970, Brazil
- José S. Andrade
- Departamento de Física, Universidade Federal do Ceará, Fortaleza, Ceará 60451-970, Brazil
- Hernan A. Makse
- Levich Institute and Physics Department, The City College of New York, New York, New York 10031, USA
11
Sankara Narayanan P, Runthala A. Accurate computational evolution of proteins and its dependence on deep learning and machine learning strategies. Biocatal Biotransfor 2022. [DOI: 10.1080/10242422.2022.2030317] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
12
Silva-Costa LC, Smith BJ. Post-translational Modifications in Brain Diseases: A Future for Biomarkers. Advances in Experimental Medicine and Biology 2022; 1382:129-141. [DOI: 10.1007/978-3-031-05460-0_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
13
Chen J, Guo Y, Huang S, Zhan H, Zhang M, Wang J, Shu Y. Integration of transcriptome and proteome reveals molecular mechanisms underlying stress responses of the cutworm, Spodoptera litura, exposed to different levels of lead (Pb). Chemosphere 2021; 283:131205. [PMID: 34147986 DOI: 10.1016/j.chemosphere.2021.131205] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/14/2021] [Revised: 06/08/2021] [Accepted: 06/09/2021] [Indexed: 06/12/2023]
Abstract
Heavy metals are major environmental pollutants that affect organisms across different trophic levels. Herbivorous insects play an important role in the bioaccumulation, and eventually, biomagnification of these metals. Although the effects of heavy metal stress on insects have been well studied, the molecular mechanisms underlying these effects remain poorly understood. Here, we used RNA-Seq profiling and isobaric tags for relative and absolute quantitation (iTRAQ) to unravel these mechanisms in the polyphagous pest Spodoptera litura exposed to lead (Pb) at two different concentrations (12.5 and 100 mg Pb/kg; PbL and PbH, respectively). Altogether, 1392 and 1630 differentially expressed genes (DEGs) and 58 and 114 differentially expressed proteins (DEPs) were identified in larvae exposed to PbL and PbH, respectively. After exposure to PbL, the main up-regulated gene clusters and proteins in S. litura larvae were associated with metabolic processes, including carbohydrate, protein, and lipid metabolism, but the levels of cytochrome P450 associated with the pathway of xenobiotic biodegradation and metabolism were decreased. In contrast, the main up-regulated gene clusters and proteins in larvae exposed to PbH were enriched in the metabolism of xenobiotics by cytochrome P450, drug metabolism-cytochrome P450, and other drug metabolism enzymes, while the down-regulated genes and proteins were closely related to lipid (lipase) and protein (serine protease, trypsin) metabolism and growth processes (cuticular protein). These findings indicate that S. litura larvae exposed to PbL could enhance food digestion and absorption to prioritize growth rather than detoxification, whereas larvae exposed to PbH reduced food digestion and absorption and channeled their limited energy into detoxification rather than growth. These contrasting results explain the dose-dependent effects of heavy metal stress on insect life-history traits, wherein low levels of heavy metal stress induce stimulation, while high levels of heavy metal stress cause inhibition at the transcriptome and proteome levels.
Affiliation(s)
- Jin Chen
- Key Laboratory of Agro-Environment in the Tropics, Ministry of Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Provincial Key Laboratory of Eco-Circular Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Engineering Research Centre for Modern Eco-agriculture, Guangzhou, 510642, China; Department of Ecology, College of Natural Resources and Environment, South China Agricultural University, Guangzhou, 510642, China
- Yeshan Guo
- Key Laboratory of Agro-Environment in the Tropics, Ministry of Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Provincial Key Laboratory of Eco-Circular Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Engineering Research Centre for Modern Eco-agriculture, Guangzhou, 510642, China; Department of Ecology, College of Natural Resources and Environment, South China Agricultural University, Guangzhou, 510642, China
| | - Shimin Huang
- Key Laboratory of Agro-Environment in the Tropics, Ministry of Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Provincial Key Laboratory of Eco-Circular Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Engineering Research Centre for Modern Eco-agriculture, Guangzhou, 510642, China; Department of Ecology, College of Natural Resources and Environment, South China Agricultural University, Guangzhou, 510642, China
| | - Huiru Zhan
- Key Laboratory of Agro-Environment in the Tropics, Ministry of Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Provincial Key Laboratory of Eco-Circular Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Engineering Research Centre for Modern Eco-agriculture, Guangzhou, 510642, China; Department of Ecology, College of Natural Resources and Environment, South China Agricultural University, Guangzhou, 510642, China
| | - Meifang Zhang
- Key Laboratory of Agro-Environment in the Tropics, Ministry of Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Provincial Key Laboratory of Eco-Circular Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Engineering Research Centre for Modern Eco-agriculture, Guangzhou, 510642, China; Department of Ecology, College of Natural Resources and Environment, South China Agricultural University, Guangzhou, 510642, China
| | - Jianwu Wang
- Key Laboratory of Agro-Environment in the Tropics, Ministry of Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Provincial Key Laboratory of Eco-Circular Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Engineering Research Centre for Modern Eco-agriculture, Guangzhou, 510642, China; Department of Ecology, College of Natural Resources and Environment, South China Agricultural University, Guangzhou, 510642, China.
| | - Yinghua Shu
- Key Laboratory of Agro-Environment in the Tropics, Ministry of Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Provincial Key Laboratory of Eco-Circular Agriculture, South China Agricultural University, Guangzhou, 510642, China; Guangdong Engineering Research Centre for Modern Eco-agriculture, Guangzhou, 510642, China; Department of Ecology, College of Natural Resources and Environment, South China Agricultural University, Guangzhou, 510642, China.
|
14
|
Vitorino R, Choudhury M, Guedes S, Ferreira R, Thongboonkerd V, Sharma L, Amado F, Srivastava S. Peptidomics and proteogenomics: background, challenges and future needs. Expert Rev Proteomics 2021; 18:643-659. [PMID: 34517741 DOI: 10.1080/14789450.2021.1980388] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/20/2022]
Abstract
INTRODUCTION: With available genomic data and related information, it is becoming possible to better highlight mutations or genomic alterations associated with a particular disease or disorder. The advent of high-throughput sequencing technologies has greatly advanced diagnostics, prognostics, and drug development. AREAS COVERED: Peptidomics and proteogenomics are the two post-genomic technologies that enable the simultaneous study of peptides and proteins/transcripts/genes. Both technologies add a remarkably large amount of data to the pool of information on various peptides associated with gene mutations or genome remodeling. A literature search was performed in the PubMed database and is up to date. EXPERT OPINION: This article lists various techniques used for peptidomic and proteogenomic analyses. It also explains various bioinformatics workflows developed to understand differentially expressed peptides/proteins and their role in disease pathogenesis. Their role in deciphering disease pathways, cancer research, and biomarker discovery using biofluids is highlighted. Finally, the challenges and future requirements to overcome the current limitations for their effective clinical use are also discussed.
Affiliation(s)
- Rui Vitorino
- Faculdade de Medicina da Universidade do Porto, Porto, Portugal; iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal; LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Manisha Choudhury
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
- Sofia Guedes
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Rita Ferreira
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Visith Thongboonkerd
- Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Francisco Amado
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Sanjeeva Srivastava
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
|
15
|
Pegg TJ, Gladish DK, Baker RL. Algae to angiosperms: Autofluorescence for rapid visualization of plant anatomy among diverse taxa. APPLICATIONS IN PLANT SCIENCES 2021; 9:e11437. [PMID: 34268017 PMCID: PMC8272585 DOI: 10.1002/aps3.11437] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/19/2021] [Accepted: 05/19/2021] [Indexed: 05/22/2023]
Abstract
PREMISE: Fluorescence microscopy is an effective tool for viewing plant internal anatomy. However, using fluorescent antibodies or labels hinders throughput. We present a minimal protocol that takes advantage of inherent autofluorescence and aldehyde-induced fluorescence in plant cellular and subcellular structures to markedly increase throughput in cellular and ultrastructural visualization. METHODS AND RESULTS: Twelve species distributed across the plant phylogeny were each subjected to five fixative treatments: 1% paraformaldehyde and 2% glutaraldehyde, 2% paraformaldehyde, 2% glutaraldehyde, formalin-acid-alcohol (FAA), and 70% ethanol. Samples were prepared by embedding and mechanically sectioning or via whole mount. A confocal laser scanning system was used to collect micrographs. We evaluated and compared fixative influence on sample structural preservation and tissue autofluorescence. CONCLUSIONS: Formaldehyde fixation of Viridiplantae taxa samples generates useful structural data while requiring no additional histological staining or clearing. In addition, a fluorescence-capable microscope is the only specialized equipment required for image acquisition. The minimal protocol developed in this experiment enables high-throughput sample processing by eliminating the need for multi-day preparations.
Affiliation(s)
- Timothy J. Pegg
- Department of Biology, Miami University, Oxford, Ohio 45056, USA
- Graduate Program in Botany, Miami University, Oxford, Ohio 45056, USA
- Daniel K. Gladish
- Department of Biology, Miami University, Oxford, Ohio 45056, USA
- Graduate Program in Botany, Miami University, Oxford, Ohio 45056, USA
- Robert L. Baker
- Department of Biology, Miami University, Oxford, Ohio 45056, USA
- Graduate Program in Botany, Miami University, Oxford, Ohio 45056, USA
|
16
|
Kang DS, Kim HS, Jung JH, Lee CM, Ahn YS, Seo YR. Formaldehyde exposure and leukemia risk: a comprehensive review and network-based toxicogenomic approach. Genes Environ 2021; 43:13. [PMID: 33845901 PMCID: PMC8042688 DOI: 10.1186/s41021-021-00183-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 08/16/2020] [Accepted: 03/19/2021] [Indexed: 12/20/2022] Open
Abstract
Formaldehyde is a widely used but highly reactive and toxic chemical. The International Agency for Research on Cancer classifies formaldehyde as a Group 1 carcinogen, based on nasopharyngeal cancer and leukemia studies. However, the correlation between formaldehyde exposure and leukemia incidence is a controversial issue. To understand the association between formaldehyde exposure and leukemia, we explored biological networks based on formaldehyde-related genes retrieved from public and commercial databases. Through the literature-based network approach, we summarized qualitative associations between formaldehyde exposure and leukemia. Our results indicate that oxidative stress-mediated genetic changes induced by formaldehyde could disturb the hematopoietic system, possibly leading to leukemia. Furthermore, we suggested major genes that are thought to be affected by formaldehyde exposure and associated with leukemia development. Our suggestions can be used to complement experimental data for understanding and identifying the leukemogenic mechanism of formaldehyde.
Affiliation(s)
- Doo Seok Kang
- Department of Life Science, Institute of Environmental Medicine for Green Chemistry, Dongguk University Biomedi Campus, 32 Dongguk-ro, Ilsandong-gu, Goyang-si, Gyeonggi-do, 10326, Republic of Korea
- Hyun Soo Kim
- Department of Life Science, Institute of Environmental Medicine for Green Chemistry, Dongguk University Biomedi Campus, 32 Dongguk-ro, Ilsandong-gu, Goyang-si, Gyeonggi-do, 10326, Republic of Korea
- Jong-Hyeon Jung
- Faculty of Health Science, Daegu Haany University, Gyeongsan, Gyeongbuk, 38610, Republic of Korea
- Cheol Min Lee
- Department of Chemical and Biological Engineering, College of Natural Science and Engineering, Seokyeong University, Seoul, 02173, Republic of Korea
- Yeon-Soon Ahn
- Department of Preventive Medicine and Institute of Occupational and Environmental Medicine, Wonju College of Medicine, Yonsei University, Wonju, Gangwon, 26426, Republic of Korea
- Young Rok Seo
- Department of Life Science, Institute of Environmental Medicine for Green Chemistry, Dongguk University Biomedi Campus, 32 Dongguk-ro, Ilsandong-gu, Goyang-si, Gyeonggi-do, 10326, Republic of Korea.
|
17
|
JSOM: Jointly-evolving self-organizing maps for alignment of biological datasets and identification of related clusters. PLoS Comput Biol 2021; 17:e1008804. [PMID: 33724985 PMCID: PMC7963045 DOI: 10.1371/journal.pcbi.1008804] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/03/2020] [Accepted: 02/15/2021] [Indexed: 11/19/2022] Open
Abstract
With the rapid advances of various single-cell technologies, an increasing number of single-cell datasets are being generated, and computational tools for aligning these datasets, which make subsequent integration or meta-analysis possible, have become critical. Typically, single-cell datasets from different technologies cannot be directly combined or concatenated because of innate differences in the data, such as the number of measured parameters and their distributions. Even datasets generated by the same technology are often affected by batch effects. A computational approach for aligning different datasets, and hence identifying related clusters, will be useful for data integration and interpretation in large-scale single-cell experiments. Our proposed algorithm, JSOM, a variation of the self-organizing map, aligns two related datasets that contain similar clusters by constructing two maps (low-dimensional discretized representations of the datasets) that jointly evolve according to both datasets. Here we applied the JSOM algorithm to flow cytometry, mass cytometry, and single-cell RNA sequencing datasets. The resulting JSOM maps not only align the related clusters in the two datasets but also preserve the topology of the datasets, so that the maps can be used for further analysis, such as clustering.
Biological datasets are now generated more than ever, as many data acquisition technologies, especially single-cell technologies, have been developed over the years. With increasing amounts of data available for larger-scale studies, robust computational tools that can align datasets are needed for data integration and interpretation. We present a new algorithm that can align two biological datasets and demonstrate that it works with data generated by different data acquisition technologies. Our proposed algorithm produces low-dimensional representations of two datasets to align them in a way that preserves the topology of the respective datasets. Such aligned maps facilitate further analysis, such as clustering. The proposed algorithm showed promising results when applied to different combinations of datasets, i.e., flow cytometry to flow cytometry, flow cytometry to mass cytometry, and two different single-cell RNA sequencing technologies. Therefore, our newly developed algorithm could potentially lead to new discoveries that were once difficult to obtain.
|
18
|
Mahmud M, Kaiser MS, McGinnity TM, Hussain A. Deep Learning in Mining Biological Data. Cognit Comput 2021; 13:1-33. [PMID: 33425045 PMCID: PMC7783296 DOI: 10.1007/s12559-020-09773-x] [Citation(s) in RCA: 100] [Impact Index Per Article: 33.3] [Received: 05/01/2020] [Accepted: 09/28/2020] [Indexed: 02/06/2023]
Abstract
Recent technological advancements in data acquisition tools have allowed life scientists to acquire multimodal data from different biological application domains. Categorized into three broad types (i.e., images, signals, and sequences), these data are huge in amount and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures, known as deep learning (DL), have been successfully applied to solve many complex pattern recognition problems. To investigate how DL, especially its different architectures, has contributed to and been utilized in the mining of biological data pertaining to those three types, a meta-analysis has been performed and the resulting resources have been critically analysed. Focusing on the use of DL to analyse patterns in data from diverse biological domains, this work investigates different DL architectures' applications to these data. This is followed by an exploration of available open access data sources pertaining to the three data types along with popular open-source DL tools applicable to these data. Also, comparative investigations of these tools from qualitative, quantitative, and benchmarking perspectives are provided. Finally, some open research challenges in using DL to mine biological data are outlined and a number of possible future perspectives are put forward.
Affiliation(s)
- Mufti Mahmud
- Department of Computer Science, Nottingham Trent University, Clifton, NG11 8NS Nottingham, UK
- Medical Technology Innovation Facility, Nottingham Trent University, NG11 8NS Clifton, Nottingham, UK
- M. Shamim Kaiser
- Institute of Information Technology, Jahangirnagar University, Savar 1342 Dhaka, Bangladesh
- T. Martin McGinnity
- Department of Computer Science, Nottingham Trent University, Clifton, NG11 8NS Nottingham, UK
- Intelligent Systems Research Centre, Ulster University, Northern Ireland BT48 7JL Derry, UK
- Amir Hussain
- School of Computing, Edinburgh, EH11 4BN Edinburgh, UK
|
19
|
Wu L, Han L, Li Q, Wang G, Zhang H, Li L. Using Interactome Big Data to Crack Genetic Mysteries and Enhance Future Crop Breeding. MOLECULAR PLANT 2021; 14:77-94. [PMID: 33340690 DOI: 10.1016/j.molp.2020.12.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/14/2020] [Revised: 12/11/2020] [Accepted: 12/14/2020] [Indexed: 05/27/2023]
Abstract
The functional genes underlying phenotypic variation and their interactions represent "genetic mysteries". Understanding and utilizing these genetic mysteries are key solutions for mitigating the current threats to agriculture posed by population growth and individual food preferences. Due to advances in high-throughput multi-omics technologies, we are stepping into an Interactome Big Data era that is certain to revolutionize genetic research. In this article, we provide a brief overview of current strategies to explore genetic mysteries. We then introduce the methods for constructing and analyzing Interactome Big Data and summarize currently available interactome resources. Next, we discuss how Interactome Big Data can be used as a versatile tool to dissect genetic mysteries. We propose an integrated strategy that could revolutionize genetic research by combining Interactome Big Data with machine learning, which involves mining information hidden in Big Data to identify the genetic models or networks that control various traits, and we also provide a detailed procedure for the systematic dissection of genetic mysteries. Finally, we discuss three promising future breeding strategies that utilize Interactome Big Data to improve crop yields and quality.
Affiliation(s)
- Leiming Wu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Linqian Han
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Qing Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Guoying Wang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hongwei Zhang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.
|
20
|
Multi-assignment clustering: Machine learning from a biological perspective. J Biotechnol 2020; 326:1-10. [PMID: 33285150 DOI: 10.1016/j.jbiotec.2020.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/17/2020] [Accepted: 12/03/2020] [Indexed: 11/21/2022]
Abstract
A common approach for analyzing large-scale molecular data is to cluster objects sharing similar characteristics. This assumes that genes with highly similar expression profiles are likely participating in a common molecular process. Biological systems are extremely complex and challenging to understand, with proteins having multiple functions that sometimes need to be activated or expressed in a time-dependent manner. Thus, the strategies applied for clustering of these molecules into groups are of key importance for translation of data to biologically interpretable findings. Here we implemented a multi-assignment clustering (MAsC) approach that allows molecules to be assigned to multiple clusters, rather than single ones as in commonly used clustering techniques. When applied to high-throughput transcriptomics data, MAsC increased power of the downstream pathway analysis and allowed identification of pathways with high biological relevance to the experimental setting and the biological systems studied. Multi-assignment clustering also reduced noise in the clustering partition by excluding genes with a low correlation to all of the resulting clusters. Together, these findings suggest that our methodology facilitates translation of large-scale molecular data into biological knowledge. The method is made available as an R package on GitLab (https://gitlab.com/wolftower/masc).
|
21
|
Scott JK, Breden F. The adaptive immune receptor repertoire community as a model for FAIR stewardship of big immunology data. CURRENT OPINION IN SYSTEMS BIOLOGY 2020; 24:71-77. [PMID: 33073065 PMCID: PMC7547575 DOI: 10.1016/j.coisb.2020.10.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/25/2022]
Abstract
Systems biology involves network-oriented, computational approaches to modeling biological systems through analysis of big biological data. To contribute maximally to scientific progress, big biological data should be FAIR: findable, accessible, interoperable, and reusable. Here, we describe high-throughput sequencing data that characterize the vast diversity of B- and T-cell clones comprising the adaptive immune receptor repertoire (AIRR-seq data) and its contribution to our understanding of COVID-19 (coronavirus disease 19). We describe the accomplishments of the AIRR community, a grass-roots network of interdisciplinary laboratory scientists, bioinformaticians, and policy wonks, in creating and publishing standards, software and repositories for AIRR-seq data based on the FAIR principles.
Affiliation(s)
- Jamie K Scott
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Felix Breden
- Department of Biological Sciences, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
|
22
|
Lung PY, Zhong D, Pang X, Li Y, Zhang J. Maximizing the reusability of gene expression data by predicting missing metadata. PLoS Comput Biol 2020; 16:e1007450. [PMID: 33156882 PMCID: PMC7673503 DOI: 10.1371/journal.pcbi.1007450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2019] [Revised: 11/18/2020] [Accepted: 10/09/2020] [Indexed: 11/18/2022] Open
Abstract
Reusability is part of the FAIR data principles, which aim to make data Findable, Accessible, Interoperable, and Reusable. One of the current efforts to increase the reusability of public genomics data has been to focus on the inclusion of quality metadata associated with the data. When necessary metadata are missing, most researchers will consider the data useless. In this study, we developed a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We found that when using predicted data to conduct other analyses, it is not optimal to use all the predicted data. Instead, one should use only the subset of data that can be predicted accurately. We proposed a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically designed machine learning pipeline. The new approach performed better than pipelines using commonly used metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we showed that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses.
Affiliation(s)
- Pei-Yau Lung
- Department of Statistics, Florida State University, Tallahassee, United States of America
- Dongrui Zhong
- Department of Statistics, Florida State University, Tallahassee, United States of America
- Xiaodong Pang
- Insilicom LLC, Tallahassee, United States of America
- Yan Li
- Department of Breast Surgery, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, United States of America
|
23
|
Kruchten AE. A Curricular Bioinformatics Approach to Teaching Undergraduates to Analyze Metagenomic Datasets Using R. Front Microbiol 2020; 11:578600. [PMID: 33013816 PMCID: PMC7511545 DOI: 10.3389/fmicb.2020.578600] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/30/2020] [Accepted: 08/12/2020] [Indexed: 01/06/2023] Open
Abstract
Biologists with bioinformatic skills will be better prepared for the job market, but relatively few biology programs require bioinformatics courses. Inclusion in the curriculum may be hindered by several barriers, including lack of faculty expertise, student resistance to computational work, and few examples in the pedagogical literature. An 8-week wet-lab and in silico research experience for undergraduates was implemented. Students performed DNA purification and metagenomics analysis to compare the diversity and abundance of microbes in two samples. Students sampled snow from sites in northern Minnesota and purified genomic DNA from the microbes, followed by metagenomic analysis. Students used an existing metagenomic dataset to practice analysis skills, including comparing the use of Excel versus R for analysis and visualization of a large dataset. Upon receipt of the snow data, students applied their recently acquired skills to their new dataset and reported their results via a poster. Several outcomes were achieved as a result of this module. First, YouTube videos demonstrating hands-on metagenomics and R techniques were used as professional development for faculty, leading to broadened research capabilities and comfort with bioinformatics. Second, students were introduced to computational skills in a manner that was intentional, with time for both introduction and reinforcement of skills. Finally, the module was effectively included in a biology curriculum because it could function as either a stand-alone course or a module within another course such as microbiology. This module, developed with Course-based Undergraduate Research Experience guidelines in mind, introduces students and faculty to bioinformatics in biology research.
Affiliation(s)
- Anne E Kruchten
- Department of Biology, The College of St. Scholastica, Duluth, MN, United States
|
24
|
Al-Harazi O, El Allali A, Colak D. Biomolecular Databases and Subnetwork Identification Approaches of Interest to Big Data Community: An Expert Review. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2020; 23:138-151. [PMID: 30883301 DOI: 10.1089/omi.2018.0205] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
Abstract
Next-generation sequencing approaches and genome-wide studies have become essential for characterizing the mechanisms of human diseases. Consequently, many researchers have applied these approaches to discover the genetic/genomic causes of common complex and rare human diseases, generating multiomics big data that span the continuum of genomics, proteomics, metabolomics, and many other system science fields. Therefore, there is a significant and unmet need for biological databases and tools that enable and empower researchers to analyze, integrate, and make sense of big data. There are currently a large number of databases that offer different types of biological information. In particular, the integration of gene expression profiles and protein-protein interaction networks provides a deeper understanding of the complex multilayered molecular architecture of human diseases. Therefore, there has been a growing interest in developing methodologies that integrate and contextualize big data from molecular interaction networks to identify biomarkers of human diseases at a subnetwork resolution as well. In this expert review, we provide a comprehensive summary of the most popular biomolecular databases for molecular interactions (e.g., Biological General Repository for Interaction Datasets, Kyoto Encyclopedia of Genes and Genomes, and Search Tool for the Retrieval of Interacting Genes/Proteins), gene-disease associations (e.g., Online Mendelian Inheritance in Man, Disease-Gene Network, MalaCards), and population-specific databases (e.g., Human Genetic Variation Database), and describe some examples of their usage and potential applications. We also present the most recent subnetwork identification approaches and discuss their main advantages and limitations. As the field of data science continues to emerge, the present analysis offers a deeper and contextualized understanding of the available databases in molecular biomedicine.
Affiliation(s)
- Olfat Al-Harazi
- Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia; Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Achraf El Allali
- Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Dilek Colak
- Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
|
25
|
Abstract
Precision oncology aims to tailor clinical decisions specifically to patients with the objective of improving treatment outcomes. This can be achieved by leveraging omics information for accurate molecular characterization of tumors. Tumor tissue biopsies are currently the main source of information for molecular profiling. However, biopsies are invasive and limited in resolving spatiotemporal heterogeneity in tumor tissues. Alternative non-invasive liquid biopsies can exploit patient’s body fluids to access multiple layers of tumor-specific biological information (genomes, epigenomes, transcriptomes, proteomes, metabolomes, circulating tumor cells, and exosomes). Analysis and integration of these large and diverse datasets using statistical and machine learning approaches can yield important insights into tumor biology and lead to discovery of new diagnostic, predictive, and prognostic biomarkers. Translation of these new diagnostic tools into standard clinical practice could transform oncology, as demonstrated by a number of liquid biopsy assays already entering clinical use. In this review, we highlight successes and challenges facing the rapidly evolving field of cancer biomarker research.
Lay Summary
Precision oncology aims to tailor clinical decisions specifically to patients with the objective of improving treatment outcomes. The discovery of biomarkers for precision oncology has been accelerated by high-throughput experimental and computational methods, which can inform fine-grained characterization of tumors for clinical decision-making. Moreover, advances in the liquid biopsy field allow non-invasive sampling of patient’s body fluids with the aim of analyzing circulating biomarkers, obviating the need for invasive tumor tissue biopsies. In this review, we highlight successes and challenges facing the rapidly evolving field of liquid biopsy cancer biomarker research.
|
26
|
Mabvakure BM, Rott R, Dobrowsky L, Van Heusden P, Morris L, Scheepers C, Moore PL. Advancing HIV Vaccine Research With Low-Cost High-Performance Computing Infrastructure: An Alternative Approach for Resource-Limited Settings. Bioinform Biol Insights 2019; 13:1177932219882347. [PMID: 35173421 PMCID: PMC8842485 DOI: 10.1177/1177932219882347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2019] [Accepted: 09/21/2019] [Indexed: 11/17/2022] Open
Abstract
Next-generation sequencing (NGS) technologies have revolutionized biological research by generating genomic data that were once unaffordable with traditional first-generation sequencing technologies. These sequencing methodologies provide an opportunity for in-depth analyses of host and pathogen genomes as they are able to sequence millions of templates at a time. However, these large datasets can only be efficiently explored using bioinformatics analyses requiring huge data storage and computational resources adapted for high-performance processing. High-performance computing allows for efficient handling of large data and tasks that may require multi-threading and prolonged computational times, which is not feasible with ordinary computers. However, high-performance computing resources are costly and therefore not always readily available in low-income settings. We describe the establishment of an affordable high-performance computing bioinformatics cluster consisting of 3 nodes, constructed using ordinary desktop computers and open-source software including Linux Fedora, SLURM Workload Manager, and the Conda package manager. For the analysis of large antibody sequence datasets and for complex viral phylodynamic analyses, the cluster outperformed desktop computers. This has demonstrated that it is possible to construct high-performance computing capacity capable of analyzing large NGS data from relatively low-cost hardware and entirely free (open-source) software, even in resource-limited settings. Such a cluster design has broad utility beyond bioinformatics to other studies that require high-performance computing.
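As a rough illustration of how work might be submitted to a small SLURM cluster like the one described above, the following Python sketch renders a batch script. It is not taken from the paper: the helper name `render_sbatch`, the job name, the environment name and the command are all hypothetical. Only the `#SBATCH` directives shown and `conda run -n` are standard SLURM and Conda features.

```python
def render_sbatch(job_name, command, nodes=1, cpus_per_task=4,
                  time_limit="01:00:00", conda_env=None):
    """Return the text of a SLURM batch script (hypothetical helper).

    The script requests `nodes` nodes and `cpus_per_task` CPUs per task,
    and optionally runs the command inside a named Conda environment.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --time={time_limit}",
    ]
    if conda_env is not None:
        # `conda run -n ENV CMD` executes CMD inside the named environment.
        command = f"conda run -n {conda_env} {command}"
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed example values; a real job would name its own script and env.
    print(render_sbatch("antibody_qc", "python analyse.py",
                        nodes=3, conda_env="ngs"))
```

The rendered text could be saved to a file and passed to `sbatch`; the resource values here are placeholders rather than the settings used by the authors.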
Affiliation(s)
- Batsirai M Mabvakure
- Center for HIV and STIs, National Institute for Communicable Diseases, National Health Laboratory Service (NHLS), Johannesburg, South Africa.,Antibody Immunity Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.,Division of Transfusion Medicine, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | | | | | - Peter Van Heusden
- South African National Bioinformatics Institute, University of the Western Cape, Cape Town, South Africa
| | - Lynn Morris
- Center for HIV and STIs, National Institute for Communicable Diseases, National Health Laboratory Service (NHLS), Johannesburg, South Africa.,Antibody Immunity Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.,Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
| | - Cathrine Scheepers
- Center for HIV and STIs, National Institute for Communicable Diseases, National Health Laboratory Service (NHLS), Johannesburg, South Africa.,Antibody Immunity Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Penny L Moore
- Center for HIV and STIs, National Institute for Communicable Diseases, National Health Laboratory Service (NHLS), Johannesburg, South Africa.,Antibody Immunity Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.,Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
| |
|
27
|
Simões T, Novais SC, Natal-da-Luz T, Devreese B, de Boer T, Roelofs D, Sousa JP, van Straalen NM, Lemos MFL. Using time-lapse omics correlations to integrate toxicological pathways of a formulated fungicide in a soil invertebrate. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2019; 246:845-854. [PMID: 30623841 DOI: 10.1016/j.envpol.2018.12.069] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/24/2018] [Revised: 12/18/2018] [Accepted: 12/22/2018] [Indexed: 06/09/2023]
Abstract
The use of an integrative molecular approach can actively improve the evaluation of environmental health status and impact of chemicals, providing the knowledge to develop sentinel tools that can be integrated in risk assessment studies, since gene and protein expressions represent the first response barriers to anthropogenic stress. This work aimed to determine the mechanisms of toxic action of a widely applied fungicide formulation (chlorothalonil), following a time series approach and using a soil model arthropod, Folsomia candida. To link effects at different levels of biological organization, data were collected on reproduction, gene expression and protein levels, in a time series during exposure to a natural soil. Results showed a mechanistic mode of action for chlorothalonil, affecting pathways of detoxification and excretion, immune response, cellular respiration, protein metabolism and oxidative stress defense, causing irregular cell signaling (JNK and NOD1/2 pathways), DNA damage and abnormal cell proliferation, leading to impairment in developmental features such as molting cycle and reproduction. The omics datasets presented highly significant positive correlations between the gene expression levels at a certain time-point and the corresponding protein products 2-3 days later. The integrated omics in this study has provided useful insights into pesticide mechanisms of toxicity, evidencing the relevance of such analyses in toxicological studies, and highlighting the importance of considering a time-series when integrating these datasets.
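The time-lagged relationship between transcript and protein levels reported above can be illustrated with a lagged Pearson correlation. The sketch below is an assumption-laden illustration, not the authors' analysis: the function names are hypothetical, the lag is counted in sampling steps rather than days, and the series in the usage example are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def lagged_correlation(gene, protein, lag):
    """Correlate gene[t] with protein[t + lag] (lag in sampling steps)."""
    if lag < 0 or lag >= len(gene) - 1:
        raise ValueError("lag out of range")
    return pearson(gene[:len(gene) - lag], protein[lag:])


def best_lag(gene, protein, max_lag):
    """Return the lag in 0..max_lag with the highest correlation, plus all scores."""
    scores = {lag: lagged_correlation(gene, protein, lag)
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Invented series: the protein trace repeats the gene trace two steps later.
    gene = [1, 3, 2, 5, 4, 6, 2, 1]
    protein = [0, 0, 1, 3, 2, 5, 4, 6]
    lag, scores = best_lag(gene, protein, max_lag=3)
    print(lag, scores)
```

With daily sampling, a best lag of two or three steps would correspond to the 2-3 day delay mentioned in the abstract; with other sampling intervals the lag has to be converted accordingly.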
Affiliation(s)
- Tiago Simões
- MARE - Marine and Environmental Sciences Centre, ESTM, Polytechnic Institute of Leiria, Peniche, Portugal; Centre for Functional Ecology, Department of Life Sciences, University of Coimbra, Portugal; Department of Ecological Science, Vrije Universiteit, Amsterdam, The Netherlands.
| | - Sara C Novais
- MARE - Marine and Environmental Sciences Centre, ESTM, Polytechnic Institute of Leiria, Peniche, Portugal; Department of Ecological Science, Vrije Universiteit, Amsterdam, The Netherlands
| | - Tiago Natal-da-Luz
- Centre for Functional Ecology, Department of Life Sciences, University of Coimbra, Portugal
| | - Bart Devreese
- Laboratory for Microbiology (LM-Ugent), Ghent University, Belgium
| | - Tjalf de Boer
- Department of Ecological Science, Vrije Universiteit, Amsterdam, The Netherlands
| | - Dick Roelofs
- Department of Ecological Science, Vrije Universiteit, Amsterdam, The Netherlands
| | - José P Sousa
- Centre for Functional Ecology, Department of Life Sciences, University of Coimbra, Portugal
| | - Nico M van Straalen
- Department of Ecological Science, Vrije Universiteit, Amsterdam, The Netherlands
| | - Marco F L Lemos
- MARE - Marine and Environmental Sciences Centre, ESTM, Polytechnic Institute of Leiria, Peniche, Portugal
| |
|
28
|
Sun S, Miao Z, Ratcliffe B, Campbell P, Pasch B, El-Kassaby YA, Balasundaram B, Chen C. SNP variable selection by generalized graph domination. PLoS One 2019; 14:e0203242. [PMID: 30677030 PMCID: PMC6345469 DOI: 10.1371/journal.pone.0203242] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2018] [Accepted: 01/08/2019] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: High-throughput sequencing technology has revolutionized both medical and biological research by generating exceedingly large numbers of genetic variants. The resulting datasets share a number of common characteristics that might lead to poor generalization capacity. Concerns include noise accumulated due to the large number of predictors, sparse information regarding the p≫n problem, and overfitting and model mis-identification resulting from spurious collinearity. Additionally, complex correlation patterns are present among variables. As a consequence, reliable variable selection techniques play a pivotal role in predictive analysis, generalization capability, and robustness in clustering, as well as interpretability of the derived models. METHODS AND FINDINGS: The k-dominating set, a parameterized graph-theoretic generalization model, was used to model SNP (single nucleotide polymorphism) data as a similarity network and to search for representative SNP variables. In particular, each SNP was represented as a vertex in the graph; (dis)similarity measures such as correlation coefficients or pairwise linkage disequilibrium were estimated to describe the relationship between each pair of SNPs; and a pair of vertices is adjacent, i.e. joined by an edge, if the pairwise similarity measure exceeds a user-specified threshold. A minimum k-dominating set in the SNP graph was then defined as the smallest subset such that every SNP excluded from the subset has at least k neighbors among the selected ones. The strength of k-dominating set selection in identifying independent variables, and in culling representative variables that are highly correlated with others, was demonstrated by a simulated dataset. The advantages of k-dominating set variable selection were also illustrated in two applications: pedigree reconstruction using SNP profiles of 1,372 Douglas-fir trees, and species delineation for 226 grasshopper mouse samples.
C++ source code that implements SNP-SELECT and uses the Gurobi optimization solver for k-dominating set variable selection is available (https://github.com/transgenomicsosu/SNP-SELECT).
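The graph construction and the k-dominating condition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract says the authors' SNP-SELECT code uses the Gurobi solver, whereas the sketch below swaps in a simple greedy heuristic that returns a valid k-dominating set without guaranteeing a minimum one. The function names and the toy similarity matrix are hypothetical.

```python
def build_graph(similarity, threshold):
    """Join SNPs i and j by an edge when their similarity exceeds the threshold."""
    n = len(similarity)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_k_dominating_set(adj, k):
    """Greedy heuristic: every unselected vertex ends with >= k selected neighbors.

    The result is a valid k-dominating set but is not guaranteed to be minimum.
    """
    selected = set()

    def unmet(v):
        # v still needs attention: not selected and fewer than k selected neighbors.
        return v not in selected and len(adj[v] & selected) < k

    while any(unmet(v) for v in adj):
        def gain(v):
            # Selecting v satisfies v itself and helps each unmet neighbor.
            return (1 if unmet(v) else 0) + sum(1 for u in adj[v] if unmet(u))

        candidates = [v for v in adj if v not in selected]
        # Ties are broken in favor of the smallest vertex index.
        selected.add(max(candidates, key=lambda v: (gain(v), -v)))
    return selected


if __name__ == "__main__":
    # Toy matrix: SNPs 0-2 are mutually similar, SNP 3 is independent.
    sim = [
        [1.0, 0.9, 0.9, 0.1],
        [0.9, 1.0, 0.9, 0.1],
        [0.9, 0.9, 1.0, 0.1],
        [0.1, 0.1, 0.1, 1.0],
    ]
    graph = build_graph(sim, threshold=0.8)
    print(sorted(greedy_k_dominating_set(graph, k=1)))
```

In the toy example one SNP stands in for the correlated group while the independent SNP is kept, which mirrors the behavior the abstract attributes to k-dominating set selection. An exact minimum would require an integer programming or similar exact formulation.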
Affiliation(s)
- Shuzhen Sun
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, United States of America
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, B.C. Canada
| | - Zhuqi Miao
- Center for Health Systems Innovation, Oklahoma State University, Stillwater, United States of America
| | - Blaise Ratcliffe
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, B.C. Canada
| | - Polly Campbell
- Department of Integrative Biology, Oklahoma State University, Stillwater, United States of America
- Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, Riverside, United States of America
| | - Bret Pasch
- Department of Biological Sciences, Northern Arizona University, Flagstaff, United States of America
| | - Yousry A. El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, B.C. Canada
| | - Balabhaskar Balasundaram
- School of Industrial Engineering and Management, Oklahoma State University, Stillwater, United States of America
| | - Charles Chen
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, United States of America
- * E-mail:
| |
|
29
|
Gauthier J, Vincent AT, Charette SJ, Derome N. A brief history of bioinformatics. Brief Bioinform 2018; 20:1981-1996. [DOI: 10.1093/bib/bby063] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2018] [Revised: 06/22/2018] [Indexed: 02/06/2023] Open
Abstract
It is easy for today’s students and researchers to believe that modern bioinformatics emerged recently to assist next-generation sequencing data analysis. However, the very beginnings of bioinformatics occurred more than 50 years ago, when desktop computers were still a hypothesis and DNA could not yet be sequenced. The foundations of bioinformatics were laid in the early 1960s with the application of computational methods to protein sequence analysis (notably, de novo sequence assembly, biological sequence databases and substitution models). Later on, DNA analysis also emerged due to parallel advances in (i) molecular biology methods, which allowed easier manipulation of DNA, as well as its sequencing, and (ii) computer science, which saw the rise of increasingly miniaturized and more powerful computers, as well as novel software better suited to handle bioinformatics tasks. In the 1990s through the 2000s, major improvements in sequencing technology, along with reduced costs, gave rise to an exponential increase of data. The arrival of ‘Big Data’ has laid out new challenges in terms of data mining and management, calling for more expertise from computer science into the field. Coupled with an ever-increasing amount of bioinformatics tools, biological Big Data had (and continues to have) profound implications on the predictive power and reproducibility of bioinformatics results. To overcome this issue, universities are now fully integrating this discipline into the curriculum of biology students. Recent subdisciplines such as synthetic biology, systems biology and whole-cell modeling have emerged from the ever-increasing complementarity between computer science and biology.
Affiliation(s)
- Jeff Gauthier
- Institut de Biologie Intégrative et des Systèmes (IBIS), Département de Biologie, Université Laval, 1030, av. de la Médecine, Québec, Canada
| | - Antony T Vincent
- INRS-Institut Armand-Frappier, Bacterial Symbionts Evolution, 531 boul. des Prairies, Laval, QC, Canada
| | - Steve J Charette
- Centre de Recherche de l'Institut, Universitaire de Cardiologie et de Pneumologie de Québec (CRIUCPQ), 2725 Chemin Sainte-Foy, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-informatique, Université Laval, Québec, Canada
| | - Nicolas Derome
- Institut de Biologie Intégrative et des Systèmes (IBIS), Département de Biologie, Université Laval, 1030, av. de la Médecine, Québec, Canada
| |
|
30
|
Ng S, Strunk T, Jiang P, Muk T, Sangild PT, Currie A. Precision Medicine for Neonatal Sepsis. Front Mol Biosci 2018; 5:70. [PMID: 30094238 PMCID: PMC6070631 DOI: 10.3389/fmolb.2018.00070] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2018] [Accepted: 07/06/2018] [Indexed: 11/24/2022] Open
Abstract
Neonatal sepsis remains a significant cause of morbidity and mortality especially in the preterm infant population. The ability to promptly and accurately diagnose neonatal sepsis based on clinical evaluation and laboratory blood tests remains challenging. Advances in high-throughput molecular technologies have increased investigations into the utility of transcriptomic, proteomic and metabolomic approaches as diagnostic tools for neonatal sepsis. A systems-level understanding of neonatal sepsis, obtained by using omics-based technologies (at the transcriptome, proteome or metabolome level), may lead to new diagnostic tools for neonatal sepsis. In particular, recent omic-based studies have identified distinct transcriptional signatures and metabolic or proteomic biomarkers associated with sepsis. Despite the emerging need for a systems biology approach, future studies have to address the challenges of integrating multi-omic data with laboratory and clinical meta-data in order to translate outcomes into precision medicine for neonatal sepsis. Omics-based analytical approaches may advance diagnostic tools for neonatal sepsis. More research is needed to validate the recent systems biology findings in order to integrate multi-dimensional data (clinical, laboratory and multi-omic) for future translation into precision medicine for neonatal sepsis. This review will discuss the possible applications of omics-based analyses for identification of new biomarkers and diagnostic signatures for neonatal sepsis, focusing on the immune-compromised preterm infant and considerations for clinical translation.
Affiliation(s)
- Sherrianne Ng
- Medical and Molecular Sciences, School of Veterinary and Life Sciences, Murdoch University, Perth, WA, Australia
| | - Tobias Strunk
- Centre for Neonatal Research and Education, The University of Western Australia, Perth, WA, Australia
| | - Pingping Jiang
- Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Tik Muk
- Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Per T Sangild
- Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Andrew Currie
- Medical and Molecular Sciences, School of Veterinary and Life Sciences, Murdoch University, Perth, WA, Australia.,Centre for Neonatal Research and Education, The University of Western Australia, Perth, WA, Australia
| |
|
31
|
Ham S, Kim TK, Hong H, Kim YS, Tang YP, Im HI. Big Data Analysis of Genes Associated With Neuropsychiatric Disorders in an Alzheimer's Disease Animal Model. Front Neurosci 2018; 12:407. [PMID: 29962931 PMCID: PMC6013555 DOI: 10.3389/fnins.2018.00407] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2017] [Accepted: 05/25/2018] [Indexed: 11/13/2022] Open
Abstract
Alzheimer's disease is a neurodegenerative disease characterized by the impairment of cognitive function and loss of memory, affecting millions of individuals worldwide. With the dramatic increase in the prevalence of Alzheimer's disease, it is expected to impose extensive public health and economic burden. However, this burden is particularly heavy on the caregivers of Alzheimer's disease patients who exhibit neuropsychiatric symptoms that include mood swings, hallucinations, and depression. Interestingly, these neuropsychiatric symptoms overlap with symptoms of bipolar disorder, schizophrenia, and major depressive disorder. Despite the similarities in symptomatology, comorbidities of Alzheimer's disease and these neuropsychiatric disorders have not been studied in the Alzheimer's disease model. Here, we explore the comprehensive changes in expression of genes that are associated with bipolar disorder, schizophrenia, and major depressive disorder through the microarray of an Alzheimer's disease animal model, the forebrain specific PSEN double knockout mouse. To analyze the genes related to these three neuropsychiatric disorders within the scope of our microarray data, we selected 1207 of a total of 45,037 genes that satisfied our selection criteria. These genes were selected on the basis of 14 Gene Ontology terms significantly relevant to the three disorders which were identified by previous research conducted by the Psychiatric Genomics Consortium. Our study revealed that the forebrain specific deletion of Alzheimer's disease genes can significantly alter neuropsychiatric disorder associated genes. Most importantly, most of these significantly altered genes were found to be involved with schizophrenia. Taken together, we suggest that the synaptic dysfunction by mutation of Alzheimer's disease genes can lead to the manifestation of not only memory loss and impairments in cognition, but also neuropsychiatric symptoms.
Affiliation(s)
- Suji Ham
- Convergence Research Center for Diagnosis, Treatment and Care System of Dementia, Korea Institute of Science and Technology (KIST), Seoul, South Korea.,Division of Bio-Medical Science & Technology, KIST School, University of Science and Technology, Seoul, South Korea
| | - Tae K Kim
- Convergence Research Center for Diagnosis, Treatment and Care System of Dementia, Korea Institute of Science and Technology (KIST), Seoul, South Korea.,Department of Biology, Boston University, Boston, MA, United States
| | - Heeok Hong
- Department of Medical Science, Graduate School of Medicine, Konkuk University, Seoul, South Korea
| | - Yong S Kim
- Department of Pharmacology, Seoul National University College of Medicine, Seoul National University, Seoul, South Korea
| | - Ya-Ping Tang
- Neuroscience Center of Excellence, Louisiana State University Health Sciences Center New Orleans, New Orleans, LA, United States
| | - Heh-In Im
- Convergence Research Center for Diagnosis, Treatment and Care System of Dementia, Korea Institute of Science and Technology (KIST), Seoul, South Korea.,Division of Bio-Medical Science & Technology, KIST School, University of Science and Technology, Seoul, South Korea.,Center for Neuroscience, Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul, South Korea
| |
|
32
|
Mahmud M, Kaiser MS, Hussain A, Vassanelli S. Applications of Deep Learning and Reinforcement Learning to Biological Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2063-2079. [PMID: 29771663 DOI: 10.1109/tnnls.2018.2790388] [Citation(s) in RCA: 230] [Impact Index Per Article: 38.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promises to revolutionize the future of artificial intelligence. The growth in computational power, accompanied by faster and increased data storage and declining computing costs, has already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.
|
33
|
Qiu X, Feng JR, Qiu J, Liu L, Xie Y, Zhang YP, Liu J, Zhao Q. ITGBL1 promotes migration, invasion and predicts a poor prognosis in colorectal cancer. Biomed Pharmacother 2018; 104:172-180. [PMID: 29772438 DOI: 10.1016/j.biopha.2018.05.033] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2018] [Revised: 05/08/2018] [Accepted: 05/08/2018] [Indexed: 12/27/2022] Open
Abstract
Colorectal cancer (CRC) is one of the most common malignancies worldwide; its progression and prognosis are associated with oncogenes. The present study aimed to identify differentially expressed genes (DEGs) and explore the role and potential mechanism of integrin subunit β like 1 (ITGBL1) in CRC. The microarray dataset GSE41258 was used to screen DEGs involved in CRC. Survival analysis was performed to predict the prognosis of CRC patients. To validate ITGBL1 expression, immunohistochemistry, quantitative real-time PCR and western blotting were performed in CRC tissues and cells. Subsequently, the effects of ITGBL1 were evaluated through colony formation, cell proliferation, migration and invasion assays. Finally, we used Gene Ontology (GO) analysis and Gene Set Enrichment Analysis (GSEA) to explore the potential function and mechanism of ITGBL1 in CRC. The GSE41258 dataset contained 182 primary CRC tissues and 54 normal colon tissues. A total of 318 DEGs were screened, among which ITGBL1 was found to be significantly up-regulated in CRC, and its high expression was associated with shortened survival of CRC patients. Moreover, knockdown of ITGBL1 inhibited CRC cell proliferation, migration and invasion. Finally, GO analysis revealed that ITGBL1 was associated with cell adhesion. GSEA indicated that ITGBL1 was enriched in ECM receptor interaction and focal adhesion. In conclusion, a novel oncogene, ITGBL1, was identified and demonstrated to be associated with the progression and prognosis of CRC, and it might be a potential therapeutic target and prognostic biomarker for CRC patients.
Affiliation(s)
- Xiao Qiu
- Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, 430071, PR China; The Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Diseases, Wuhan, Hubei, 430071, PR China
| | - Jue-Rong Feng
- Department of Gastroenterology, The Second People's Hospital of Shenzhen, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong, 518035, PR China
| | - Jun Qiu
- Department of Stomatology, Fuzhou First People's Hospital, Fuzhou, Jiangxi, 344000, PR China
| | - Lan Liu
- Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, 430071, PR China; The Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Diseases, Wuhan, Hubei, 430071, PR China
| | - Yang Xie
- Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, 430071, PR China; The Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Diseases, Wuhan, Hubei, 430071, PR China
| | - Yu-Peng Zhang
- Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, 430071, PR China; The Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Diseases, Wuhan, Hubei, 430071, PR China
| | - Jing Liu
- Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, 430071, PR China; The Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Diseases, Wuhan, Hubei, 430071, PR China
| | - Qiu Zhao
- Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, 430071, PR China; The Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Diseases, Wuhan, Hubei, 430071, PR China.
| |
|
34
|
Hu T, Oksanen K, Zhang W, Randell E, Furey A, Sun G, Zhai G. An evolutionary learning and network approach to identifying key metabolites for osteoarthritis. PLoS Comput Biol 2018; 14:e1005986. [PMID: 29494586 PMCID: PMC5849325 DOI: 10.1371/journal.pcbi.1005986] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2017] [Revised: 03/13/2018] [Accepted: 01/06/2018] [Indexed: 12/20/2022] Open
Abstract
Metabolomics studies use quantitative analyses of metabolites from body fluids or tissues in order to investigate a sequence of cellular processes and biological systems in response to genetic and environmental influences. This promises an immense potential for a better understanding of the pathogenesis of complex diseases. Most conventional metabolomics analysis methods examine one metabolite at a time and may overlook the synergistic effect of combining multiple metabolites. In this article, we proposed a new bioinformatics framework that infers the non-linear synergy among multiple metabolites using a symbolic model and subsequently identifies key metabolites using network analysis. Such a symbolic model is able to represent a complex non-linear relationship among a set of metabolites associated with osteoarthritis (OA) and is automatically learned using an evolutionary algorithm. Applied to the Newfoundland Osteoarthritis Study (NFOAS) dataset, our methodology was able to identify nine key metabolites including some known osteoarthritis-associated metabolites and some novel metabolic markers that have never been reported before. The results demonstrate the effectiveness of our methodology and, more importantly, with further investigations, propose new hypotheses that can help better understand OA. Biomedical research has entered a new era where a large number of molecules and different components in biological systems can be quantitatively examined to investigate the causes of common human diseases. However, given the complexity of biological systems, those causes may not contribute to diseases individually but through interactions. The identification of those interactions, or the synergy of multiple factors, is a very challenging task due to computational limitations, as well as the lack of effective methodologies for investigating multiple factors simultaneously.
In this study, we proposed to model such an interaction effect through a self-learning algorithm using mechanisms inspired by natural evolution. Moreover, by constructing a synergy network using those evolved models, we were able to identify a set of interacting factors associated with a particular disease.
Affiliation(s)
- Ting Hu
- Department of Computer Science, Memorial University, St. John’s, Newfoundland and Labrador, Canada
- * E-mail:
| | - Karoliina Oksanen
- Department of Computer Science, Memorial University, St. John’s, Newfoundland and Labrador, Canada
| | - Weidong Zhang
- Faculty of Medicine, Memorial University, St. John’s, Newfoundland and Labrador, Canada
- School of Pharmaceutical Sciences, Jilin University, Changchun, China
| | - Ed Randell
- Faculty of Medicine, Memorial University, St. John’s, Newfoundland and Labrador, Canada
| | - Andrew Furey
- Faculty of Medicine, Memorial University, St. John’s, Newfoundland and Labrador, Canada
| | - Guang Sun
- Faculty of Medicine, Memorial University, St. John’s, Newfoundland and Labrador, Canada
| | - Guangju Zhai
- Faculty of Medicine, Memorial University, St. John’s, Newfoundland and Labrador, Canada
| |
|
35
|
Hufsky F, Ibrahim B, Beer M, Deng L, Mercier PL, McMahon DP, Palmarini M, Thiel V, Marz M. Virologists-Heroes need weapons. PLoS Pathog 2018; 14:e1006771. [PMID: 29420617 PMCID: PMC5805341 DOI: 10.1371/journal.ppat.1006771] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Affiliation(s)
- Franziska Hufsky
  - European Virus Bioinformatics Center, Jena, Germany
  - RNA Bioinformatics and High-Throughput Analysis Jena, Friedrich Schiller University Jena, Jena, Germany
- Bashar Ibrahim
  - European Virus Bioinformatics Center, Jena, Germany
  - RNA Bioinformatics and High-Throughput Analysis Jena, Friedrich Schiller University Jena, Jena, Germany
- Martin Beer
  - European Virus Bioinformatics Center, Jena, Germany
  - Institute of Diagnostic Virology, Friedrich-Loeffler-Institute, Greifswald, Germany
- Li Deng
  - European Virus Bioinformatics Center, Jena, Germany
  - Institute of Virology, Helmholtz Zentrum Munich, Munich, Germany
- Philippe Le Mercier
  - European Virus Bioinformatics Center, Jena, Germany
  - Swiss-Prot group, SIB, CMU, University of Geneva Medical School, Geneva, Switzerland
- Dino P. McMahon
  - European Virus Bioinformatics Center, Jena, Germany
  - Host parasite evolution and ecology, Institute of Biology, Free University of Berlin, Berlin, Germany
  - Department for Materials and Environment, BAM, Federal Institute for Materials Research and Testing, Berlin, Germany
- Massimo Palmarini
  - European Virus Bioinformatics Center, Jena, Germany
  - MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
- Volker Thiel
  - European Virus Bioinformatics Center, Jena, Germany
  - Federal Department of Home Affairs, Institute of Virology and Immunology, Bern and Mittelhäusern, Switzerland
  - Department of Infectious Diseases and Pathobiology, University of Bern, Bern, Switzerland
- Manja Marz
  - European Virus Bioinformatics Center, Jena, Germany
  - RNA Bioinformatics and High-Throughput Analysis Jena, Friedrich Schiller University Jena, Jena, Germany
|
36
|
Shahjaman M, Kumar N, Ahmed MS, Begum A, Islam SMS, Mollah MNH. Robust Feature Selection Approach for Patient Classification using Gene Expression Data. Bioinformation 2017; 13:327-332. [PMID: 29162964 PMCID: PMC5680713 DOI: 10.6026/97320630013327] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/26/2017] [Revised: 09/11/2017] [Accepted: 09/12/2017] [Indexed: 11/23/2022]
Abstract
Patient classification through feature selection (FS) based on gene expression data (GED) has become popular in the research community. The t-test is a well-known statistical FS method in GED analysis. However, it produces more false positives and lower accuracy for small sample sizes or in the presence of outliers. To overcome the shortcomings of the t-test with small sample sizes, SAM has been applied to GED, but it is highly sensitive to outliers. Recently, robust SAM using minimum β-divergence estimators has overcome the problems of the classical t-test and SAM and has been successfully applied to the identification of differentially expressed (DE) genes, but it has not been applied to classification. Therefore, in this paper, we employ robust SAM as a feature selection approach along with classifiers for patient classification. We compare the performance of robust SAM with that of the classical t-test and SAM, in combination with four popular classifiers (LDA, KNN, SVM and naive Bayes), using both simulated and real gene expression datasets. The results from the simulation and real data analysis confirm that the four classifiers perform better with robust SAM than with the classical t-test or SAM. From a real colon cancer dataset, we identified 21 additional DE genes using robust SAM that were not identified by the classical t-test or SAM. To reveal the biological functions and pathways of these 21 genes, we performed KEGG pathway enrichment analysis and found that these genes are involved in some important cancer-related pathways.
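The t-test baseline that the abstract above compares against can be sketched as follows. This is a minimal illustration of ranking genes by a two-sample (Welch) t-statistic, not the robust SAM procedure of the paper; the function names, gene names, and toy values are hypothetical. The second part shows the outlier sensitivity the abstract mentions: one extreme value can shrink the statistic of an otherwise well-separated gene.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expr, labels, top_k):
    """Return the top_k gene names with the largest absolute t-statistic.

    expr   : dict mapping gene name -> expression values (one per patient)
    labels : 0/1 class labels aligned with the expression values
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group0, group1))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (hypothetical): gene_a separates the two classes, gene_b does not.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
selected = rank_genes(expr, labels, top_k=1)

# Outlier sensitivity: replacing one value of gene_a with an extreme reading
# inflates the variance and shrinks the statistic.
t_clean = welch_t([1.0, 1.2, 0.9], [3.1, 2.9, 3.0])
t_outlier = welch_t([1.0, 1.2, 30.0], [3.1, 2.9, 3.0])
```

The selected genes would then be passed to a classifier such as the ones the abstract lists; that step is omitted here.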
Affiliation(s)
- Md. Shahjaman
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi-6205, Bangladesh
  - Department of Statistics, Begum Rokeya University, Rangpur-5400, Bangladesh
- Nishith Kumar
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi-6205, Bangladesh
  - Department of Statistics, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh
- Md. Shakil Ahmed
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi-6205, Bangladesh
- AnjumanAra Begum
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi-6205, Bangladesh
- S. M. Shahinul Islam
  - Institute of Biological Science (IBSc), University of Rajshahi, Rajshahi-6205, Bangladesh
|
37
|
Omae K, Komori O, Eguchi S. Quasi-linear score for capturing heterogeneous structure in biomarkers. BMC Bioinformatics 2017; 18:308. [PMID: 28629325 PMCID: PMC5477283 DOI: 10.1186/s12859-017-1721-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/22/2016] [Accepted: 06/09/2017] [Indexed: 01/01/2023]
Abstract
BACKGROUND: Linear scores are widely used to predict dichotomous outcomes in biomedical studies because of their learnability and understandability. Such approaches, however, cannot be used to elucidate biodiversity when there is heterogeneous structure in the target population. RESULTS: Our study was focused on describing intrinsic heterogeneity in predictions. Because heterogeneity can be captured by a clustering method, integrating different information from different clusters should yield better predictions. Accordingly, we developed a quasi-linear score, which effectively combines the linear scores of clustered markers. We extended the linear score to the quasi-linear score by a generalized average form, the Kolmogorov-Nagumo average. We observed that two shrinkage methods worked well: ridge shrinkage for estimating the quasi-linear score, and lasso shrinkage for selecting markers within each cluster. Simulation studies and applications to real data show that the proposed method has good predictive performance compared with existing methods. CONCLUSIONS: Heterogeneous structure is captured by a clustering method. Quasi-linear scores combine such heterogeneity and have a better predictive ability compared with linear scores.
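The idea of combining per-cluster linear scores through a Kolmogorov-Nagumo average, as described in the abstract above, can be sketched as follows. A Kolmogorov-Nagumo average has the form φ⁻¹(mean of φ(s_k)) for an invertible generator φ. The exponential generator with parameter `tau`, the function names, and the toy clusters are assumptions made for illustration; the abstract does not state which generator the authors use, and the ridge and lasso estimation steps are not shown.

```python
from math import exp, log


def linear_score(weights, values):
    """Ordinary linear score: weighted sum of the marker values."""
    return sum(w * v for w, v in zip(weights, values))


def kn_average(scores, phi, phi_inv):
    """Kolmogorov-Nagumo average: phi_inv(mean(phi(s)) over the scores)."""
    return phi_inv(sum(phi(s) for s in scores) / len(scores))


def quasi_linear_score(clusters, tau=1.0):
    """Combine per-cluster linear scores with an exponential generator (assumed).

    clusters : list of (weights, values) pairs, one pair per marker cluster
    tau      : sharpness of the generator phi(s) = exp(tau * s)
    """
    scores = [linear_score(w, x) for w, x in clusters]
    return kn_average(scores, lambda s: exp(tau * s), lambda m: log(m) / tau)


# Two hypothetical marker clusters with linear scores 2.0 and 0.0.
clusters = [([0.5, 0.5], [1.0, 3.0]), ([1.0], [0.0])]

# The identity generator recovers the plain arithmetic mean of the scores.
arith = kn_average([linear_score(w, x) for w, x in clusters],
                   lambda s: s, lambda m: m)
soft = quasi_linear_score(clusters, tau=1.0)
sharp = quasi_linear_score(clusters, tau=50.0)
```

With this generator, larger values of `tau` move the combined score from the arithmetic mean of the cluster scores towards their maximum, which is one way a single dominant cluster can drive the prediction.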
Affiliation(s)
- Katsuhiro Omae
  - Department of Statistical Science, The Graduate University for Advanced Studies, 10-3, Midoricho, Tachikawa, Tokyo, 190-8562 Japan
- Osamu Komori
  - Department of Electrical, Electronic and Computer Engineering, University of Fukui, Fukui, Japan
- Shinto Eguchi
  - Department of Statistical Science, The Graduate University for Advanced Studies, 10-3, Midoricho, Tachikawa, Tokyo, 190-8562 Japan
  - The Institute of Statistical Mathematics, Tokyo, Japan
|
38
|
Salazar BM, Balczewski EA, Ung CY, Zhu S. Neuroblastoma, a Paradigm for Big Data Science in Pediatric Oncology. Int J Mol Sci 2016; 18:E37. [PMID: 28035989 PMCID: PMC5297672 DOI: 10.3390/ijms18010037] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/18/2016] [Revised: 12/14/2016] [Accepted: 12/17/2016] [Indexed: 12/13/2022]
Abstract
Pediatric cancers rarely exhibit recurrent mutational events when compared to most adult cancers. This poses a challenge in understanding how cancers initiate, progress, and metastasize in early childhood. Also, because few driver mutations have been detected, it is difficult to benchmark key genes for drug development. In this review, we use neuroblastoma, a pediatric solid tumor of neural crest origin, as a paradigm for exploring "big data" applications in pediatric oncology. Computational strategies derived from big data science (network- and machine learning-based modeling and drug repositioning) hold the promise of shedding new light on the molecular mechanisms driving neuroblastoma pathogenesis and identifying potential therapeutics to combat this devastating disease. These strategies integrate robust data input, from genomic and transcriptomic studies, clinical data, and in vivo and in vitro experimental models specific to neuroblastoma and other types of cancers that closely mimic its biological characteristics. We discuss contexts in which "big data" and computational approaches, especially network-based modeling, may advance neuroblastoma research, describe currently available data and resources, and propose future models of strategic data collection and analyses for neuroblastoma and other related diseases.
Affiliation(s)
- Brittany M Salazar
  - Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine, Rochester, MN 55902, USA.
- Emily A Balczewski
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
- Choong Yong Ung
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
- Shizhen Zhu
  - Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine, Rochester, MN 55902, USA.
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
|
39
|
Abstract
The purpose of this review is to survey current, emerging and predicted future biotechnologies that are affecting, or are likely to affect, the life sciences, with a projection for the coming 20 years. This review is intended to discuss current and future technical strategies, and to explore areas of potential growth during the foreseeable future. Information technology approaches have been employed to gather and collate data. Twelve broad categories of biotechnology have been identified which are currently impacting the life sciences and will continue to do so. In some cases, technology areas are being pushed forward by the requirement to deal with contemporary questions such as the need to address the emergence of anti-microbial resistance. In other cases, the biotechnology application is made feasible by advances in allied fields in biophysics (e.g. biosensing) and biochemistry (e.g. bio-imaging). In all cases, the biotechnologies are underpinned by the rapidly advancing fields of information systems, electronic communications and the World Wide Web together with developments in computing power and the capacity to handle extensive biological data. A rationale and narrative are given for the identification of each technology as a growth area. These technologies have been categorized by major applications, and are discussed further. This review highlights three points: biotechnology has far-reaching applications which impinge on every aspect of human existence; the applications of biotechnology are currently wide ranging and will become even more diverse in the future; and access to supercomputing facilities and the ability to manipulate large, complex biological datasets will significantly enhance knowledge and biotechnological development.
Affiliation(s)
- E Diane Williamson
  - CBR Division, Defence Science & Technology Laboratory, Porton Down, Salisbury, UK
|
40
|
Li TS, Bravo À, Furlong LI, Good BM, Su AI. A crowdsourcing workflow for extracting chemical-induced disease relations from free text. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw051. [PMID: 27087308 PMCID: PMC4834205 DOI: 10.1093/database/baw051] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/05/2015] [Accepted: 03/17/2016] [Indexed: 01/05/2023]
Abstract
Relations between chemicals and diseases are among the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex. Database URL: https://github.com/SuLab/crowd_cid_relex
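The aggregation rule and the reported figures in the abstract above can be checked with a short sketch. This is an illustration only, not the authors' code (which is in the linked repository); the function names and the toy votes are hypothetical. It applies the "four or more of five votes" rule and recomputes the F-score and per-abstract cost from the numbers quoted in the abstract.

```python
def aggregate(votes_by_relation, threshold=4):
    """Predict a relation as true when at least `threshold` workers voted yes."""
    return {
        relation
        for relation, votes in votes_by_relation.items()
        if sum(votes) >= threshold
    }


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical binary judgments from five workers per candidate relation.
votes = {
    ("chem_A", "disease_X"): [1, 1, 1, 1, 0],  # 4 votes: predicted true
    ("chem_B", "disease_Y"): [1, 1, 1, 0, 0],  # 3 votes: rejected
}
predicted = aggregate(votes)

# Figures quoted in the abstract.
f1 = round(f_score(0.475, 0.540), 3)          # 0.505
cost_per_abstract = round(1290.67 / 500, 2)   # 2.58
```

Both recomputed values agree with the abstract: an F-score of 0.505 and a cost of $2.58 per abstract.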
Affiliation(s)
- Tong Shu Li
  - Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
- Àlex Bravo
  - Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
- Laura I Furlong
  - Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
- Benjamin M Good
  - Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
- Andrew I Su
  - Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
|
41
|
Zeng T, Zhang W, Yu X, Liu X, Li M, Chen L. Big-data-based edge biomarkers: study on dynamical drug sensitivity and resistance in individuals. Brief Bioinform 2015; 17:576-92. [DOI: 10.1093/bib/bbv078] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 04/15/2015] [Indexed: 12/21/2022]
|
42
|
Ow GS, Kuznetsov VA. Multiple signatures of a disease in potential biomarker space: Getting the signatures consensus and identification of novel biomarkers. BMC Genomics 2015; 16 Suppl 7:S2. [PMID: 26100469 PMCID: PMC4474413 DOI: 10.1186/1471-2164-16-s7-s2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/02/2022]
Abstract
Background: The lack of consensus among reported gene signature subsets (GSSs) in multi-gene biomarker discovery studies is often a concern for researchers and clinicians. Consequently, it discourages larger scale prospective studies, prevents the translation of such knowledge into a practical clinical setting and ultimately hinders the progress of the field of biomarker-based disease classification, prognosis and prediction. Methods: We define all "gene identificators" (gIDs) as constituents of the entire potential disease biomarker space. For each gID in a GSS of interest ("tested GSS"/tGSS), our method counts the empirical frequency of gID co-occurrences/overlaps in other reference GSSs (rGSSs) and compares it with the expected frequency generated via implementation of a randomized sampling procedure. Comparison of the empirical frequency distribution (EFD) with the expected background frequency distribution (BFD) allows dichotomization of statistically novel (SN) and common (SC) gIDs within the tGSS. Results: We identify SN or SC biomarkers for tGSSs obtained from previous studies of high-grade serous ovarian cancer (HG-SOC) and breast cancer (BC). For each tGSS, the EFD of gID co-occurrences/overlaps with other rGSSs is characterized by a scale- and context-dependent Pareto-like frequency distribution function. Our results indicate that while independently there is little overlap between our tGSS and individual rGSSs, comparison of the EFD with BFD suggests that beyond a confidence threshold, tested gIDs become more common in rGSSs than expected. This validates the use of our tGSS as individual or combined prognostic factors. Our method identifies SN and SC genes of a 36-gene prognostic signature that stratify HG-SOC patients into subgroups with low, intermediate or high risk of the disease outcome. Using 70 BC rGSSs, the method also predicted SN and SC BC prognostic genes from the tested obesity and IGF1 pathway GSSs.
Conclusions: Our method provides a strategy that identifies or predicts, within a tGSS of interest, gID subsets that are either SN or SC when compared to other rGSSs. Practically, our results suggest that there is a stronger association of the IGF1 signature genes with the 70 BC rGSSs than for the obesity-associated signature. Furthermore, both SC and SN genes in both signatures could be considered as prospective prognostic biomarkers of BCs that stratify patients into low- or high-risk groups for cancer development.
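The comparison of empirical co-occurrence counts against a randomly sampled background, as outlined in the Methods above, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: the 95% background quantile used as the cutoff, the function names, and the toy gene sets are all hypothetical, and the abstract does not specify how the confidence threshold is chosen.

```python
import random


def cooccurrence_counts(tested, references):
    """For each gene ID in the tested set, count the reference sets containing it."""
    return {g: sum(g in ref for ref in references) for g in tested}


def background_counts(universe, references, size, n_draws, seed=0):
    """Counts for gene IDs drawn at random from the universe (expected background)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_draws):
        for g in rng.sample(universe, size):
            counts.append(sum(g in ref for ref in references))
    return counts


def split_by_threshold(empirical, background, quantile=0.95):
    """Split gene IDs into 'common' (count above the background quantile) and 'novel'."""
    cutoff = sorted(background)[int(quantile * (len(background) - 1))]
    common = {g for g, c in empirical.items() if c > cutoff}
    return common, set(empirical) - common


# Toy biomarker space (hypothetical): 1000 gene IDs, three reference signatures.
universe = [f"g{i}" for i in range(1000)]
references = [{"g1", "g2"}, {"g1", "g3"}, {"g1", "g4"}]
tested = ["g1", "g50"]

empirical = cooccurrence_counts(tested, references)
background = background_counts(universe, references, size=len(tested), n_draws=200)
common, novel = split_by_threshold(empirical, background)
```

In this toy setting, `g1` appears in all three reference signatures and is labelled common, while `g50` appears in none and is labelled novel.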
|
43
|
Agarwal M, Adhil M, Talukder AK. Multi-omics Multi-scale Big Data Analytics for Cancer Genomics. BIG DATA ANALYTICS 2015. [DOI: 10.1007/978-3-319-27057-9_16] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
|