1. Stock M, Van Criekinge W, Boeckaerts D, Taelman S, Van Haeverbeke M, Dewulf P, De Baets B. Hyperdimensional computing: A fast, robust, and interpretable paradigm for biological data. PLoS Comput Biol 2024; 20:e1012426. [PMID: 39316621 PMCID: PMC11421772 DOI: 10.1371/journal.pcbi.1012426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
Advances in bioinformatics are primarily due to new algorithms for processing diverse biological data sources. While sophisticated alignment algorithms have been pivotal in analyzing biological sequences, deep learning has substantially transformed bioinformatics, addressing sequence, structure, and functional analyses. However, these methods are incredibly data-hungry, compute-intensive, and hard to interpret. Hyperdimensional computing (HDC) has recently emerged as an exciting alternative. The key idea is that random vectors of high dimensionality can represent concepts such as sequence identity or phylogeny. These vectors can then be combined using simple operators for learning, reasoning, or querying by exploiting the peculiar properties of high-dimensional spaces. Our work reviews and explores HDC's potential for bioinformatics, emphasizing its efficiency, interpretability, and adeptness in handling multimodal and structured data. HDC holds great potential for various omics data searching, biosignal analysis, and health applications.
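The abstract's central idea, random high-dimensional vectors combined with simple operators, can be illustrated with a minimal sketch. Everything below is an assumption chosen for illustration rather than the authors' method: the dimensionality `DIM`, the bipolar {-1, +1} encoding, the use of a cyclic shift to mark position, the n-gram size, and all function names are hypothetical.

```python
import random

DIM = 10_000  # illustrative dimensionality; not a value taken from the paper


def random_hv(rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication."""
    return [x * y for x, y in zip(a, b)]


def shift(v, k):
    """Cyclically shift a hypervector by k positions to mark sequence position."""
    k %= len(v)
    return v[-k:] + v[:-k]


def bundle(vectors):
    """Bundle hypervectors by elementwise majority vote (ties resolve to +1)."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]


def similarity(a, b):
    """Normalised dot product, which equals cosine similarity for bipolar vectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def encode(seq, alphabet_hv, n=3):
    """Encode a sequence as the bundle of its position-aware n-gram hypervectors."""
    grams = []
    for i in range(len(seq) - n + 1):
        hv = alphabet_hv[seq[i]]
        for j in range(1, n):
            hv = bind(hv, shift(alphabet_hv[seq[i + j]], j))
        grams.append(hv)
    return bundle(grams)


rng = random.Random(0)
alphabet_hv = {base: random_hv(rng) for base in "ACGT"}

ref = encode("ACGTACGTTAGCCGATAGCTAGGCTA", alphabet_hv)
near = encode("ACGTACGTTAGCCGATAGCTAGGCTT", alphabet_hv)  # one substitution
far = encode("TTTTGGGGCCCCAAAATTTTGGGGCC", alphabet_hv)  # largely unrelated

sim_near = similarity(ref, near)
sim_far = similarity(ref, far)
```

In this sketch the two sequences that differ by a single base should score far closer to each other than the unrelated pair does, because independently drawn hypervectors are nearly orthogonal in high dimensions and bundling preserves shared components.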
Affiliation(s)
- Michiel Stock
  - KERMIT Research Unit, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Wim Van Criekinge
  - Biobix Research Unit, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Dimitri Boeckaerts
  - KERMIT Research Unit, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
  - Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
- Steff Taelman
  - KERMIT Research Unit, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
  - Biobix Research Unit, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
  - BioLizard nv, Ghent, Belgium
- Maxime Van Haeverbeke
  - KERMIT Research Unit, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Pieter Dewulf
  - KERMIT Research Unit, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Bernard De Baets
  - KERMIT Research Unit, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
2. Bowerman AF. A demonstration of the enviromics approach to integrating environmental 'big data' problems. The New Phytologist 2024. [PMID: 39212492 DOI: 10.1111/nph.20079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Affiliation(s)
- Andrew F Bowerman
  - Division of Plant Sciences, Research School of Biology, ANU College of Science, The Australian National University, Canberra, ACT, 2600, Australia
3. Xie J, Ruan S, Tu M, Yuan Z, Hu J, Li H, Li S. Clustering single-cell RNA sequencing data via iterative smoothing and self-supervised discriminative embedding. Oncogene 2024; 43:2279-2292. [PMID: 38834657 DOI: 10.1038/s41388-024-03074-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 05/22/2024] [Accepted: 05/28/2024] [Indexed: 06/06/2024]
Abstract
Single-cell transcriptome sequencing (scRNA-seq) is a high-throughput technique used to study gene expression at the single-cell level. Clustering analysis is a commonly used method in scRNA-seq data analysis, helping researchers identify cell types and uncover interactions between cells. However, the choice of a robust similarity metric in the clustering procedure is still an open challenge due to the complex underlying structures of the data and the inherent noise in data acquisition. Here, we propose a deep clustering method for scRNA-seq data called scRISE (scRNA-seq Iterative Smoothing and self-supervised discriminative Embedding model) to resolve this challenge. The model consists of two main modules: an iterative smoothing module based on graph autoencoders designed to denoise the data and refine the pairwise similarity in turn to gradually incorporate cell structural features and enrich the data information; and a self-supervised discriminative embedding module with adaptive similarity threshold for partitioning samples into correct clusters. Our approach has shown improved quality of data representation and clustering on seventeen scRNA-seq datasets against a number of state-of-the-art deep learning clustering methods. Furthermore, utilizing the scRISE method in biological analysis against the HNSCC dataset has unveiled 62 informative genes, highlighting their potential roles as therapeutic targets and biomarkers.
Affiliation(s)
- Jinxin Xie
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Shanshan Ruan
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Mingyan Tu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Zhen Yuan
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Jianguo Hu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Honglin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai, 200062, China
  - Lingang Laboratory, Shanghai, 200031, China
- Shiliang Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai, 200062, China
4. Zhang G, Qin J, Xu W, Liu M, Wu R, Qin Y. Gene expression and immune infiltration analysis comparing lesioned and preserved subchondral bone in osteoarthritis. PeerJ 2024; 12:e17417. [PMID: 38827307 PMCID: PMC11141552 DOI: 10.7717/peerj.17417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 04/28/2024] [Indexed: 06/04/2024]
Abstract
Background: Osteoarthritis (OA) is a degenerative disease requiring additional research. This study compared gene expression and immune infiltration between lesioned and preserved subchondral bone. The results were validated using multiple tissue datasets and experiments. Methods: Differentially expressed genes (DEGs) between the lesioned and preserved tibial plateaus of OA patients were identified in the GSE51588 dataset. Moreover, functional annotation and protein-protein interaction (PPI) network analyses were performed on the lesioned and preserved sides to explore potential therapeutic targets in OA subchondral bones. In addition, multiple tissues were used to screen coexpressed genes, and the expression levels of identified candidate DEGs in OA were measured by quantitative real-time polymerase chain reaction. Finally, an immune infiltration analysis was conducted. Results: A total of 1,010 DEGs were identified, 423 upregulated and 587 downregulated. The biological process (BP) terms enriched in the upregulated genes included "skeletal system development", "sister chromatid cohesion", and "ossification". Pathways were enriched in "Wnt signaling pathway" and "proteoglycans in cancer". The BP terms enriched in the downregulated genes included "inflammatory response", "xenobiotic metabolic process", and "positive regulation of inflammatory response". The enriched pathways included "neuroactive ligand-receptor interaction" and "AMP-activated protein kinase signaling". JUN, tumor necrosis factor α, and interleukin-1β were the hub genes in the PPI network. Collagen XI A1 and leucine-rich repeat-containing 15 were screened from multiple datasets and experimentally validated. Immune infiltration analyses showed fewer infiltrating adipocytes and endothelial cells in the lesioned versus preserved samples. Conclusion: Our findings provide valuable information for future studies on the pathogenic mechanism of OA and potential therapeutic and diagnostic targets.
Affiliation(s)
- Gang Zhang
  - The Second Affiliated Hospital of Harbin Medical University, Department of Orthopedics Surgery, Harbin Medical University, Harbin, China
  - Department of Orthopedics, Harbin First Hospital, Harbin, China
  - Future Medicine Laboratory, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Jinwei Qin
  - Department of Emergency, Harbin First Hospital, Harbin, China
- Wenbo Xu
  - The Second Affiliated Hospital of Harbin Medical University, Department of Orthopedics Surgery, Harbin Medical University, Harbin, China
- Meina Liu
  - Department of Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
- Rilige Wu
  - Medical Big Data Research Center, Medical Innovation Research Division of PLA General Hospital, Beijing, China
- Yong Qin
  - The Second Affiliated Hospital of Harbin Medical University, Department of Orthopedics Surgery, Harbin Medical University, Harbin, China
5. Rader JA, Pivovarnik MA, Vantilburg ME, Whitehouse LS. PhyloMatcher: a tool for resolving conflicts in taxonomic nomenclature. Bioinformatics Advances 2023; 3:vbad144. [PMID: 37840907 PMCID: PMC10576170 DOI: 10.1093/bioadv/vbad144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/31/2023] [Accepted: 10/03/2023] [Indexed: 10/17/2023]
Abstract
Summary: Large-scale comparative studies rely on the application of both phylogenetic trees and phenotypic data, both of which come from a variety of sources, but due to the changing nature of phylogenetic classification over time, many taxon names in comparative datasets do not match the nomenclature in phylogenetic trees. Manual curation of taxonomic synonyms in large comparative datasets can be daunting. To address this issue, we introduce PhyloMatcher, a tool which allows for programmatic querying of the National Center for Biotechnology Information Taxonomy and Global Biodiversity Information Facility databases to find associated synonyms with given target species names. Availability and implementation: PhyloMatcher is easily installed as a Python package with pip, or as a standalone GUI application. PhyloMatcher source code and documentation are freely available at https://github.com/Lswhiteh/PhyloMatcher; the GUI application can be downloaded from the Releases page.
Affiliation(s)
- Jonathan A Rader
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-3280, United States
- Madelyn A Pivovarnik
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-3280, United States
- Matias E Vantilburg
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-3280, United States
- Logan S Whitehouse
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7264, United States
6. Rader JA, Pivovarnik MA, Vantilburg ME, Whitehouse LS. PhyloMatcher: a tool for resolving conflicts in taxonomic nomenclature. bioRxiv 2023:2023.08.07.552263. [PMID: 37609275 PMCID: PMC10441299 DOI: 10.1101/2023.08.07.552263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Summary: Large-scale comparative studies rely on the application of both phylogenetic trees and phenotypic data, both of which come from a variety of sources, but due to the changing nature of phylogenetic classification over time, many taxon names in comparative datasets do not match the nomenclature in phylogenetic trees. Manual curation of taxonomic synonyms in large comparative datasets can be daunting. To address this issue, we introduce PhyloMatcher, a tool which allows for programmatic querying of two commonly used taxonomic databases to find associated synonyms with given target species names. Availability and implementation: PhyloMatcher is easily installed as a Python package with pip, or as a standalone GUI application. PhyloMatcher source code and documentation are freely available at https://github.com/Lswhiteh/PhyloMatcher; the GUI application can be downloaded from the Releases page. Contact: Lswhiteh@unc.edu. Supplemental Information: We provide documentation for PhyloMatcher, including walkthrough instructions for the GUI application on the Releases page of https://github.com/Lswhiteh/PhyloMatcher.
Affiliation(s)
- Jonathan A. Rader
  - Dept. of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Matias E. Vantilburg
  - Dept. of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Logan S. Whitehouse
  - Dept. of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
7. Fei X, Li Q, Jiao X, Olsen JE. Identification of Salmonella Pullorum Factors Affecting Immune Reaction in Macrophages from the Avian Host. Microbiol Spectr 2023; 11:e0078623. [PMID: 37191575 PMCID: PMC10269470 DOI: 10.1128/spectrum.00786-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 05/04/2023] [Indexed: 05/17/2023]
Abstract
The host-specific Salmonella serovar S. Pullorum (SP) modulates the chicken immune response to a Th2-biased response associated with persistent infection. This is different from the Th1-biased immune response induced by the genetically close serovar, S. Enteritidis (SE). Based on core genome differences between SP and SE, we used three complementary bioinformatics approaches to identify SP genes, which may be important for stimulation of the immune response. Defined mutants were constructed in selected genes, and the infection potential and ability of mutants to stimulate cytokine production in avian derived HD11 macrophages were determined. Deletion of large genomic regions unique to SP did not change infection potential nor immune stimulation significantly. Mutants in genes with conserved single nucleotide polymorphisms (SNPs) between the two serovars in the region 100 bp upstream of the start codon (conserved upstream SNPs [CuSNPs]) such as sseE, osmB, tolQ, a putative immune antigen, and a putative persistent infection factor, exhibited differences in induction of inflammatory cytokines compared to wild-type SP, suggesting a possible role of these CuSNPs in immune regulation. Single nucleotide SP mutants correcting for the CuSNP difference were constructed in the upstream region of sifA and pipA. The SNP corrected pipA mutant expressed pipA at a higher level than the wild-type SP strain, and the mutant differentially caused upregulation of proinflammatory cytokines. It suggests that this CuSNP is important for the suppression of proinflammatory responses. In conclusion, this study has identified putative immune stimulating factors of relevance to the difference in infection dynamics between SP and SE in avian macrophages.
IMPORTANCE Salmonella Pullorum is host specific to avian species, where it causes life-threatening infection in young birds. It is unknown why it is host restricted and causes systemic disease, rather than gastroenteritis normally seen with Salmonella. In the present study, we identified genes and single nucleotide polymorphisms (SNPs; relative to the broad-host-range type Salmonella Enteritidis), which affected survival and immune induction in macrophages from hens suggesting a role in development of the host specific infection. Further studies of such genes may enable understanding of which genetic factors determine the development of host specific infection by S. Pullorum. In this study, we developed an in silico approach to predict candidate genes and SNPs for development of the host-specific infection and the specific induction of immunity associated with this infection. This study flow can be used in similar studies in other clades of bacteria.
Affiliation(s)
- Xiao Fei
  - Key Laboratory of Prevention and Control of Biological Hazard Factors (Animal Origin) for Agri-Food Safety and Quality, Ministry of Agriculture of China, Yangzhou University, Yangzhou, People’s Republic of China
  - Jiangsu Key Lab of Zoonosis/Jiangsu Co-Innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, People’s Republic of China
  - Joint International Research Laboratory of Agriculture and Agri-Product Safety, Yangzhou University, Yangzhou, People’s Republic of China
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- Qiuchun Li
  - Key Laboratory of Prevention and Control of Biological Hazard Factors (Animal Origin) for Agri-Food Safety and Quality, Ministry of Agriculture of China, Yangzhou University, Yangzhou, People’s Republic of China
  - Jiangsu Key Lab of Zoonosis/Jiangsu Co-Innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, People’s Republic of China
  - Joint International Research Laboratory of Agriculture and Agri-Product Safety, Yangzhou University, Yangzhou, People’s Republic of China
- Xinan Jiao
  - Key Laboratory of Prevention and Control of Biological Hazard Factors (Animal Origin) for Agri-Food Safety and Quality, Ministry of Agriculture of China, Yangzhou University, Yangzhou, People’s Republic of China
  - Jiangsu Key Lab of Zoonosis/Jiangsu Co-Innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, People’s Republic of China
  - Joint International Research Laboratory of Agriculture and Agri-Product Safety, Yangzhou University, Yangzhou, People’s Republic of China
- John Elmerdahl Olsen
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
8. Yang RX, McCandler CA, Andriuc O, Siron M, Woods-Robinson R, Horton MK, Persson KA. Big Data in a Nano World: A Review on Computational, Data-Driven Design of Nanomaterials Structures, Properties, and Synthesis. ACS Nano 2022; 16:19873-19891. [PMID: 36378904 PMCID: PMC9798871 DOI: 10.1021/acsnano.2c08411] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/22/2022] [Accepted: 11/08/2022] [Indexed: 05/30/2023]
Abstract
The recent rise of computational, data-driven research has significant potential to accelerate materials discovery. Automated workflows and materials databases are being rapidly developed, contributing to high-throughput data of bulk materials that are growing in quantity and complexity, allowing for correlation between structural-chemical features and functional properties. In contrast, computational data-driven approaches are still relatively rare for nanomaterials discovery due to the rapid scaling of computational cost for finite systems. However, the distinct behaviors at the nanoscale as compared to the parent bulk materials and the vast tunability space with respect to dimensionality and morphology motivate the development of data sets for nanometric materials. In this review, we discuss the recent progress in data-driven research in two aspects: functional materials design and guided synthesis, including commonly used metrics and approaches for designing materials properties and predicting synthesis routes. More importantly, we discuss the distinct behaviors of materials as a result of nanosizing and the implications for data-driven research. Finally, we share our perspectives on future directions for extending the current data-driven research into the nano realm.
Affiliation(s)
- Ruo Xi Yang
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Caitlin A. McCandler
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- Oxana Andriuc
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
  - Liquid Sunlight Alliance and Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Martin Siron
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- Rachel Woods-Robinson
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Matthew K. Horton
  - Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- Kristin A. Persson
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
  - Molecular Foundry, Energy Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
9. Maleki E, Akbari Rokn Abadi S, Koohi S. HELIOS: High-speed sequence alignment in optics. PLoS Comput Biol 2022; 18:e1010665. [PMID: 36409684 PMCID: PMC9678324 DOI: 10.1371/journal.pcbi.1010665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2022] [Accepted: 10/18/2022] [Indexed: 11/22/2022]
Abstract
In response to the imperfections of current sequence alignment methods, originated from the inherent serialism within their corresponding electrical systems, a few optical approaches for biological data comparison have been proposed recently. However, due to their low performance, raised from their inefficient coding scheme, this paper presents a novel all-optical high-throughput method for aligning DNA, RNA, and protein sequences, named HELIOS. The HELIOS method employs highly sophisticated operations to locate character matches, single or multiple mutations, and single or multiple indels within various biological sequences. On the other hand, the HELIOS optical architecture exploits high-speed processing and operational parallelism in optics, by adopting wavelength and polarization of optical beams. For evaluation, the functionality and accuracy of the HELIOS method are approved through behavioral and optical simulation studies, while its complexity and performance are estimated through analytical computation. The accuracy evaluations indicate that the HELIOS method achieves a precise pairwise alignment of two sequences, highly similar to those of Smith-Waterman, Needleman-Wunsch, BLAST, MUSCLE, ClustalW, ClustalΩ, T-Coffee, Kalign, and MAFFT. According to our performance evaluations, the HELIOS optical architecture outperforms all alternative electrical and optical algorithms in terms of processing time and memory requirement, relying on its highly sophisticated method and optical architecture. Moreover, the employed compact coding scheme highly escalates the number of input characters, and hence, it offers reduced time and space complexities, compared to the electrical and optical alternatives. It makes the HELIOS method and optical architecture highly applicable for biomedical applications.
Affiliation(s)
- Ehsan Maleki
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Somayyeh Koohi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
10. Ai N, Yang Z, Yuan H, Ouyang D, Miao R, Ji Y, Liang Y. A distributed sparse logistic regression with $L_{1/2}$ regularization for microarray biomarker discovery in cancer classification. Soft Comput 2022. [DOI: 10.1007/s00500-022-07551-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
11. The R Language: An Engine for Bioinformatics and Data Science. Life (Basel) 2022; 12:life12050648. [PMID: 35629316 PMCID: PMC9148156 DOI: 10.3390/life12050648] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 04/15/2022] [Revised: 04/21/2022] [Accepted: 04/23/2022] [Indexed: 12/14/2022]
Abstract
The R programming language is approaching its 30th birthday, and in the last three decades it has achieved a prominent role in statistics, bioinformatics, and data science in general. It currently ranks among the top 10 most popular languages worldwide, and its community has produced tens of thousands of extensions and packages, with scopes ranging from machine learning to transcriptome data analysis. In this review, we provide an historical chronicle of how R became what it is today, describing all its current features and capabilities. We also illustrate the major tools of R, such as the current R editors and integrated development environments (IDEs), the R Shiny web server, the R methods for machine learning, and its relationship with other programming languages. We also discuss the role of R in science in general as a driver for reproducibility. Overall, we hope to provide both a complete snapshot of R today and a practical compendium of the major features and applications of this programming language.
12. Oliver SG. From Petri Plates to Petri Nets, a revolution in yeast biology. FEMS Yeast Res 2022; 22:6526310. [PMID: 35142857 PMCID: PMC8862034 DOI: 10.1093/femsyr/foac008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Revised: 01/26/2022] [Accepted: 02/07/2022] [Indexed: 11/22/2022]
Affiliation(s)
- Stephen G Oliver
  - Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK
13. Sharma D, Sharma A, Singh B, Kumar S, Verma S. Neglected scrub typhus: An updated review with a focus on omics technologies. Asian Pac J Trop Med 2022. [DOI: 10.4103/1995-7645.364003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/01/2023]
14. Lunn AJ, Shaw V, Winder IC. The Evolution of Scientific Visualisations: A Case Study Approach to Big Data for Varied Audiences. Advances in Experimental Medicine and Biology 2022; 1388:51-84. [DOI: 10.1007/978-3-031-10889-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
15. Cunningham-Oakes E, Trivett H. Applied Bioinformatics and Public Health Microbiology: challenges, discoveries and innovations during a pandemic. Microb Genom 2022; 8:000757. [PMID: 35098917 PMCID: PMC8914353 DOI: 10.1099/mgen.0.000757] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/06/2021] [Accepted: 12/06/2021] [Indexed: 10/31/2022]
Abstract
The eighth Applied Bioinformatics and Public Health Microbiology (ABPHM) conference showcased the recent acceleration of bioinformatic approaches used in public health settings. This included approaches for the surveillance of infectious diseases, understanding microbial evolution and diversity and pathogen interactions. Overall, the meeting highlighted the importance of data-driven approaches used by scientists during the COVID-19 pandemic.
Affiliation(s)
- Edward Cunningham-Oakes
  - Health Protection Research Unit in Gastrointestinal Infections, HPRU Project Team, University of Liverpool, Ronald Ross Building, 8 West Derby Street, Liverpool L69 7BE, UK
  - Infection Biology and Microbiomes, Institute for Infection, Veterinary and Ecological Sciences, University of Liverpool, Leahurst Campus, Neston, Wirral, CH64 7TE, UK
- Hannah Trivett
  - Health Protection Research Unit in Gastrointestinal Infections, HPRU Project Team, University of Liverpool, Ronald Ross Building, 8 West Derby Street, Liverpool L69 7BE, UK
  - Infection Biology and Microbiomes, Institute for Infection, Veterinary and Ecological Sciences, University of Liverpool, Leahurst Campus, Neston, Wirral, CH64 7TE, UK
16. Passi A, Tibocha-Bonilla JD, Kumar M, Tec-Campos D, Zengler K, Zuniga C. Genome-Scale Metabolic Modeling Enables In-Depth Understanding of Big Data. Metabolites 2021; 12:14. [PMID: 35050136 PMCID: PMC8778254 DOI: 10.3390/metabo12010014] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 11/15/2021] [Revised: 12/18/2021] [Accepted: 12/20/2021] [Indexed: 11/16/2022]
Abstract
Genome-scale metabolic models (GEMs) enable the mathematical simulation of the metabolism of archaea, bacteria, and eukaryotic organisms. GEMs quantitatively define a relationship between genotype and phenotype by contextualizing different types of Big Data (e.g., genomics, metabolomics, and transcriptomics). In this review, we analyze the available Big Data useful for metabolic modeling and compile the available GEM reconstruction tools that integrate Big Data. We also discuss recent applications in industry and research that include predicting phenotypes, elucidating metabolic pathways, producing industry-relevant chemicals, identifying drug targets, and generating knowledge to better understand host-associated diseases. In addition to the up-to-date review of GEMs currently available, we assessed a plethora of tools for developing new GEMs that include macromolecular expression and dynamic resolution. Finally, we provide a perspective in emerging areas, such as annotation, data managing, and machine learning, in which GEMs will play a key role in the further utilization of Big Data.
Affiliation(s)
- Anurag Passi
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
| | - Juan D. Tibocha-Bonilla
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
| | - Manish Kumar
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
| | - Diego Tec-Campos
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
- Facultad de Ingeniería Química, Campus de Ciencias Exactas e Ingenierías, Universidad Autónoma de Yucatán, Merida 97203, Yucatan, Mexico
| | - Karsten Zengler
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093-0412, USA
- Center for Microbiome Innovation, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0403, USA
| | - Cristal Zuniga
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
| |
|
17
|
Zafeiropoulos H, Gioti A, Ninidakis S, Potirakis A, Paragkamian S, Angelova N, Antoniou A, Danis T, Kaitetzidou E, Kasapidis P, Kristoffersen JB, Papadogiannis V, Pavloudi C, Ha QV, Lagnel J, Pattakos N, Perantinos G, Sidirokastritis D, Vavilis P, Kotoulas G, Manousaki T, Sarropoulou E, Tsigenopoulos CS, Arvanitidis C, Magoulas A, Pafilis E. 0s and 1s in marine molecular research: a regional HPC perspective. Gigascience 2021; 10:6353916. [PMID: 34405237 PMCID: PMC8371273 DOI: 10.1093/gigascience/giab053] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/07/2021] [Revised: 07/07/2021] [Accepted: 07/20/2021] [Indexed: 01/23/2023] Open
Abstract
High-performance computing (HPC) systems have become indispensable for modern marine research, providing support to an increasing number and diversity of users. Pairing with the impetus offered by high-throughput methods to key areas such as non-model organism studies, their operation continuously evolves to meet the corresponding computational challenges. Here, we present a Tier 2 (regional) HPC facility, operating for over a decade at the Institute of Marine Biology, Biotechnology, and Aquaculture of the Hellenic Centre for Marine Research in Greece. Strategic choices made in design and upgrades aimed to strike a balance between depth (the need for a few high-memory nodes) and breadth (a number of slimmer nodes), as dictated by the idiosyncrasy of the supported research. Qualitative computational requirement analysis of the latter revealed the diversity of marine fields, methods, and approaches adopted to translate data into knowledge. In addition, hardware and software architectures, usage statistics, policy, and user management aspects of the facility are presented. Drawing upon the last decade's experience from the different levels of operation of the Institute of Marine Biology, Biotechnology, and Aquaculture HPC facility, a number of lessons are presented; these have contributed to the facility's future directions in light of emerging distribution technologies (e.g., containers) and Research Infrastructure evolution. In combination with detailed knowledge of the facility usage and its upcoming upgrade, future collaborations in marine research and beyond are envisioned.
Affiliation(s)
- Haris Zafeiropoulos
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
- Department of Biology, University of Crete, Voutes University Campus, P.O. Box 2208, 70013, Heraklion, Crete, Greece
| | - Anastasia Gioti
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Stelios Ninidakis
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Antonis Potirakis
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Savvas Paragkamian
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
- Department of Biology, University of Crete, Voutes University Campus, P.O. Box 2208, 70013, Heraklion, Crete, Greece
| | - Nelina Angelova
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Aglaia Antoniou
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Theodoros Danis
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
- School of Medicine, University of Crete, Voutes University Campus, 70013 Heraklion, Crete, Greece
| | - Eliza Kaitetzidou
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Panagiotis Kasapidis
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Jon Bent Kristoffersen
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Vasileios Papadogiannis
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Christina Pavloudi
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Quoc Viet Ha
- Bull SAS, Rue du Gros Caillou, 78340 Les Clayes-sous-Bois, France
| | - Jacques Lagnel
- Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, UR1052, Génétique et Amélioration des Fruits et Légumes, 67 Allée des Chênes, Centre de Recherche Provence-Alpes-Côte d'Azur, Domaine Saint Maurice, CS60094, 84143 Montfavet Cedex, France
| | - Nikos Pattakos
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Giorgos Perantinos
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Dimitris Sidirokastritis
- Hellenic Centre for Marine Research, Network Operation Center, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Panagiotis Vavilis
- Hellenic Centre for Marine Research, Network Operation Center, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Georgios Kotoulas
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Tereza Manousaki
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Elena Sarropoulou
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Costas S Tsigenopoulos
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Christos Arvanitidis
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
- LifeWatch European Research Infrastructure Consortium, Sector II-III Plaza de España, 41071, Seville, Spain
| | - Antonios Magoulas
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| | - Evangelos Pafilis
- Hellenic Centre for Marine Research, Institute of Marine Biology, Biotechnology and Aquaculture, Former U.S. Base of Gournes, P.O. Box 2214, 71003, Heraklion, Crete, Greece
| |
|