1
Zheng H, Sarkar H, Raphael BJ. Joint imputation and deconvolution of gene expression across spatial transcriptomics platforms. bioRxiv 2025:2025.02.17.638195. [PMID: 40027720 PMCID: PMC11870578 DOI: 10.1101/2025.02.17.638195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Spatially resolved transcriptomics (SRT) technologies measure gene expression across thousands of spatial locations within a tissue slice. Multiple SRT technologies are currently available and others are in active development, each with a different spatial resolution (subcellular, single-cell, or multicellular regions), gene coverage (targeted vs. whole-transcriptome), and sequencing depth per location. For example, the widely used 10x Genomics Visium platform measures whole transcriptomes from multiple-cell-sized spots, while the 10x Genomics Xenium platform measures a few hundred genes at subcellular resolution. A number of studies apply multiple SRT technologies to slices that originate from the same biological tissue. Integration of data from different SRT technologies can overcome limitations of the individual technologies, enabling the imputation of expression from unmeasured genes in targeted technologies and/or the deconvolution of ad-mixed expression from technologies with lower spatial resolution. We introduce Spatial Integration for Imputation and Deconvolution (SIID), an algorithm to reconstruct a latent spatial gene expression matrix from a pair of observations from different SRT technologies. SIID leverages a spatial alignment and uses a joint non-negative factorization model to accurately impute missing gene expression and infer gene expression signatures of cell types from ad-mixed SRT data. In simulations involving paired SRT datasets from different technologies (e.g., Xenium and Visium), SIID shows superior performance in reconstructing spot-to-cell-type assignments, recovering cell-type-specific gene expression, and imputing missing data compared to contemporary tools. When applied to real-world 10x Xenium-Visium pairs from human breast and colon cancer tissues, SIID achieves the highest performance in imputing holdout gene expression. A PyTorch implementation of SIID is available at https://github.com/raphael-group/siid.
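The abstract does not spell out SIID's joint model, so the following is only a generic illustration of how a non-negative factorization can impute unmeasured entries. It is a minimal sketch, assuming a single matrix with a binary observation mask and standard multiplicative updates; the function names (`masked_nmf`, `masked_loss`) and the toy data are hypothetical and are not taken from the SIID code base.

```python
import random

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def hadamard(a, b):
    """Element-wise product of two equally shaped matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def masked_loss(x, mask, w, h):
    """Squared reconstruction error over observed (mask == 1) entries only."""
    diff = [[xv - pv for xv, pv in zip(xr, pr)] for xr, pr in zip(x, matmul(w, h))]
    return sum(v * v for row in hadamard(mask, diff) for v in row)

def scale(m, num, den, eps):
    """Multiplicative update m * num / (den + eps), element-wise."""
    return [[mv * nv / (dv + eps) for mv, nv, dv in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, num, den)]

def masked_nmf(x, mask, rank, iters=200, seed=0, eps=1e-9):
    """Factor x ~ w @ h with non-negative w, h, ignoring masked-out entries."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in x]
    h = [[rng.random() + 0.1 for _ in x[0]] for _ in range(rank)]
    mx = hadamard(mask, x)
    for _ in range(iters):
        wt = transpose(w)
        h = scale(h, matmul(wt, mx), matmul(wt, hadamard(mask, matmul(w, h))), eps)
        ht = transpose(h)
        w = scale(w, matmul(mx, ht), matmul(hadamard(mask, matmul(w, h)), ht), eps)
    return w, h

# Toy "locations x genes" matrix of rank 2; one entry is treated as unmeasured.
x = [[2.0, 1.0, 0.0], [4.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 2.0, 3.0]]
mask = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 1, 1]]  # x[1][1] is hidden
w, h = masked_nmf(x, mask, rank=2)
imputed = matmul(w, h)[1][1]  # model's estimate for the hidden entry
```

The hidden entry never contributes to the loss, so its reconstructed value is driven purely by the low-rank structure learned from the observed entries, which is the basic intuition behind factorization-based imputation.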
2
Nakatsuka N, Adler D, Jiang L, Hartman A, Cheng E, Klann E, Satija R. A Reproducibility Focused Meta-Analysis Method for Single-Cell Transcriptomic Case-Control Studies Uncovers Robust Differentially Expressed Genes. bioRxiv 2025:2024.10.15.618577. [PMID: 39463993 PMCID: PMC11507907 DOI: 10.1101/2024.10.15.618577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
We assessed the reproducibility of differentially expressed genes (DEGs) in previously published Alzheimer's (AD), Parkinson's (PD), Schizophrenia (SCZ), and COVID-19 scRNA-seq studies. While transcriptional scores from DEGs of individual PD and COVID-19 datasets had moderate predictive power for case-control status of other datasets (AUC=0.77 and 0.75), genes from individual AD and SCZ datasets had poor predictive power (AUC=0.68 and 0.55). We developed a non-parametric meta-analysis method, SumRank, based on reproducibility of relative differential expression ranks across datasets, and found DEGs with improved predictive power (AUC=0.88, 0.91, 0.78, and 0.62). By multiple other metrics, specificity and sensitivity of these genes were substantially higher than those discovered by dataset merging and inverse variance weighted p-value aggregation methods. The DEGs revealed known and novel biological pathways, and we validate BCAT1 as down-regulated in AD mouse oligodendrocytes. Lastly, we evaluate factors influencing reproducibility of individual studies as a prospective guide for experimental design.
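The abstract characterizes SumRank only as a non-parametric method built on the reproducibility of relative differential expression ranks across datasets. The sketch below shows one simple way such a rank-sum statistic could be formed; the exact statistic, tie handling, and null model used by the authors are not given here, so `rank_fractions`, `sum_rank_scores`, and the uniform-rank permutation null are all assumptions made for illustration.

```python
import random

def rank_fractions(pvalues):
    """Map gene -> rank / n_genes, where rank 1 is the smallest p-value.

    Ties are broken arbitrarily by sort order (an assumption of this sketch).
    """
    ordered = sorted(pvalues, key=pvalues.get)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}

def sum_rank_scores(datasets):
    """Sum each gene's rank fraction over all datasets; lower = more reproducible."""
    per_dataset = [rank_fractions(d) for d in datasets]
    shared = set.intersection(*(set(d) for d in datasets))
    return {g: sum(r[g] for r in per_dataset) for g in shared}

def permutation_pvalue(score, n_datasets, n_genes, n_perm=2000, seed=0):
    """Empirical p-value under a null where ranks are independent and uniform."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        null = sum(rng.randint(1, n_genes) / n_genes for _ in range(n_datasets))
        if null <= score:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Three toy case-control datasets (gene -> differential expression p-value).
datasets = [
    {"GENE_A": 0.001, "GENE_B": 0.20, "GENE_C": 0.50, "GENE_D": 0.90},
    {"GENE_A": 0.004, "GENE_B": 0.70, "GENE_C": 0.03, "GENE_D": 0.40},
    {"GENE_A": 0.010, "GENE_B": 0.05, "GENE_C": 0.60, "GENE_D": 0.30},
]
scores = sum_rank_scores(datasets)
best = min(scores, key=scores.get)
p_best = permutation_pvalue(scores[best], n_datasets=3, n_genes=4)
```

A gene that is only moderately ranked in each study but consistently so can outscore a gene that is top-ranked in a single study, which is the property a reproducibility-focused meta-analysis is after.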
3
Olson D, Colligan T, Demekas D, Roddy JW, Youens-Clark K, Wheeler TJ. NEAR: Neural Embeddings for Amino acid Relationships. bioRxiv 2025:2024.01.25.577287. [PMID: 39896534 PMCID: PMC11785008 DOI: 10.1101/2024.01.25.577287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2025]
Abstract
Protein language models (PLMs) have recently demonstrated potential to supplant classical protein database search methods based on sequence alignment, but are slower than common alignment-based tools and appear to be prone to a high rate of false labeling. Here, we present NEAR, a method based on neural representation learning that is designed to improve both speed and accuracy of search for likely homologs in a large protein sequence database. NEAR's ResNet embedding model is trained using contrastive learning guided by trusted sequence alignments. It computes per-residue embeddings for target and query protein sequences, and identifies alignment candidates with a pipeline consisting of residue-level k-NN search and a simple neighbor aggregation scheme. Tests on a benchmark consisting of trusted remote homologs and randomly shuffled decoy sequences reveal that NEAR substantially improves accuracy relative to state-of-the-art PLMs, with lower memory requirements and faster embedding and search speed. While these results suggest that the NEAR model may be useful for standalone homology detection with increased sensitivity over standard alignment-based methods, in this manuscript we focus on a more straightforward analysis of the model's value as a high-speed pre-filter for sensitive annotation. In that context, NEAR is at least 5x faster than the pre-filter currently used in the widely-used profile hidden Markov model (pHMM) search tool HMMER3, and also outperforms the pre-filter used in our fast pHMM tool, nail.
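The search pipeline described above (residue-level k-NN followed by neighbor aggregation) can be sketched as follows. This is a brute-force toy under stated assumptions: embeddings are given as plain vectors rather than produced by NEAR's trained ResNet, similarity is a dot product, and aggregation simply sums neighbor similarities per target sequence. The function `search` and the example data are hypothetical, not NEAR's API.

```python
import heapq
from collections import defaultdict

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query_embeddings, targets, k=1):
    """Rank target sequences for one query.

    query_embeddings: list of per-residue vectors for the query sequence.
    targets: dict mapping target name -> list of per-residue vectors.
    """
    # Flatten every target residue into one searchable index.
    index = [(name, vec) for name, residues in targets.items() for vec in residues]
    scores = defaultdict(float)
    for q in query_embeddings:
        # Residue-level k-NN: the k most similar target residues for this query residue.
        nearest = heapq.nlargest(k, index, key=lambda item: dot(q, item[1]))
        # Neighbor aggregation: credit each hit to the sequence it came from.
        for name, vec in nearest:
            scores[name] += dot(q, vec)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

query = [[1.0, 0.0], [0.0, 1.0]]
targets = {
    "homolog": [[0.9, 0.1], [0.1, 0.9]],
    "decoy": [[-1.0, 0.0], [0.0, -1.0]],
}
hits = search(query, targets, k=1)
```

A production pre-filter would replace the linear scan with an approximate nearest-neighbor index; the aggregation step is what turns many residue-level hits into a short list of candidate sequences for full alignment.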
Affiliation(s)
- Daniel Olson
  - Department of Computer Science, University of Montana, Montana, USA
- Daphne Demekas
  - College of Pharmacy, University of Arizona, Arizona, USA
- Jack W. Roddy
  - College of Pharmacy, University of Arizona, Arizona, USA
- Travis J. Wheeler
  - Department of Computer Science, University of Montana, Montana, USA
  - College of Pharmacy, University of Arizona, Arizona, USA
4
Morgan AM, Devinsky O, Doyle WK, Dugan P, Friedman D, Flinker A. A magnitude-independent neural code for linguistic information during sentence production. bioRxiv 2025:2024.06.20.599931. [PMID: 38948730 PMCID: PMC11212956 DOI: 10.1101/2024.06.20.599931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Humans are the only species with the ability to convey an unbounded number of novel thoughts by combining words into sentences. This process is guided by complex semantic and abstract syntactic representations. Despite their centrality to human cognition, the neural mechanisms underlying these systems remain obscured by inherent limitations of non-invasive brain measures and a near total focus on comprehension paradigms. Here, we address these limitations with high-resolution neurosurgical recordings (electrocorticography) and a controlled sentence production experiment. We uncover distinct cortical networks encoding word-level, semantic, and syntactic information. These networks are broadly distributed across traditional language areas, but with focal sensitivity to syntactic structure in middle and inferior frontal gyri. In contrast to previous findings from comprehension studies, these networks are largely non-overlapping, each specialized for just one of the three linguistic constructs we investigate. Most strikingly, our data reveal an unexpected property of higher-order linguistic information: it is encoded independent of neural activity levels. We propose that this "magnitude-independent coding" scheme represents a novel mechanism for encoding information, reserved for higher-order cognition more broadly.
Affiliation(s)
- Adam M. Morgan
  - Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Orrin Devinsky
  - Neurosurgery Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Werner K. Doyle
  - Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Patricia Dugan
  - Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Daniel Friedman
  - Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Adeen Flinker
  - Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
  - Biomedical Engineering Department, NYU Tandon School of Engineering, 6 MetroTech Center Ave, Brooklyn, 11201, NY, USA
5
Liu J, Neupane P, Cheng J. Accurate Prediction of Protein Complex Stoichiometry by Integrating AlphaFold3 and Template Information. bioRxiv 2025:2025.01.12.632663. [PMID: 39868088 PMCID: PMC11761747 DOI: 10.1101/2025.01.12.632663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
Protein structure prediction methods require stoichiometry information (i.e., subunit counts) to predict the quaternary structure of protein complexes. However, this information is often unavailable, making stoichiometry prediction crucial for complexes with unknown stoichiometry. Despite its importance, few computational methods address this challenge. In this study, we present an approach that integrates AlphaFold3 structure predictions with homologous template data to predict stoichiometry. The method generates candidate stoichiometries, builds structural models for them using AlphaFold3, ranks them based on AlphaFold3 scores, and further refines the predictions with template-based information when available. In the 16th community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP16), our method achieved 71.4% top-1 accuracy and 92.9% top-3 accuracy, outperforming other predictors in overall performance. This demonstrates the complementary strengths of AlphaFold3- and template-based predictions and highlights the method's applicability to uncharacterized protein complexes lacking stoichiometry data.
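The generate, score, rank, and refine loop described above can be sketched as follows. This is a minimal illustration under assumptions: `score_fn` is a hypothetical stand-in for a confidence score derived from AlphaFold3 models (no AlphaFold3 call is made), and the template refinement is modeled as a simple additive bonus, which is a guess at one possible scheme rather than the authors' procedure.

```python
from itertools import product

def candidate_stoichiometries(subunits, max_copies=4):
    """Yield every copy-number assignment, e.g. {'A': 2, 'B': 1}."""
    for counts in product(range(1, max_copies + 1), repeat=len(subunits)):
        yield dict(zip(subunits, counts))

def format_stoichiometry(counts):
    """Render {'A': 2, 'B': 1} as 'A2B1'."""
    return "".join(f"{name}{n}" for name, n in counts.items())

def rank_candidates(subunits, score_fn, template_hits=(), template_bonus=0.1,
                    max_copies=4):
    """Return stoichiometry labels ordered from best to worst.

    score_fn: hypothetical callable giving a model-confidence score per candidate.
    template_hits: labels supported by homologous templates (assumed input).
    """
    ranked = []
    for counts in candidate_stoichiometries(subunits, max_copies):
        label = format_stoichiometry(counts)
        bonus = template_bonus if label in template_hits else 0.0
        ranked.append((score_fn(counts) + bonus, label))
    ranked.sort(reverse=True)
    return [label for _, label in ranked]

# Toy score that peaks at A2B2; a real score would come from predicted structures.
def toy_score(counts):
    return -abs(counts["A"] - 2) - abs(counts["B"] - 2)

ranking = rank_candidates(["A", "B"], toy_score, template_hits={"A2B2"})
top1, top3 = ranking[0], ranking[:3]
```

Top-1 and top-3 accuracy, as reported for CASP16, then correspond to checking whether the true stoichiometry equals `top1` or appears in `top3` across a set of targets.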
Affiliation(s)
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
6
Choresca CH, Legario FS, De Leon MEE, Santos MNM, Bumanlag BE, Ca-as CGP, Gente AA, Gloria PCT. Draft genome of Streptococcus agalactiae serotype Ia FBC260 causing hemorrhagic septicemia with massive cellular meningitis in cultured Nile tilapia from the Philippines. Microbiol Resour Announc 2025; 14:e0091124. [PMID: 39651875 PMCID: PMC11737160 DOI: 10.1128/mra.00911-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 10/21/2024] [Indexed: 01/18/2025] Open
Abstract
We report the draft genome of Streptococcus agalactiae serotype Ia strain FBC260, associated with hemorrhagic septicemia and massive cellular meningitis in cultured Nile tilapia from the Philippines. This genomic resource of S. agalactiae from the Philippines will provide valuable insights for disease management.
Affiliation(s)
- Casiano H. Choresca
  - Department of Agriculture, National Fisheries Research and Development Institute-Fisheries Biotechnology Center, Science City of Muñoz, Nueva Ecija, Philippines
- Francis S. Legario
  - Natural Sciences Department, College of Arts and Sciences, Iloilo Science and Technology University, Iloilo City, Iloilo, Philippines
- Ma. Elaine E. De Leon
  - Department of Agriculture, National Fisheries Research and Development Institute-Fisheries Biotechnology Center, Science City of Muñoz, Nueva Ecija, Philippines
- Mary Nia M. Santos
  - Department of Agriculture, National Fisheries Research and Development Institute-Fisheries Biotechnology Center, Science City of Muñoz, Nueva Ecija, Philippines
- Benjamin E. Bumanlag
  - Department of Agriculture, National Fisheries Research and Development Institute-Fisheries Biotechnology Center, Science City of Muñoz, Nueva Ecija, Philippines
- Christine Grace P. Ca-as
  - Department of Agriculture, National Fisheries Research and Development Institute-Fisheries Biotechnology Center, Science City of Muñoz, Nueva Ecija, Philippines
- Angelie A. Gente
  - Department of Agriculture, National Fisheries Research and Development Institute-Fisheries Biotechnology Center, Science City of Muñoz, Nueva Ecija, Philippines
- Paul Christian T. Gloria
  - Natural Sciences Research Institute, University of the Philippines, Quezon City, Metro Manila, Philippines
7
Wang R, Schlick T. How Large is the Universe of RNA-Like Motifs? A Clustering Analysis of RNA Graph Motifs Using Topological Descriptors. arXiv 2025:arXiv:2501.04258v1. [PMID: 39867422 PMCID: PMC11760235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
Identifying novel and functional RNA structures remains a significant challenge in RNA motif design and is crucial for developing RNA-based therapeutics. Here we introduce a computational topology-based approach with unsupervised machine-learning algorithms to estimate the database size and content of RNA-like graph topologies. Specifically, we apply graph theory enumeration to generate all 110,667 possible 2D dual graphs for vertex numbers ranging from 2 to 9. Among them, only 0.11% (121 dual graphs) correspond to approximately 200,000 known RNA atomic fragments/substructures (collected in 2021) using the RNA-as-Graphs (RAG) mapping method. The remaining 99.89% of the dual graphs may be RNA-like or non-RNA-like. To determine which dual graphs in the 99.89% hypothetical set are more likely to be associated with RNA structures, we apply computational topology descriptors using the Persistent Spectral Graphs (PSG) method to characterize each graph using 19 PSG-based features and use clustering algorithms that partition all possible dual graphs into two clusters. The cluster with the higher percentage of known dual graphs for RNA is defined as the "RNA-like" cluster, while the other is considered as "non-RNA-like". The distance of each dual graph to the center of the RNA-like cluster represents the likelihood of it belonging to RNA structures. From validation, our PSG-based RNA-like cluster includes 97.3% of the 121 known RNA dual graphs, suggesting good performance. Furthermore, 46.017% of the hypothetical RNAs are predicted to be RNA-like. Among the top 15 graphs identified as high-likelihood candidates for novel RNA motifs, 4 were confirmed from the RNA dataset collected in 2022. Significantly, we observe that all the top 15 RNA-like dual graphs can be separated into multiple subgraphs, whereas the top 15 non-RNA-like dual graphs tend not to have any subgraphs (subgraphs preserve pseudoknots and junctions). 
Moreover, a significant topological difference between top RNA-like and non-RNA-like graphs is evident when comparing their topological features (e.g. Betti-0 and Betti-1 numbers). These findings provide valuable insights into the size of the RNA motif universe and RNA design strategies, offering a novel framework for predicting RNA graph topologies and guiding the discovery of novel RNA motifs, perhaps anti-viral therapeutics by subgraph assembly.
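The cluster-labeling and ranking logic described in this abstract can be sketched as follows. This is a toy under assumptions: each graph is represented by a short made-up feature vector instead of the 19 PSG-based features, clustering is a plain 2-means with a deterministic seeding rule chosen for this example, and the helper names (`two_means`, `rna_like_cluster`) are hypothetical. The abstract does not say which clustering algorithms the authors used.

```python
import math

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(points, iters=50):
    """Plain k-means with k=2, seeded with the first point and the point farthest from it."""
    centers = [points[0], max(points, key=lambda p: math.dist(p, points[0]))]
    labels = []
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers

def rna_like_cluster(labels, known):
    """Return the cluster whose members include the higher fraction of known RNA graphs."""
    def known_fraction(c):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        return sum(i in known for i in idx) / len(idx) if idx else 0.0
    return max((0, 1), key=known_fraction)

# Six toy graphs described by two features; graphs 0 and 1 are "known RNA" graphs.
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
known = {0, 1}
labels, centers = two_means(features)
rna = rna_like_cluster(labels, known)
# Smaller distance to the RNA-like center = higher assumed likelihood of being RNA-like.
ranking = sorted(range(len(features)), key=lambda i: math.dist(features[i], centers[rna]))
```

In this toy, graph 2 is hypothetical (not in `known`) yet falls in the RNA-like cluster and close to its center, which is the kind of candidate the abstract's top-15 list is drawn from.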
Affiliation(s)
- Rui Wang
  - Simons Center for Computational Physical Chemistry, New York University, New York, NY 10003, USA
- Tamar Schlick
  - Simons Center for Computational Physical Chemistry, New York University, New York, NY 10003, USA
  - Department of Chemistry, New York University, New York, NY 10003, USA
  - Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
  - New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 200122, China
8
Bulcha B, Tesfaye A, Garoma A, Begna F. Seroprevalence of and Associated Risk Factors for Bovine Viral Diarrhea in Dairy Cattle in and Around Nekemte Town, East Wallaga, Oromiya Regional State, Ethiopia. BioMed Research International 2025; 2025:1709145. [PMID: 39817271 PMCID: PMC11729507 DOI: 10.1155/bmri/1709145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 12/10/2024] [Indexed: 01/18/2025]
Abstract
Bovine viral diarrhea virus (BVDV) is an important pathogen affecting dairy cattle all over the world, causing significant economic losses through reproductive and respiratory problems, immunosuppressive effects, increased risk of morbidity, and calf mortality. A cross-sectional study was conducted from February 2021 to August 2021 to determine the seroprevalence of bovine viral diarrhea (BVD) and identify risk factors associated with its occurrence in and around Nekemte Town, Ethiopia. Blood samples were collected from 305 dairy cattle in 41 herds using a cluster-sampling method. All sampled animals were characterized by age, breed, origin, parity, pregnancy status, and history of reproductive and respiratory problems. Competitive ELISA was used in the laboratory to detect the presence of antibodies in the serum. At the animal and herd levels, descriptive statistics were used to assess the extent of BVDV antibody circulation, and multivariable logistic regression analysis was employed to detect potential risk variables. The results show a BVDV antibody seroprevalence of 9.84% (95% confidence interval (CI): 6.49-13.18) at the individual level and 28.52% (95% CI: 23.46-33.59) at the herd level. Abortion (odds ratio (OR) = 2.75; p = 0.019), retention of fetal membrane (OR = 3.33; p = 0.011), purchasing of animals (OR = 2.98; p = 0.017), and pregnancy (OR = 3.16; p = 0.019) were significantly associated with BVDV seropositivity. Herd size was significantly associated with BVDV infection at the herd level (p = 0.009). This moderate seroprevalence indicates that the virus is widely spread among dairy cattle at various farms in and around Nekemte Town, hurting dairy farm production and productivity. 
To reduce the seroprevalence of this infectious agent, cows with a history of reproductive disorders should be tested and new animals should be quarantined before being introduced into herds. More research should be done to assess the impact of reproductive failure and other effects associated with this virus.
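The reported confidence intervals can be checked with a short calculation. The sketch below assumes a normal-approximation (Wald) interval and back-calculates the positive counts from the percentages: 30 of 305 for 9.84% and 87 of 305 for 28.52%. Both counts and the choice of interval are assumptions, since the abstract states neither; under them, the reported bounds are reproduced to two decimals.

```python
import math

def wald_ci(positives, n, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = positives / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

def as_percent(values):
    return tuple(round(100 * v, 2) for v in values)

# Counts are back-calculated from the reported percentages (assumption).
individual = as_percent(wald_ci(30, 305))   # reported: 9.84 (6.49-13.18)
herd_level = as_percent(wald_ci(87, 305))   # reported: 28.52 (23.46-33.59)
print(individual, herd_level)
```

Under these assumptions the 28.52% figure matches a denominator of 305 animals rather than 41 herds, so a reader comparing the two estimates may want to confirm in the full text how the herd-level prevalence was defined.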
Affiliation(s)
- Begna Bulcha
  - Department of CLiS, School of Veterinary Medicine, Wallaga University, Nekemte, Ethiopia
- Feyisa Begna
  - College of Agriculture and Veterinary Medicine, Jimma University, Jimma, Ethiopia
9
Zhang P, Guo R, Ma S, Jiang H, Yan Q, Li S, Wang K, Deng J, Zhang Y, Zhang Y, Wang G, Chen L, Li L, Guo X, Zhao G, Yang L, Wang Y, Kang J, Sha S, Fan S, Cheng L, Meng J, Yu H, Chen F, He D, Wang J, Liu S, Shi H. A metagenome-wide study of the gut virome in chronic kidney disease. Theranostics 2025; 15:1642-1661. [PMID: 39897560 PMCID: PMC11780533 DOI: 10.7150/thno.101601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Accepted: 11/29/2024] [Indexed: 02/04/2025] Open
Abstract
Rationale: Chronic kidney disease (CKD) is a progressively debilitating condition leading to kidney dysfunction and severe complications. While dysbiosis of the gut bacteriome has been linked to CKD, the alteration in the gut viral community and its role in CKD remain poorly understood. Methods: Here, we characterize the gut virome in CKD using metagenome-wide analyses of faecal samples from 425 patients and 290 healthy individuals. Results: CKD is associated with a remarkable shift in the gut viral profile that occurs regardless of host properties, disease stage, and underlying diseases. We identify 4,649 differentially abundant viral operational taxonomic units (vOTUs) and reveal that some CKD-enriched viruses are closely related to gut bacterial taxa such as Bacteroides, [Ruminococcus], Erysipelatoclostridium, and Enterocloster spp. In contrast, CKD-depleted viruses include more crAss-like viruses and often target Faecalibacterium, Ruminococcus, and Prevotella species. Functional annotation of the vOTUs reveals numerous viral functional signatures associated with CKD, notably a marked reduction in nicotinamide adenine dinucleotide (NAD+) synthesis capacity within the CKD-associated virome. Furthermore, most CKD viral signatures are reproducible in the gut viromes of diabetic kidney disease and several other common diseases, highlighting the considerable universality of disease-associated viromes. Conclusions: This research provides comprehensive resources and novel insights into the CKD-associated gut virome, offering valuable guidance for future mechanistic and therapeutic investigations.
Affiliation(s)
- Pan Zhang
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Ruochun Guo
  - College of Basic Medical Sciences, Dalian Medical University, Dalian 116044, China
  - Puensum Genetech Institute, Wuhan 430076, China
- Shiyang Ma
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Hongli Jiang
  - Department of Critical Care Nephrology and Blood Purification, the First Affiliated Hospital of Xi'an Jiaotong University, Shaanxi, 710061, China
- Qiulong Yan
  - College of Basic Medical Sciences, Dalian Medical University, Dalian 116044, China
- Shenghui Li
  - Puensum Genetech Institute, Wuhan 430076, China
- Kairuo Wang
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Jiang Deng
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Yanli Zhang
  - College of Basic Medical Sciences, Dalian Medical University, Dalian 116044, China
- Yue Zhang
  - Puensum Genetech Institute, Wuhan 430076, China
- Guangyang Wang
  - Department of Nephrology, Dalian Municipal Central Hospital affiliated with Dalian University of Technology, Dalian Key Laboratory of Intelligent Blood Purification, Dalian 116033, China
- Lei Chen
  - Department of Critical Care Nephrology and Blood Purification, the First Affiliated Hospital of Xi'an Jiaotong University, Shaanxi, 710061, China
- Lu Li
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Xiaoyan Guo
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Gang Zhao
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Longbao Yang
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Yan Wang
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Jian Kang
  - College of Basic Medical Sciences, Dalian Medical University, Dalian 116044, China
- Shanshan Sha
  - College of Basic Medical Sciences, Dalian Medical University, Dalian 116044, China
- Shao Fan
  - College of Basic Medical Sciences, Dalian Medical University, Dalian 116044, China
- Lin Cheng
  - College of Basic Medical Sciences, Dalian Medical University, Dalian 116044, China
- Jinxin Meng
  - Puensum Genetech Institute, Wuhan 430076, China
- Hailong Yu
  - Puensum Genetech Institute, Wuhan 430076, China
- Fenrong Chen
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Danni He
  - Department of Urology, Affiliated Zhongshan Hospital of Dalian University, Dalian 116001, China
- Jinhai Wang
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
- Shuxin Liu
  - Department of Nephrology, Dalian Municipal Central Hospital affiliated with Dalian University of Technology, Dalian Key Laboratory of Intelligent Blood Purification, Dalian 116033, China
- Haitao Shi
  - Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University; Shaanxi Key Laboratory of Gastrointestinal Motility Disorders; Shaanxi Provincial Clinical Research Center for Gastrointestinal Diseases; Digestive Disease Quality Control Center of Shaanxi Province, Xi'an 710004, China
10
Li R, Yi H, Ma S. A Selective Review of Network Analysis Methods for Gene Expression Data. Methods Mol Biol 2025; 2880:293-307. [PMID: 39900765 DOI: 10.1007/978-1-0716-4276-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2025]
Abstract
With the development of high-throughput profiling techniques, gene expression data have drawn significant attention due to their important biological implications, widespread availability, and promising biological findings. The complex interactions and regulations among genes naturally lead to a network structure, which can provide a global view of molecular mechanisms and biological processes. This chapter provides a selective overview of constructing gene expression networks and utilizing them in downstream analysis. It also includes an illustrative example.
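The abstract does not say which network construction methods the chapter covers. As one common and simple possibility, a co-expression network can be built by connecting genes whose expression profiles are strongly correlated. The sketch below assumes Pearson correlation with a hard threshold; the function names, the threshold of 0.8, and the toy expression values are all hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(expression, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges

# Toy expression profiles across four samples.
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.5],
    "G3": [3.0, 1.0, 4.0, 1.0],
}
edges = coexpression_edges(expression)
```

The resulting edge list can feed downstream analyses such as module detection or hub-gene ranking; partial-correlation and other conditional-dependence networks follow the same pattern with a different pairwise statistic.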
Affiliation(s)
- Rong Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Huangdi Yi
  - Servier Pharmaceuticals, Boston, MA, USA
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
11
Xia J, Su Z, Cai C, Liu T, Yuan Z, Zheng M. Enrichment and identification of a moderately acidophilic nitrite-oxidizing bacterium. Water Research X 2025; 26:100308. [PMID: 39967964 PMCID: PMC11833616 DOI: 10.1016/j.wroa.2025.100308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2024] [Revised: 01/04/2025] [Accepted: 01/24/2025] [Indexed: 02/20/2025]
Abstract
This study enriched a novel nitrite-oxidizing bacterium (NOB, 'Candidatus Nitrobacter acidophilus') in a laboratory reactor operating at pH 4.5 for treating low-strength ammonia wastewater. Batch experiments showed that 'Ca. N. acidophilus' oxidized nitrite to nitrate at a rate of 20.7 ± 2.3 μM/h with optimal growth at pH 5, distinguishing it from most previously known NOB strains. Phylogenetic analysis showed that this Nitrobacter strain clustered with other Nitrobacter strains obtained from acidic environments, but the strains were divergent from one another, with an average nucleotide identity (ANI) below 85%. Genomic characteristics revealed that 'Ca. N. acidophilus' possesses versatile transporter systems. These differ from those of previously reported Nitrobacter strains and indicate acid adaptation mechanisms. Interestingly, the mutualistic interaction with the acidophilic ammonia-oxidizing archaeon (AOA) Nitrosotalea increased archaeal amoA gene expression 149-fold and ammonia oxidation rates 5-fold, highlighting the NOB's role in alleviating nitrite inhibition of the acidophilic AOA. These findings expand our understanding of bacterial nitrite oxidation and provide valuable insights into an important partnership between acidophilic AOA and NOB in acidic environments.
Affiliation(s)
- Jun Xia
  - CAS Key Laboratory of Urban Pollutant Conversion, Department of Environmental Science and Engineering, University of Science and Technology of China, Hefei 230026, PR China
- Zicheng Su
  - Australian Centre for Water and Environmental Biotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
- Chen Cai
  - CAS Key Laboratory of Urban Pollutant Conversion, Department of Environmental Science and Engineering, University of Science and Technology of China, Hefei 230026, PR China
- Tao Liu
  - Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hong Kong, PR China
- Zhiguo Yuan
  - School of Energy and Environment, City University of Hong Kong, Hong Kong, PR China
- Min Zheng
  - Water Research Centre, School of Civil and Environmental Engineering, University of New South Wales, Sydney, NSW 2052, Australia
12
|
Ahmad S, Wu T, Arnold M, Hankemeier T, Ghanbari M, Roshchupkin G, Uitterlinden AG, Neitzel J, Kraaij R, Van Duijn CM, Arfan Ikram M, Kaddurah-Daouk R, Kastenmüller G. The blood metabolome of cognitive function and brain health in middle-aged adults - influences of genes, gut microbiome, and exposome. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.12.16.24317793. [PMID: 39763567 PMCID: PMC11702749 DOI: 10.1101/2024.12.16.24317793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]
Abstract
Increasing evidence suggests the involvement of metabolic alterations in neurological disorders, including Alzheimer's disease (AD), and highlights the significance of the peripheral metabolome, influenced by genetic factors and modifiable environmental exposures, for brain health. In this study, we examined 1,387 metabolites in plasma samples from 1,082 dementia-free middle-aged participants of the population-based Rotterdam Study. We assessed the relation of metabolites with general cognition (G-factor) and magnetic resonance imaging (MRI) markers using linear regression and estimated the variance of these metabolites explained by genes, gut microbiome, lifestyle factors, common clinical comorbidities, and medication using gradient boosting decision tree analysis. Twenty-one metabolites and one metabolite were significantly associated with total brain volume and total white matter lesions, respectively. Fourteen metabolites showed significant associations with G-factor, with ergothioneine exhibiting the largest effect (adjusted mean difference = 0.122, P = 4.65×10⁻⁷). Associations for nine of the 14 metabolites were replicated in an independent, older cohort. The metabolite signature of incident AD in the replication cohort resembled that of cognition in the discovery cohort, emphasizing the potential relevance of the identified metabolites to disease pathogenesis. Lifestyle, clinical variables, and medication were most important in determining these metabolites' blood levels, with lifestyle explaining up to 28.6% of the variance. Smoking was associated with ten metabolites linked to G-factor, while diabetes and antidiabetic medication were associated with 13 metabolites linked to MRI markers, including N-lactoyltyrosine. Antacid medication strongly affected ergothioneine levels. Mediation analysis revealed that lower ergothioneine levels may partially mediate negative effects of antacids on cognition (31.5%). Gut microbial factors were more important for the blood levels of metabolites that were more strongly associated with cognition and incident AD in the older replication cohort (beta-cryptoxanthin, imidazole propionate), suggesting they may be involved later in the disease process. The detailed results on how multiple modifiable factors affect blood levels of cognition- and brain imaging-related metabolites in dementia-free participants may help identify new AD prevention strategies.
Affiliation(s)
- Shahzad Ahmad
  - Department of Epidemiology, Erasmus MC, University Medical Centre, Rotterdam, The Netherlands
  - Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Tong Wu
  - Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Matthias Arnold
  - Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
  - Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, USA
- Thomas Hankemeier
  - Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Mohsen Ghanbari
  - Department of Epidemiology, Erasmus MC, University Medical Centre, Rotterdam, The Netherlands
- Gennady Roshchupkin
  - Department of Epidemiology, Erasmus MC, University Medical Centre, Rotterdam, The Netherlands
- André G. Uitterlinden
  - Department of Internal Medicine, Erasmus MC, University Medical Centre, Rotterdam, The Netherlands
- Julia Neitzel
  - Department of Epidemiology, Erasmus MC, University Medical Centre, Rotterdam, The Netherlands
  - Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center, Rotterdam, the Netherlands
  - Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Robert Kraaij
  - Department of Internal Medicine, Erasmus MC, University Medical Centre, Rotterdam, The Netherlands
- Cornelia M. Van Duijn
  - Department of Epidemiology, Erasmus MC, University Medical Centre, Rotterdam, The Netherlands
  - Nuffield Department of Population Health, Oxford University, Oxford, UK
- M. Arfan Ikram
  - Department of Epidemiology, Erasmus MC, University Medical Centre, Rotterdam, The Netherlands
- Rima Kaddurah-Daouk
  - Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, USA
  - Duke Institute of Brain Sciences, Duke University, Durham, NC, USA
  - Department of Medicine, Duke University, Durham, NC, USA
- Gabi Kastenmüller
  - Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
13
Sharma S, Woodworth B, Yang B, Duan N, Pheko M, Moutsopoulos N, Emiola A. Quantitative mapping of pseudouridines in bacteria RNA. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.26.625507. [PMID: 39651277 PMCID: PMC11623569 DOI: 10.1101/2024.11.26.625507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/11/2024]
Abstract
RNA pseudouridylation is one of the most prevalent post-transcriptional modifications, occurring universally across all organisms. Although pseudouridines have been extensively studied in bacterial tRNAs and rRNAs, their presence and role in bacterial mRNA remain poorly characterized. Here, we used a bisulfite-based sequencing approach to provide a comprehensive and quantitative measurement of bacterial pseudouridines. As a proof of concept in E. coli, we identified 1,954 high-confidence sites in 1,331 transcripts, covering almost 30% of the transcriptome. Furthermore, pseudouridine mapping enabled the detection of differentially expressed genes associated with stress response that were not identified by a conventional RNA-seq approach. We also demonstrate that, in addition to pseudouridine profiling, our approach can facilitate the discovery of previously unidentified transcripts. As an example, we identified a small RNA transcribed from the antisense strand of tRNA-Tyr which represses expression of distal genes. Finally, we mapped pseudouridines in oral microbiome samples of human subjects, demonstrating the broad applicability of our approach in complex microbiomes. Altogether, our work highlights the advantages of mapping bacterial pseudouridines and provides a tool to study post-transcriptional regulation in microbial communities.
14
Wu DG, Harris CR, Kalis KM, Bowen M, Biddle JF, Farag IF. Comparative metagenomics of tropical reef fishes show conserved core gut functions across hosts and diets with diet-related functional gene enrichments. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.21.595191. [PMID: 38826274 PMCID: PMC11142082 DOI: 10.1101/2024.05.21.595191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Fish gut microbial communities are important for the breakdown and energy harvesting of the host diet. Microbes within the fish gut are selected by environmental and evolutionary factors. To understand how fish gut microbial communities are shaped by diet, three tropical fish species (hawkfish, Paracirrhites arcatus; yellow tang, Zebrasoma flavescens; and triggerfish, Rhinecanthus aculeatus) were fed piscivorous (fish meal pellets), herbivorous (seaweed), and invertivorous (shrimp) diets, respectively. From fecal samples, a total of 43 metagenome assembled genomes (MAGs) were recovered from all fish diet treatments. Each host-diet treatment harbored distinct microbial communities based on taxonomy, with Proteobacteria, Bacteroidota, and Firmicutes being the most represented. Based on their metagenomes, MAGs from all three host-diet treatments demonstrated a baseline ability to degrade proteinaceous, fatty acid, and simple carbohydrate inputs and carry out central carbon metabolism, lactate and formate fermentation, acetogenesis, nitrate respiration, and B vitamin synthesis. The herbivorous yellow tang harbored more functionally diverse MAGs with some complex polysaccharide degradation specialists, while the piscivorous hawkfish's MAGs were more specialized for the degradation of proteins. The invertivorous triggerfish's gut MAGs lacked many carbohydrate degrading capabilities, resulting in them being more specialized and functionally uniform. Across all treatments, several MAGs were able to participate in only individual steps of the degradation of complex polysaccharides, suggestive of microbial community networks that degrade complex inputs.
Affiliation(s)
- Derek G. Wu
  - School of Marine Science and Policy, University of Delaware, Lewes DE 19958 USA
- Cassandra R. Harris
  - School of Marine Science and Policy, University of Delaware, Lewes DE 19958 USA
- Katie M. Kalis
  - School of Marine Science and Policy, University of Delaware, Lewes DE 19958 USA
- Malique Bowen
  - School of Marine Science and Policy, University of Delaware, Lewes DE 19958 USA
- Jennifer F. Biddle
  - School of Marine Science and Policy, University of Delaware, Lewes DE 19958 USA
- Ibrahim F. Farag
  - School of Marine Science and Policy, University of Delaware, Lewes DE 19958 USA
15
Fanfani V, Shutta KH, Mandros P, Fischer J, Saha E, Micheletti S, Chen C, Guebila MB, Lopes-Ramos CM, Quackenbush J. Reproducible processing of TCGA regulatory networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.05.622163. [PMID: 39574772 PMCID: PMC11580957 DOI: 10.1101/2024.11.05.622163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2024]
Abstract
Background: Technological advances in sequencing and computation have allowed deep exploration of the molecular basis of diseases. Biological networks have proven to be a useful framework for interrogating omics data and modeling regulatory gene and protein interactions. Large collaborative projects, such as The Cancer Genome Atlas (TCGA), have provided a rich resource for building and validating new computational methods, resulting in a plethora of open-source software for downloading, pre-processing, and analyzing those data. However, for an end-to-end analysis of regulatory networks, a coherent and reusable workflow is essential to integrate all relevant packages into a robust pipeline.
Findings: We developed tcga-data-nf, a Nextflow workflow that allows users to reproducibly infer regulatory networks from the thousands of samples in TCGA using a single command. The workflow can be divided into three main steps: multi-omics data, such as RNA-seq and methylation, are downloaded, preprocessed, and lastly used to infer regulatory network models with the netZoo software tools. The workflow is powered by the NetworkDataCompanion R package, a standalone collection of functions for managing, mapping, and filtering TCGA data. Here we show how the pipeline can be used to study the differences between colon cancer subtypes that could be explained by epigenetic mechanisms. Lastly, we provide pre-generated networks for the 10 most common cancer types that can be readily accessed.
Conclusions: tcga-data-nf is a complete yet flexible and extensible framework that enables the reproducible inference and analysis of cancer regulatory networks, bridging a gap in the current universe of software tools.
Affiliation(s)
- Viola Fanfani
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Katherine H. Shutta
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Panagiotis Mandros
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jonas Fischer
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Enakshi Saha
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Soel Micheletti
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Chen Chen
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Marouen Ben Guebila
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Camila M. Lopes-Ramos
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- John Quackenbush
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
16
Chen H, Murphy RF. CytoSpatio: Learning cell type spatial relationships using multirange, multitype point process models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.31.621408. [PMID: 39553984 PMCID: PMC11565948 DOI: 10.1101/2024.10.31.621408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
Recent advances in multiplexed fluorescence imaging have provided new opportunities for deciphering the complex spatial relationships among various cell types across diverse tissues. We introduce CytoSpatio, open-source software that constructs generative, multirange, and multitype point process models that capture interactions among multiple cell types at various distances simultaneously. On analyzing five cell types across five tissues, our software showed consistent spatial relationships within the same tissue type, with certain cell types like proliferating T cells consistently clustering across tissue types. It also revealed that the attraction-repulsion relationships between cell types like B cells and CD4-positive T cells vary with tissue type. CytoSpatio can also generate synthetic tissue structures that preserve the spatial relationships seen in training images, a capability not provided by previous descriptive, motif-based approaches. This potentially allows spatially realistic simulations of how cell relationships affect tissue biochemistry.
Affiliation(s)
- Haoran Chen
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Robert F. Murphy
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University
17
Shen Y, Yan Z, Kingsford C. Data-driven AI system for learning how to run transcript assemblers. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.25.577290. [PMID: 39554123 PMCID: PMC11565938 DOI: 10.1101/2024.01.25.577290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
We introduce AutoTuneX, a data-driven AI system designed to automatically predict optimal parameters for transcript assemblers, tools that reconstruct expressed transcripts from the reads in a given RNA-seq sample. AutoTuneX is built by learning parameter knowledge from existing RNA-seq samples and transferring this knowledge to unseen samples. On 1588 human RNA-seq samples tested with two transcript assemblers, AutoTuneX predicted parameters that resulted in 98% of samples achieving more accurate transcript assembly than with default parameter settings, with some samples experiencing up to a 600% improvement in AUC. AutoTuneX offers a new strategy for automatically optimizing the use of sequence analysis tools.
18
Ahmad S, Muurinen M, Loid P, Ali MZ, Muzammal M, Fatima S, Khan J, Khan MA, Mäkitie O. A clinical and molecular characterization of a Pakistani family with multicentric osteolysis, nodulosis and arthropathy (MONA) syndrome. Bone Rep 2024; 22:101789. [PMID: 39540058 PMCID: PMC11558256 DOI: 10.1016/j.bonr.2024.101789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 07/10/2024] [Accepted: 07/12/2024] [Indexed: 11/16/2024] Open
Abstract
Multicentric osteolysis, nodulosis and arthropathy (MONA) is a rare skeletal dysplasia characterized primarily by progressive osteolysis, particularly affecting the carpal and tarsal bones, accompanied by osteoporosis. In addition, it features subcutaneous nodules on the palms and soles, along with the progressive onset of arthropathy, encompassing joint contractures, pain, swelling and stiffness. It is caused by a deficiency of matrix metalloproteinase-2 (MMP2). In the current study we present a comprehensive clinical, radiological, genetic and in silico analysis of MONA in a consanguineous Pakistani family. Clinical and radiological examinations of the three severely affected siblings demonstrated a progressive MONA syndrome with phenotypic variability. The patients presented with an unusual facial appearance, thickened skin, severe short stature, and short hands and feet. Radiographs revealed extensive bone deformities affecting the upper and lower arms, legs, vertebrae and hip. Genetic analysis revealed a homozygous missense variant [c.539A>T, p.(Asp180Val)] in the MMP2 gene. In silico findings suggested a mutant MMP2 protein with decreased stability and an altered pattern of interactions. Our findings add to the existing literature on the skeletal phenotype of MONA syndrome, including the specific clinical and radiological patterns observed. Moreover, the study will aid in genetic counseling and accurate diagnosis of families affected by the same disorder within the Pakistani population.
Affiliation(s)
- Safeer Ahmad
  - Gomal Center of Biochemistry and Biotechnology, Gomal University, Dera Ismail Khan, Pakistan
  - Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Mari Muurinen
  - Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Children's Hospital, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
  - Folkhälsan Research Center, Genetics Research Program, Helsinki, Finland
- Petra Loid
  - Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Children's Hospital, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
  - Folkhälsan Research Center, Genetics Research Program, Helsinki, Finland
- Muhammad Zeeshan Ali
  - Gomal Center of Biochemistry and Biotechnology, Gomal University, Dera Ismail Khan, Pakistan
- Muhammad Muzammal
  - Gomal Center of Biochemistry and Biotechnology, Gomal University, Dera Ismail Khan, Pakistan
- Sana Fatima
  - Gomal Center of Biochemistry and Biotechnology, Gomal University, Dera Ismail Khan, Pakistan
- Jabbar Khan
  - Institute of Biological Sciences, Gomal University, Dera Ismail Khan, Pakistan
- Muzammil Ahmad Khan
  - Gomal Center of Biochemistry and Biotechnology, Gomal University, Dera Ismail Khan, Pakistan
- Outi Mäkitie
  - Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Children's Hospital, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
  - Folkhälsan Research Center, Genetics Research Program, Helsinki, Finland
19
Laperriere SM, Minch B, Weissman JL, Hou S, Yeh YC, Ignacio-Espinoza JC, Ahlgren NA, Moniruzzaman M, Fuhrman JA. Phylogenetic proximity drives temporal succession of marine giant viruses in a five-year metagenomic time-series. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.12.607631. [PMID: 39185240 PMCID: PMC11343133 DOI: 10.1101/2024.08.12.607631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Nucleocytoplasmic Large DNA Viruses (NCLDVs, also called giant viruses) are widespread in marine systems and infect a broad range of microbial eukaryotes (protists). Recent biogeographic work has provided global snapshots of NCLDV diversity and community composition across the world's oceans, yet little information exists about the guiding 'rules' underpinning their community dynamics over time. We leveraged a five-year monthly metagenomic time-series to quantify the community composition of NCLDVs off the coast of Southern California and characterize these populations' temporal dynamics. NCLDVs were dominated by Algavirales (Phycodnaviruses, 59%) and Imitervirales (Mimiviruses, 36%). We identified clusters of NCLDVs with distinct classes of seasonal and non-seasonal temporal dynamics. Overall, NCLDV population abundances were often highly dynamic with a strong seasonal signal. The Imitervirales group had its highest relative abundance in the more oligotrophic late summer and fall, while Algavirales did so in winter. Generally, closely related strains had similar temporal dynamics, suggesting that evolutionary history is a key driver of the temporal niche of marine NCLDVs. However, a few closely related strains had drastically different seasonal dynamics, suggesting that while phylogenetic proximity often indicates ecological similarity, occasionally phenology can shift rapidly, possibly due to host-switching. Finally, we identified distinct functional content and possible host interactions of the two major NCLDV orders, including connections of Imitervirales with primary producers like the diatom Chaetoceros and widespread marine grazers like Paraphysomonas and Spirotrichea ciliates. Together, our results reveal key insights into the season-specific effects of phylogenetically distinct giant virus communities on marine protist metabolism, biogeochemical fluxes and carbon cycling.
Affiliation(s)
- Sarah M. Laperriere
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Benjamin Minch
  - Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Sciences, University of Miami, Miami, FL, USA
- JL Weissman
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA
  - Institute for Advanced Computational Science, Stony Brook University, Stony Brook, NY, USA
- Shengwei Hou
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
  - Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Yi-Chun Yeh
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Mohammad Moniruzzaman
  - Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Sciences, University of Miami, Miami, FL, USA
- Jed A. Fuhrman
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
20
Kim DD, Swarthout JM, Worby CJ, Chieng B, Mboya J, Earl AM, Njenga SM, Pickering AJ. Bacterial strain sharing between humans, animals, and the environment among urban households. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.08.05.24311509. [PMID: 39148836 PMCID: PMC11326342 DOI: 10.1101/2024.08.05.24311509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Identifying bacterial transmission pathways is crucial to inform strategies aimed at curbing the spread of pathogenic and antibiotic-resistant bacteria, especially in rapidly urbanizing low- and middle-income countries. In this study, we assessed bacterial strain-sharing and dissemination of antibiotic resistance across humans, domesticated poultry, canines, household soil, and drinking water in urban informal settlements in Nairobi, Kenya. We collected 321 samples from 50 households and performed Pooling Isolated Colonies-seq (PIC-seq) by sequencing pools of up to five Escherichia coli colonies per sample to capture strain diversity, strain-sharing patterns, and overlap of antibiotic resistance genes (ARGs). Bacterial strains isolated from the household environment carried clinically relevant ARGs, reinforcing the role of the environment in antibiotic resistance dissemination. Strain-sharing rates and resistome similarities across sample types were strongly correlated within households, suggesting that clonal spread of bacteria is a main driver of ARG dissemination in the domestic urban environment. Within households, E. coli strain-sharing was rare between humans and animals but more frequent between humans and drinking water. E. coli contamination in stored drinking water was also associated with higher strain-sharing between humans in the same household. Our study demonstrates that contaminated drinking water facilitates human-to-human strain sharing and that water treatment can disrupt transmission.
Affiliation(s)
- Daehyun D. Kim
  - Department of Civil and Environmental Engineering, University of California, Berkeley, CA, USA
- Jenna M. Swarthout
  - Department of Civil and Environmental Engineering, Tufts University, Medford, MA, USA
- Colin J. Worby
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA, USA
- John Mboya
  - Department of Civil and Environmental Engineering, University of California, Berkeley, CA, USA
- Ashlee M. Earl
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA, USA
- Amy J. Pickering
  - Department of Civil and Environmental Engineering, University of California, Berkeley, CA, USA
  - Chan Zuckerberg Biohub – San Francisco
  - Blum Center for Developing Economies, University of California, Berkeley, Berkeley, CA 94720
21
Schulz NK, Asgari D, Liu S, Birnbaum SS, Williams AM, Prakash A, Tate AT. Resources modulate developmental shifts but not infection tolerance upon coinfection in an insect system. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.01.606236. [PMID: 39149267 PMCID: PMC11326177 DOI: 10.1101/2024.08.01.606236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Energetic resources fuel immune responses and parasite growth within organisms, but it is unclear whether energy allocation is sufficient to explain changes in infection outcomes under the threat of multiple parasites. We manipulated diet in flour beetles (Tribolium confusum) infected with two natural parasites to investigate the role of resources in shifting metabolic and immune responses after single and co-infection. Our results suggest that gregarine parasites alter the within-host energetic environment, and by extension juvenile development time, in a diet-dependent manner. Gregarines do not affect host resistance to acute bacterial infection but do stimulate the expression of an alternative set of immune genes and promote damage to the gut, ultimately contributing to reduced survival regardless of diet. Thus, energy allocation is not sufficient to explain the immunological contribution to coinfection outcomes, emphasizing the importance of mechanistic insight for predicting the impact of coinfection across levels of biological organization.
Affiliation(s)
- Nora K.E. Schulz
  - Department of Biological Sciences, Vanderbilt University, Nashville TN 37232
- Danial Asgari
  - Department of Biological Sciences, Vanderbilt University, Nashville TN 37232
- Siqin Liu
  - Department of Biological Sciences, Vanderbilt University, Nashville TN 37232
- Alissa M. Williams
  - Department of Biological Sciences, Vanderbilt University, Nashville TN 37232
- Arun Prakash
  - Department of Biological Sciences, Vanderbilt University, Nashville TN 37232
- Ann T. Tate
  - Department of Biological Sciences, Vanderbilt University, Nashville TN 37232
  - Evolutionary Studies Initiative, Vanderbilt University, Nashville TN 37232
22
Mendes M, Chen DZ, Engchuan W, Leal TP, Thiruvahindrapuram B, Trost B, Howe JL, Pellecchia G, Nalpathamkalam T, Alexandrova R, Salazar NB, McKee EA, Alfaro NR, Lai MC, Bandres-Ciga S, Roshandel D, Bradley CA, Anagnostou E, Sun L, Scherer SW. Chromosome X-Wide Common Variant Association Study (XWAS) in Autism Spectrum Disorder. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.07.18.24310640. [PMID: 39108515 PMCID: PMC11302709 DOI: 10.1101/2024.07.18.24310640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2024]
Abstract
Autism Spectrum Disorder (ASD) displays a notable male bias in prevalence. Research into rare (<0.1) genetic variants on the X chromosome has implicated over 20 genes in ASD pathogenesis, such as MECP2, DDX3X, and DMD. The "female protective effect" in ASD suggests that females may require a higher genetic burden to manifest similar symptoms as males, yet the mechanisms remain unclear. Despite technological advances in genomics, the biological complexity of sex chromosomes leaves them underrepresented in genome-wide studies. Here, we conducted an X chromosome-wide association study (XWAS) using whole-genome sequencing data from 6,873 individuals with ASD (82% males) across Autism Speaks MSSNG, Simons Simplex Cohort SSC, and Simons Foundation Powering Autism Research SPARK, alongside 8,981 population controls (43% males). We analyzed 418,652 X-chromosome variants, identifying 59 associated with ASD (p-values 7.9×10⁻⁶ to 1.51×10⁻⁵), surpassing Bonferroni-corrected thresholds. Key findings include significant regions on chrXp22.2 (lead SNP=rs12687599, p=3.57×10⁻⁷) harboring ASB9/ASB11, and another encompassing DDX53/PTCHD1-AS long non-coding RNA (lead SNP=rs5926125, p=9.47×10⁻⁶). When mapping genes within 10 kb of the 59 most significantly associated SNPs, 91 genes were found, 17 of which yielded association with ASD (GRPR, AP1S2, DDX53, HDAC8, PCDH19, PTCHD1, PCDH11X, PTCHD1-AS, DMD, SYAP1, CNKSR2, GLRA2, OFD1, CDKL5, GPRASP2, NXF5, SH3KBP1). FGF13 emerged as a novel X-linked ASD candidate gene, highlighted by sex-specific differences in minor allele frequencies. These results reveal significant new insights into X chromosome biology in ASD, confirming and nominating genes and pathways for further investigation.
Affiliation(s)
- Marla Mendes
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Desmond Zeya Chen
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, ON, M5G 1X6, Canada
- Worrawat Engchuan
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Thiago Peixoto Leal
- Lerner Research Institute, Genomic Medicine, Cleveland Clinic, Cleveland, OH, 44106, USA
- Bhooma Thiruvahindrapuram
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Brett Trost
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Jennifer L. Howe
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Giovanna Pellecchia
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Thomas Nalpathamkalam
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Roumiana Alexandrova
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Nelson Bautista Salazar
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Ethan Alexander McKee
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Natalia Rivera Alfaro
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Meng-Chuan Lai
- Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5G 2C1, Canada
- Department of Psychiatry, The Hospital for Sick Children, Toronto, ON, M5G 1E8, Canada
- Department of Psychiatry, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, M5T 1R8, Canada
- Sara Bandres-Ciga
- Center for Alzheimer’s and Related Dementias, National Institutes of Health, Bethesda, MD, 20892, USA
- Delnaz Roshandel
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Clarrisa A. Bradley
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Evdokia Anagnostou
- Autism Research Centre, Holland Bloorview Kids Rehabilitation Hospital, Toronto, ON, M4G 1R8, Canada
- Institute of Medical Science, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Lei Sun
- Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, ON, M5G 1X6, Canada
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, M5S 3E3, Canada
- Stephen W. Scherer
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- McLaughlin Centre and Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
|
23
|
Liu T, Liu C, Li Q, Zheng X, Zou F. Adaptive Regularized Tri-Factor Non-Negative Matrix Factorization for Cell Type Deconvolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.12.07.570631. [PMID: 38106220 PMCID: PMC10723472 DOI: 10.1101/2023.12.07.570631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Accurate deconvolution of cell types from bulk gene expression is crucial for understanding cellular compositions and uncovering cell-type specific differential expression and physiological states of diseased tissues. Existing deconvolution methods have limitations, such as requiring complete cellular gene expression signatures or neglecting partial biological information. Moreover, these methods often overlook varying cell-type mRNA amounts, leading to biased proportion estimates. Additionally, they do not effectively utilize valuable reference information from external studies, such as means and ranges of population cell-type proportions. To address these challenges, we introduce an Adaptive Regularized Tri-factor non-negative matrix factorization approach for deconvolution (ARTdeConv). We rigorously establish the numerical convergence of our algorithm. Through benchmark simulations, we demonstrate the superior performance of ARTdeConv compared to state-of-the-art semi-reference-based and reference-free methods. In a real-world application, our method accurately estimates cell proportions, as evidenced by the nearly perfect Pearson's correlation between ARTdeConv estimates and flow cytometry measurements in a dataset from a trivalent influenza vaccine study. Moreover, our analysis of ARTdeConv estimates in COVID-19 patients reveals patterns consistent with important immunological phenomena observed in other studies. The proposed method, ARTdeConv, is implemented as an R package and can be accessed on GitHub for researchers and practitioners.
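The general idea of non-negative deconvolution can be illustrated with a much simpler setting than the one the abstract describes. The sketch below assumes a fully known signature matrix `S` (genes × cell types) and estimates proportions `P` (cell types × samples) from bulk expression `Y ≈ S·P` using standard multiplicative updates. It is not ARTdeConv: it has two factors rather than three, no mRNA-amount factor, and no regularization toward reference proportions. The function names `matmul`, `transpose`, and `estimate_proportions` are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def estimate_proportions(Y, S, n_iter=500, eps=1e-12):
    """Estimate non-negative P in Y ~ S @ P by multiplicative updates.

    Assumes S is fully known; columns of P are rescaled to sum to 1.
    """
    k, n = len(S[0]), len(Y[0])
    P = [[1.0 / k] * n for _ in range(k)]      # uniform starting point
    St = transpose(S)
    StY, StS = matmul(St, Y), matmul(St, S)
    for _ in range(n_iter):
        StSP = matmul(StS, P)
        # Multiplicative rule: entries stay non-negative at every step.
        P = [[P[i][j] * StY[i][j] / (StSP[i][j] + eps) for j in range(n)]
             for i in range(k)]
    col_sums = [sum(P[i][j] for i in range(k)) for j in range(n)]
    return [[P[i][j] / col_sums[j] for j in range(n)] for i in range(k)]


# Toy data: 3 genes, 2 cell types, 2 bulk samples with known proportions.
S = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
P_true = [[0.7, 0.2], [0.3, 0.8]]
Y = matmul(S, P_true)
P_hat = estimate_proportions(Y, S)
```

On this noise-free toy input the estimate recovers `P_true` closely; real bulk data would need the additional modelling the abstract mentions, such as cell-type mRNA amounts and partially known signatures.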
|
24
|
Tarafder S, Bhattacharya D. lociPARSE: a locality-aware invariant point attention model for scoring RNA 3D structures. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.04.565599. [PMID: 37961488 PMCID: PMC10635153 DOI: 10.1101/2023.11.04.565599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
A scoring function that can reliably assess the accuracy of a 3D RNA structural model in the absence of experimental structure is not only important for model evaluation and selection but also useful for scoring-guided conformational sampling. However, high-fidelity RNA scoring has proven to be difficult using conventional knowledge-based statistical potentials and currently-available machine learning-based approaches. Here we present lociPARSE, a locality-aware invariant point attention architecture for scoring RNA 3D structures. Unlike existing machine learning methods that estimate superposition-based root mean square deviation (RMSD), lociPARSE estimates Local Distance Difference Test (lDDT) scores capturing the accuracy of each nucleotide and its surrounding local atomic environment in a superposition-free manner, before aggregating information to predict global structural accuracy. Tested on multiple datasets including CASP15, lociPARSE significantly outperforms existing statistical potentials (rsRNASP, cgRNASP, DFIRE-RNA, and RASP) and machine learning methods (ARES and RNA3DCNN) across complementary assessment metrics. lociPARSE is freely available at https://github.com/Bhattacharya-Lab/lociPARSE.
Affiliation(s)
- Sumit Tarafder
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, 24061, USA
|
25
|
Xu S, Ackerman ME. Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets. BMC Bioinformatics 2024; 25:218. [PMID: 38898392 PMCID: PMC11186207 DOI: 10.1186/s12859-024-05834-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 06/10/2024] [Indexed: 06/21/2024] Open
Abstract
BACKGROUND: Compared to traditional supervised machine learning approaches employing fully labeled samples, positive-unlabeled (PU) learning techniques aim to classify "unlabeled" samples based on a smaller proportion of known positive examples. This more challenging modeling goal reflects many real-world scenarios in which negative examples are not available, posing direct challenges to defining prediction accuracy and robustness. While several studies have evaluated predictions learned from only definitive positive examples, few have investigated whether correct classification of a high proportion of known positive (KP) samples from among unlabeled samples can act as a surrogate to indicate model quality. RESULTS: In this study, we report a novel methodology combining multiple established PU learning-based strategies with permutation testing to evaluate the potential of KP samples to accurately classify unlabeled samples without using "ground truth" positive and negative labels for validation. Multivariate synthetic and real-world high-dimensional benchmark datasets were employed to demonstrate the suitability of the proposed pipeline to provide evidence of model robustness across varied underlying ground truth class label compositions among the unlabeled set and with different proportions of KP examples. Comparisons between model performance with actual and permuted labels could be used to distinguish reliable from unreliable models. CONCLUSIONS: As in fully supervised machine learning, permutation testing offers a means to set a baseline "no-information rate" benchmark in the context of semi-supervised PU learning inference tasks, providing a standard against which model performance can be compared.
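The comparison between actual and permuted labels can be sketched in a few lines. This is a toy stand-in under stated assumptions, not the authors' pipeline: each sample carries a fixed score (as if produced by some PU model), the statistic is the gap in mean score between known-positive (KP) and unlabeled samples, and labels are shuffled without retraining any model. The function name `permutation_p_value` and the example data are hypothetical.

```python
import random


def permutation_p_value(scores, is_known_positive, n_permutations=1000, seed=0):
    """Permutation test for the mean-score gap between KP and unlabeled samples."""
    rng = random.Random(seed)

    def gap(labels):
        pos = [s for s, flag in zip(scores, labels) if flag]
        unl = [s for s, flag in zip(scores, labels) if not flag]
        return sum(pos) / len(pos) - sum(unl) / len(unl)

    observed = gap(is_known_positive)
    labels = list(is_known_positive)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)               # breaks any score/label association
        if gap(labels) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical scores: four KP samples followed by six unlabeled samples.
scores = [0.9, 0.8, 0.85, 0.95, 0.2, 0.3, 0.1, 0.4, 0.25, 0.35]
labels = [True] * 4 + [False] * 6
observed_gap, p_value = permutation_p_value(scores, labels)
```

A small p-value here means the observed separation is rarely matched when labels carry no information, which is the role the permuted-label baseline plays in the abstract.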
Affiliation(s)
- Shiwei Xu
- Quantitative Biomedical Sciences Program, Dartmouth College, Hanover, NH, USA
- Margaret E Ackerman
- Quantitative Biomedical Sciences Program, Dartmouth College, Hanover, NH, USA.
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Dartmouth College, Hanover, NH, USA.
- Thayer School of Engineering, Dartmouth College, 14 Engineering Dr., Hanover, NH, 03755, USA.
|
26
|
Zou J, Li Z, Carleton N, Oesterreich S, Lee AV, Tseng GC. Mutual information for detecting multi-class biomarkers when integrating multiple bulk or single-cell transcriptomic studies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.11.598484. [PMID: 38915481 PMCID: PMC11195192 DOI: 10.1101/2024.06.11.598484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Motivation: Biomarker detection plays a pivotal role in biomedical research. Integrating omics studies from multiple cohorts can enhance the statistical power, accuracy, and robustness of the detection results. However, existing methods for horizontally combining omics studies are mostly designed for two-class scenarios (e.g., cases versus controls) and are not directly applicable to studies with a multi-class design (e.g., samples from multiple disease subtypes, treatments, tissues, or cell types). Results: We propose a statistical framework, Mutual Information Concordance Analysis (MICA), to detect biomarkers with concordant multi-class expression patterns across multiple omics studies from an information-theoretic perspective. Our approach first detects biomarkers with concordant multi-class patterns across some or all of the omics studies using a global test based on mutual information. A post hoc analysis is then performed for each detected biomarker to identify the studies with a concordant pattern. Extensive simulations demonstrate improved accuracy and successful false discovery rate control of MICA compared to an existing MCC method. The method is then applied to two practical scenarios: mouse metabolism-related transcriptomic studies from four tissues, and estrogen treatment expression profiles from three sources. Biomarkers detected by MICA yield biological insights and functional annotations. Additionally, we applied MICA to single-cell RNA-Seq data to detect tumor progression biomarkers, highlighting critical roles of ribosomal function in the tumor microenvironment of triple-negative breast cancer and underscoring the potential of MICA for detecting novel therapeutic targets. Availability: https://github.com/jianzou75/MICA.
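The quantity underlying this kind of analysis, mutual information between a gene's expression and a multi-class label, can be computed with a simple plug-in estimator. The sketch below assumes expression has already been discretized into levels; that discretization, the function name `mutual_information`, and the toy data are illustrative assumptions. MICA's global test and post hoc procedure are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Toy data: three classes, two samples each.
classes = ["A", "A", "B", "B", "C", "C"]
informative = ["low", "low", "mid", "mid", "high", "high"]    # tracks the class
uninformative = ["low", "high", "low", "high", "low", "high"]  # ignores the class

mi_informative = mutual_information(informative, classes)
mi_uninformative = mutual_information(uninformative, classes)
```

A gene whose level fully determines the class reaches the entropy of the class label (log2 of 3 bits for three balanced classes), while a gene distributed identically in every class scores zero.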
Affiliation(s)
- Jian Zou
- Department of Statistics, School of Public Health, Chongqing Medical University, Chongqing, 400016, Chongqing, China
- Zheqi Li
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, 02215, Massachusetts, USA
- Department of Medicine, Harvard Medical School, Boston, 02215, Massachusetts, USA
- Neil Carleton
- Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, 15232, Pennsylvania, USA
- Magee-Womens Research Institute, Pittsburgh, 15213, Pennsylvania, USA
- Medical Scientist Training Program, School of Medicine, University of Pittsburgh, Pittsburgh, 15213, Pennsylvania, USA
- Steffi Oesterreich
- Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, 15232, Pennsylvania, USA
- Magee-Womens Research Institute, Pittsburgh, 15213, Pennsylvania, USA
- Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, 15213, Pennsylvania, USA
- Adrian V. Lee
- Women’s Cancer Research Center, UPMC Hillman Cancer Center (HCC), Pittsburgh, 15232, Pennsylvania, USA
- Magee-Womens Research Institute, Pittsburgh, 15213, Pennsylvania, USA
- Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, 15213, Pennsylvania, USA
- George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, 15213, Pennsylvania, USA
|
27
|
Liu Z, Vucetich S, DeToy K, Duran Saucedo G, Verastegui M, Carballo-Jimenez P, Mercado-Saavedra BN, Tinajeros F, Malaga-Machaca ES, Marcus R, Gilman RH, Bowman NM, McCall LI. Small molecule biomarkers predictive of Chagas disease progression. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.13.24307310. [PMID: 38798659 PMCID: PMC11118624 DOI: 10.1101/2024.05.13.24307310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Chagas disease (CD) is a neglected tropical disease caused by the parasitic protozoan Trypanosoma cruzi. However, only 20% to 30% of infected individuals will progress to severe symptomatic cardiac manifestations. Current treatments are benznidazole and nifurtimox, which are poorly tolerated regimens. Developing a biomarker to determine the likelihood of patient progression would be helpful for doctors to optimize patient treatment strategies. Such a biomarker would also benefit drug discovery efforts and clinical trials. In this study, we combined untargeted and targeted metabolomics to compare serum samples from T. cruzi-infected individuals who progressed to severe cardiac disease, versus infected individuals who remained at the same disease stage (non-progressors). We identified four unannotated biomarker candidates, which were validated in an independent cohort using both untargeted and targeted analysis techniques. Overall, our findings demonstrate that serum small molecules can predict CD progression, offering potential for clinical monitoring.
|
28
|
Ansari M, White AD. Learning peptide properties with positive examples only. DIGITAL DISCOVERY 2024; 3:977-986. [PMID: 38756224 PMCID: PMC11094695 DOI: 10.1039/d3dd00218g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 03/30/2024] [Indexed: 05/18/2024]
Abstract
Deep learning can create accurate predictive models by exploiting existing large-scale experimental data, and guide the design of molecules. However, a major barrier is the requirement of both positive and negative examples in the classical supervised learning frameworks. Notably, most peptide databases come with missing information and a low number of observations on negative examples, as such sequences are hard to obtain using high-throughput screening methods. To address this challenge, we solely exploit the limited known positive examples in a semi-supervised setting, and discover peptide sequences that are likely to map to certain antimicrobial properties via positive-unlabeled (PU) learning. In particular, we use two learning strategies, adapting the base classifier and reliable negative identification, to build deep learning models for inferring solubility, hemolysis, binding against SHP-2, and non-fouling activity of peptides, given their sequence. We evaluate the predictive performance of our PU learning method and show that by only using the positive data, it can achieve competitive performance when compared with the classical positive-negative (PN) classification approach, where there is access to both positive and negative examples.
Affiliation(s)
- Mehrad Ansari
- Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA
- Andrew D White
- Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA
|
29
|
Culver RN, Spencer SP, Violette A, Lemus Silva EG, Takeuchi T, Nafarzadegan C, Higginbottom SK, Shalon D, Sonnenburg J, Huang KC. Improved mouse models of the small intestine microbiota using region-specific sampling from humans. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.24.590999. [PMID: 38712253 PMCID: PMC11071525 DOI: 10.1101/2024.04.24.590999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Our understanding of region-specific microbial function within the gut is limited due to reliance on stool. Using a recently developed capsule device, we exploit regional sampling from the human intestines to develop models for interrogating small intestine (SI) microbiota composition and function. In vitro culturing of human intestinal contents produced stable, representative communities that robustly colonize the SI of germ-free mice. During mouse colonization, the combination of SI and stool microbes altered gut microbiota composition, functional capacity, and response to diet, resulting in increased diversity and reproducibility of SI colonization relative to stool microbes alone. Using a diverse strain library representative of the human SI microbiota, we constructed defined communities with taxa that largely exhibited the expected regional preferences. Response to a fiber-deficient diet was region-specific and reflected strain-specific fiber-processing and host mucus-degrading capabilities, suggesting that dietary fiber is critical for maintaining SI microbiota homeostasis. These tools should advance mechanistic modeling of the human SI microbiota and its role in disease and dietary responses.
Affiliation(s)
- Rebecca N. Culver
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Sean Paul Spencer
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Arvie Violette
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Evelyn Giselle Lemus Silva
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Tadashi Takeuchi
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ceena Nafarzadegan
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Steven K. Higginbottom
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Dari Shalon
- Envivo Bio, Inc., San Francisco, CA 94107, USA
- Justin Sonnenburg
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158
- Kerwyn Casey Huang
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158
|
30
|
Nixon MP, Gloor GB, Silverman JD. Beyond Normalization: Incorporating Scale Uncertainty in Microbiome and Gene Expression Analysis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.01.587602. [PMID: 38617212 PMCID: PMC11014594 DOI: 10.1101/2024.04.01.587602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Though statistical normalizations are often used in differential abundance or differential expression analysis to address sample-to-sample variation in sequencing depth, we offer a better alternative. These normalizations often make strong, implicit assumptions about the scale of biological systems (e.g., microbial load). Thus, analyses are susceptible to even slight errors in these assumptions, leading to elevated rates of false positives and false negatives. We introduce scale models as a generalization of normalizations so researchers can model potential errors in assumptions about scale. By incorporating scale models into the popular ALDEx2 software, we enhance the reproducibility of analyses while often drastically decreasing false positive and false negative rates. We design scale models that are guaranteed to reduce false positives compared to equivalent normalizations. At least in the context of ALDEx2, we recommend using scale models over normalizations in all practical situations.
Affiliation(s)
- Michelle Pistner Nixon
- College of Information Science and Technology, Pennsylvania State University, University Park, PA, USA
- Gregory B. Gloor
- Department of Biochemistry, The University of Western Ontario, London, ON, CAN
- Justin D. Silverman
- College of Information Science and Technology, Pennsylvania State University, University Park, PA, USA
- Department of Statistics, Pennsylvania State University, University Park, PA, USA
- Department of Medicine, Pennsylvania State University, Hershey, PA, USA
|
31
|
Beard JW, Hunt SL, Evans A, Goenner C, Miller BL. Mimicking an in cellulo environment for enzyme-free paper-based nucleic acid tests at the point of care. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.27.582375. [PMID: 38464301 PMCID: PMC10925243 DOI: 10.1101/2024.02.27.582375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Point of care (PoC) nucleic acid amplification tests (NAATs) are a cornerstone of public health, providing the earliest and most accurate diagnostic method for many communicable diseases, such as HIV, in the same location the patient receives treatment. Communicable diseases disproportionately impact low-resource communities where NAATs are often unobtainable due to the resource-intensive enzymes that drive the tests. Enzyme-free nucleic acid detection methods, such as hybridization chain reaction (HCR), use DNA secondary structures for self-driven amplification schemes that produce large DNA nanostructures and are capable of single-molecule detection in cellulo. These thermodynamically driven DNA-based tests have struggled to penetrate the PoC diagnostic field due to their inadequate limits of detection or complex workflows. Here we present a proof-of-concept NAAT that combines HCR-based amplification of a target nucleic acid sequence with paper-based nucleic acid filtration and enrichment, capable of detecting sub-pM levels of synthetic DNA. We reconstruct the favorable hybridization conditions of an in cellulo reaction in vitro by incubating HCR in an evaporating, microvolume environment containing poly(ethylene glycol) as a crowding agent. We demonstrate that the kinetics and thermodynamics of DNA-DNA and DNA-RNA hybridization are enhanced by the dynamic evaporating environment and inclusion of crowding agents, bringing HCR closer to meeting PoC NAAT needs.
Affiliation(s)
- Jeffrey W. Beard
- Department of Dermatology, University of Rochester, Rochester, NY 14627, USA
- Samuel L. Hunt
- Department of Dermatology, University of Rochester, Rochester, NY 14627, USA
- Alexander Evans
- Department of Biomedical Engineering, University of Rochester, Rochester, NY 14627, USA
- Coleman Goenner
- Department of Biochemistry and Biophysics, University of Rochester, Rochester, NY 14627, USA
- Benjamin L. Miller
- Department of Dermatology, University of Rochester, Rochester, NY 14627, USA
- Department of Biomedical Engineering, University of Rochester, Rochester, NY 14627, USA
- Department of Biochemistry and Biophysics, University of Rochester, Rochester, NY 14627, USA
|
32
|
Liang T, Jiang T, Liang Z, Zhang N, Dong B, Wu Q, Gu B. Carbohydrate-active enzyme profiles of Lactiplantibacillus plantarum strain 84-3 contribute to flavor formation in fermented dairy and vegetable products. Food Chem X 2023; 20:101036. [PMID: 38059176 PMCID: PMC10696159 DOI: 10.1016/j.fochx.2023.101036] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/07/2023] [Revised: 10/26/2023] [Accepted: 11/24/2023] [Indexed: 12/08/2023] Open
Abstract
Microbes are critical for flavor formation in fermented foods; however, their mechanisms of action are not fully understood. The microbial composition of 51 dairy and 47 vegetable products was functionally annotated, and the carbohydrate-active enzyme (CAZyme) profiles of Lactiplantibacillus plantarum 84-3 (Lp84-3), which was isolated from dairy samples and can promote resistant starch (RS) degradation, were analyzed. Lactobacillus, Streptococcus, and Lactococcus were the predominant genera in dairy products, whereas the major genera in vegetables were Lactobacillus, Weissella, and Carnimonas. Phages from Siphoviridae, Myoviridae, and Herelleviridae were also present in dairy products. Additionally, the glycosyl hydrolase (GH) family members GH1 and GH13 and the glycosyltransferase (GT) family members GT2 and GT4 were abundant in Lp84-3. Moreover, Lp84-3 was enriched in butanoate metabolism enzymes and butanoate metabolite compounds. Therefore, fermented food microbes, especially Lp84-3, have an abundant repertoire of enzymes that promote flavor production, and Lp84-3 may serve as a starter to improve the flavor of fermented dairy and vegetable products.
Affiliation(s)
- Tingting Liang
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Department of Clinical Laboratory Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Microbial Safety and Health, State Key Laboratory of Applied Microbiology Southern China, Guangdong Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, China
- Tong Jiang
- Guangdong Provincial Key Laboratory of Microbial Safety and Health, State Key Laboratory of Applied Microbiology Southern China, Guangdong Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, China
- Zhuang Liang
- Department of Rehabilitation Hospital Pain Ward, Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi 710054, China
- Ni Zhang
- Department of Clinical Laboratory Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
- Bo Dong
- Department of Rehabilitation Hospital Pain Ward, Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi 710054, China
- Qingping Wu
- Guangdong Provincial Key Laboratory of Microbial Safety and Health, State Key Laboratory of Applied Microbiology Southern China, Guangdong Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, China
- Bing Gu
- Department of Clinical Laboratory Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
|
33
|
Rocks D, Purisic E, Gallo EF, Greally JM, Suzuki M, Kundakovic M. Egr1 is a sex-specific regulator of neuronal chromatin, synaptic plasticity, and behaviour. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.20.572697. [PMID: 38187614 PMCID: PMC10769422 DOI: 10.1101/2023.12.20.572697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Sex differences are found in brain structure and function across species, and across brain disorders in humans [1-3]. The major source of brain sex differences is differential secretion of steroid hormones from the gonads across the lifespan [4]. Specifically, ovarian hormones oestrogens and progesterone are known to dynamically change structure and function of the adult female brain, having a major impact on psychiatric risk [5-7]. However, due to limited molecular studies in female rodents [8], very little is still known about molecular drivers of female-specific brain and behavioural plasticity. Here we show that overexpressing Egr1, a candidate oestrous cycle-dependent transcription factor [9], induces sex-specific changes in ventral hippocampal neuronal chromatin, gene expression, and synaptic plasticity, along with hippocampus-dependent behaviours. Importantly, Egr1 overexpression mimics the high-oestrogenic phase of the oestrous cycle, and affects behaviours in ovarian hormone-depleted females but not in males. We demonstrate that Egr1 opens neuronal chromatin directly across the sexes, although with limited genomic overlap. Our study not only reveals the first sex-specific chromatin regulator in the brain, but also provides functional evidence that this sex-specific gene regulation drives neuronal gene expression, synaptic plasticity, and anxiety- and depression-related behaviour. Our study exemplifies an innovative sex-based approach to studying neuronal gene regulation [1] in order to understand sex-specific synaptic and behavioural plasticity and inform novel brain disease treatments.
Affiliation(s)
- Devin Rocks
  - Department of Biological Sciences, Fordham University, Bronx, NY, USA
- Eric Purisic
  - Department of Biological Sciences, Fordham University, Bronx, NY, USA
- Eduardo F. Gallo
  - Department of Biological Sciences, Fordham University, Bronx, NY, USA
- John M. Greally
  - Center for Epigenomics, Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Masako Suzuki
  - Center for Epigenomics, Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
  - Department of Nutrition, Texas A&M University, College Station, TX, USA
- Marija Kundakovic
  - Department of Biological Sciences, Fordham University, Bronx, NY, USA
34
Basher ARMA, Hallinan C, Lee K. Heterogeneity-Preserving Discriminative Feature Selection for Subtype Discovery. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.14.540686. [PMID: 38187596 PMCID: PMC10769187 DOI: 10.1101/2023.05.14.540686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
The discovery of subtypes is pivotal for disease diagnosis and targeted therapy, considering the diverse responses of different cells or patients to specific treatments. Exploring the heterogeneity within disease or cell states provides insights into disease progression mechanisms and cell differentiation. The advent of high-throughput technologies has enabled the generation and analysis of various molecular data types, such as single-cell RNA-seq, proteomic, and imaging datasets, at large scales. While presenting opportunities for subtype discovery, these datasets pose challenges in finding relevant signatures due to their high dimensionality. Feature selection, a crucial step in the analysis pipeline, involves choosing signatures that reduce the feature size for more efficient downstream computational analysis. Numerous existing methods focus on selecting signatures that differentiate known diseases or cell states, yet they often fall short in identifying features that preserve heterogeneity and reveal subtypes. To identify features that can capture the diversity within each class while also maintaining the discrimination of known disease states, we employed deep metric learning-based feature embedding to conduct a detailed exploration of the statistical properties of features essential in preserving heterogeneity. Our analysis revealed that features with a significant difference in interquartile range (IQR) between classes possess crucial subtype information. Guided by this insight, we developed a robust statistical method, termed PHet (Preserving Heterogeneity) that performs iterative subsampling differential analysis of IQR and Fisher's method between classes, identifying a minimal set of heterogeneity-preserving discriminative features to optimize subtype clustering quality. 
Validation using public single-cell RNA-seq and microarray datasets showcased PHet's effectiveness in preserving sample heterogeneity while maintaining discrimination of known disease/cell states, surpassing the performance of previous outlier-based methods. Furthermore, analysis of a single-cell RNA-seq dataset from mouse tracheal epithelial cells revealed, through PHet-based features, the presence of two distinct basal cell subtypes undergoing differentiation toward a luminal secretory phenotype. Notably, one of these subtypes exhibited high expression of BPIFA1. Interestingly, previous studies have linked BPIFA1 secretion to the emergence of secretory cells during mucociliary differentiation of airway epithelial cells. PHet successfully pinpointed the basal cell subtype associated with this phenomenon, a distinction that pre-annotated markers and dispersion-based features failed to make due to their admixed feature expression profiles. These findings underscore the potential of our method to deepen our understanding of the mechanisms underlying diseases and cell differentiation and contribute significantly to personalized medicine.
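The central observation in the abstract above, that a feature whose interquartile range differs between classes may carry subtype information, can be illustrated with a small sketch. This is not the PHet implementation: it only scores features by the absolute IQR difference averaged over random subsamples, it omits the Fisher's method step entirely, and all function names, parameters, and toy values are hypothetical.

```python
import random
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def delta_iqr(control, case):
    """Absolute difference in IQR between two groups for one feature."""
    return abs(iqr(case) - iqr(control))


def subsampled_delta_iqr(control, case, n_iter=50, frac=0.8, seed=0):
    """Average the IQR difference over random subsamples of both groups."""
    rng = random.Random(seed)
    k_control = max(2, int(frac * len(control)))
    k_case = max(2, int(frac * len(case)))
    scores = []
    for _ in range(n_iter):
        sub_control = rng.sample(control, k_control)
        sub_case = rng.sample(case, k_case)
        scores.append(delta_iqr(sub_control, sub_case))
    return statistics.mean(scores)


# Toy data (hypothetical): feature "A" hides two subgroups in the case class,
# while feature "B" is homogeneous in both classes.
control = {
    "A": [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 0.9],
    "B": [2.0, 2.1, 1.9, 2.2, 2.0, 1.8, 2.1, 1.9],
}
case = {
    "A": [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.9, 5.1],
    "B": [2.1, 2.0, 1.9, 2.2, 2.0, 1.9, 2.1, 2.0],
}

# Rank features from the largest to the smallest averaged IQR difference.
ranking = sorted(
    control,
    key=lambda f: subsampled_delta_iqr(control[f], case[f]),
    reverse=True,
)
print(ranking)
```

In this toy setting the bimodal feature ranks first because its spread within the case class is much larger than in the control class, even though a mean-based test would also flag it; the interesting cases in practice are features where the spread differs but the means do not.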
Affiliation(s)
- Abdur Rahman M. A. Basher
  - Vascular Biology Program, Boston Children’s Hospital, Boston, MA 02115, USA
  - Department of Surgery, Harvard Medical School, Boston, MA 02115, USA
- Caleb Hallinan
  - Vascular Biology Program, Boston Children’s Hospital, Boston, MA 02115, USA
- Kwonmoo Lee
  - Vascular Biology Program, Boston Children’s Hospital, Boston, MA 02115, USA
  - Department of Surgery, Harvard Medical School, Boston, MA 02115, USA
35
Cook R, Telatin A, Bouras G, Camargo AP, Larralde M, Edwards RA, Adriaenssens EM. Predicting stop codon reassignment improves functional annotation of bacteriophages. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.19.572299. [PMID: 38187747 PMCID: PMC10769273 DOI: 10.1101/2023.12.19.572299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
The majority of bacteriophage diversity remains uncharacterised, and new intriguing mechanisms of their biology are being continually described. Members of some phage lineages, such as the Crassvirales, repurpose stop codons to encode an amino acid by using alternate genetic codes. Here, we investigated the prevalence of stop codon reassignment in phage genomes and subsequent impacts on functional annotation. We predicted 76 genomes within INPHARED and 712 vOTUs from the Unified Human Gut Virome catalogue (UHGV) that repurpose a stop codon to encode an amino acid. We re-annotated these sequences with modified versions of Pharokka and Prokka, called Pharokka-gv and Prokka-gv, to automatically predict stop codon reassignment prior to annotation. Both tools significantly improved the quality of annotations, with Pharokka-gv performing best. For sequences predicted to repurpose TAG to glutamine (translation table 15), Pharokka-gv increased the median gene length (median of per genome medians) from 287 to 481 bp for UHGV sequences (67.8% increase) and from 318 to 550 bp for INPHARED sequences (72.9% increase). The re-annotation increased mean coding density from 66.8% to 90.0%, and from 69.0% to 89.8% for UHGV and INPHARED sequences. Furthermore, the proportion of genes that could be assigned functional annotation increased, including an increase in the number of major capsid proteins that could be identified. We propose that automatic prediction of stop codon reassignment before annotation is beneficial to downstream viral genomic and metagenomic analyses.
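The effect described in the abstract above, where treating a reassigned stop codon as a true stop truncates gene calls, can be sketched as follows. This is a toy illustration, not the logic of Pharokka-gv or Prokka-gv: the sequence, the function name, and the simplification of scanning a single reading frame are all assumptions made for the example.

```python
# Stop codons under the standard genetic code.
STANDARD_STOPS = {"TAA", "TAG", "TGA"}
# Assumption for illustration: TAG is read as an amino acid (glutamine under
# translation table 15), so only TAA and TGA terminate translation.
RECODED_STOPS = {"TAA", "TGA"}


def orf_length(seq, stops):
    """Length in bp from the first codon up to, but excluding, the first in-frame stop."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return i
    # No stop found: return the length of the complete codons.
    return len(seq) - len(seq) % 3


# Hypothetical 30 bp gene with an in-frame TAG as its fourth codon.
codons = ["ATG", "AAA", "CCC", "TAG", "GGT", "TTT", "CAA", "GCT", "GGA", "TAA"]
gene = "".join(codons)

print(orf_length(gene, STANDARD_STOPS))  # gene call truncated at the TAG
print(orf_length(gene, RECODED_STOPS))   # gene call runs to the TAA
```

With the standard stop set the call ends at the in-frame TAG after 9 bp, whereas with TAG reassigned it extends to 27 bp, which mirrors in miniature why predicted gene lengths and coding density rise after re-annotation.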
Affiliation(s)
- Ryan Cook
  - Food, Microbiome and Health Research Programme, Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
- Andrea Telatin
  - Food, Microbiome and Health Research Programme, Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
- George Bouras
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5070, Australia
  - Department of Surgery-Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, SA 5070, Australia
- Antonio Pedro Camargo
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Martin Larralde
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstraße 1, 69117 Heidelberg, Germany
- Robert A. Edwards
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Evelien M. Adriaenssens
  - Food, Microbiome and Health Research Programme, Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
36
Bohn L, Drouin SM, McFall GP, Rolfson DB, Andrew MK, Dixon RA. Machine learning analyses identify multi-modal frailty factors that selectively discriminate four cohorts in the Alzheimer's disease spectrum: a COMPASS-ND study. BMC Geriatr 2023; 23:837. [PMID: 38082372 PMCID: PMC10714519 DOI: 10.1186/s12877-023-04546-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 11/30/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Frailty indicators can operate in dynamic amalgamations of disease conditions, clinical symptoms, biomarkers, medical signals, cognitive characteristics, and even health beliefs and practices. This study is the first to evaluate which, among these multiple frailty-related indicators, are important and differential predictors of clinical cohorts that represent progression along an Alzheimer's disease (AD) spectrum. We applied machine-learning technology to such indicators in order to identify the leading predictors of three AD spectrum cohorts; viz., subjective cognitive impairment (SCI), mild cognitive impairment (MCI), and AD. The common benchmark was a cohort of cognitively unimpaired (CU) older adults. METHODS The four cohorts were from the cross-sectional Comprehensive Assessment of Neurodegeneration and Dementia dataset. We used random forest analysis (Python 3.7) to simultaneously test the relative importance of 83 multi-modal frailty indicators in discriminating the cohorts. We performed an explainable artificial intelligence method (Tree Shapley Additive exPlanation values) for deep interpretation of prediction effects. RESULTS We observed strong concurrent prediction results, with clusters varying across cohorts. The SCI model demonstrated excellent prediction accuracy (AUC = 0.89). Three leading predictors were poorer quality of life ([QoL]; memory), abnormal lymphocyte count, and abnormal neutrophil count. The MCI model demonstrated a similarly high AUC (0.88). Five leading predictors were poorer QoL (memory, leisure), male sex, abnormal lymphocyte count, and poorer self-rated eyesight. The AD model demonstrated outstanding prediction accuracy (AUC = 0.98). Ten leading predictors were poorer QoL (memory), reduced olfaction, male sex, increased dependence in activities of daily living (n = 6), and poorer visual contrast. CONCLUSIONS Both convergent and cohort-specific frailty factors discriminated the AD spectrum cohorts. 
Convergence was observed as all cohorts were marked by lower quality of life (memory), supporting recent research and clinical attention to subjective experiences of memory aging and their potentially broad ramifications. Diversity was displayed in that, of the 14 leading predictors extracted across models, 11 were selectively sensitive to one cohort. A morbidity intensity trend was indicated by an increasing number and diversity of predictors corresponding to clinical severity, especially in AD. Knowledge of differential deficit predictors across AD clinical cohorts may promote precision interventions.
Affiliation(s)
- Linzy Bohn
  - Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada
  - Neuroscience and Mental Health Institute, University of Alberta, 2-132 Li Ka Shing Center for Health Research Innovation, Edmonton, AB, T6G 2E1, Canada
- Shannon M Drouin
  - Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada
  - Neuroscience and Mental Health Institute, University of Alberta, 2-132 Li Ka Shing Center for Health Research Innovation, Edmonton, AB, T6G 2E1, Canada
- G Peggy McFall
  - Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada
  - Neuroscience and Mental Health Institute, University of Alberta, 2-132 Li Ka Shing Center for Health Research Innovation, Edmonton, AB, T6G 2E1, Canada
- Darryl B Rolfson
  - Department of Medicine, Division of Geriatric Medicine, University of Alberta, 13-135 Clinical Sciences Building, Edmonton, AB, T6G 2G3, Canada
- Melissa K Andrew
  - Department of Medicine, Division of Geriatric Medicine, Dalhousie University, 5955 Veterans' Memorial Lane, Halifax, NS, B3H 2E1, Canada
- Roger A Dixon
  - Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada
  - Neuroscience and Mental Health Institute, University of Alberta, 2-132 Li Ka Shing Center for Health Research Innovation, Edmonton, AB, T6G 2E1, Canada
37
Bristy NA, Fu X, Schwartz R. Sc-TUSV-ext: Single-cell clonal lineage inference from single nucleotide variants (SNV), copy number alterations (CNA) and structural variants (SV). BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.07.570724. [PMID: 38106049 PMCID: PMC10723466 DOI: 10.1101/2023.12.07.570724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Clonal lineage inference ("tumor phylogenetics") has become a crucial tool for making sense of somatic evolution processes that underlie cancer development and are increasingly recognized as part of normal tissue growth and aging. The inference of clonal lineage trees from single cell sequence data offers particular promise for revealing processes of somatic evolution in unprecedented detail. However, most such tools are based on fairly restrictive models of the types of mutation events observed in somatic evolution and of the processes by which they develop. The present work seeks to enhance the power and versatility of tools for single-cell lineage reconstruction by making more comprehensive use of the range of molecular variant types by which tumors evolve. We introduce Sc-TUSV-ext, an integer linear programming (ILP) based tumor phylogeny reconstruction method that, for the first time, integrates single nucleotide variants (SNV), copy number alterations (CNA) and structural variations (SV) into clonal lineage reconstruction from single-cell DNA sequencing data. We show on synthetic data that accounting for these variant types collectively leads to improved accuracy in clonal lineage reconstruction relative to prior methods that consider only subsets of the variant types. We further demonstrate the effectiveness on real data in resolving clonal evolution in the presence of multiple variant types, providing a path towards more comprehensive insight into how various forms of somatic mutability collectively shape tissue development.
38
Schaible GA, Jay ZJ, Cliff J, Schulz F, Gauvin C, Goudeau D, Malmstrom RR, Emil Ruff S, Edgcomb V, Hatzenpichler R. Multicellular magnetotactic bacterial consortia are metabolically differentiated and not clonal. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.27.568837. [PMID: 38076927 PMCID: PMC10705294 DOI: 10.1101/2023.11.27.568837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]
Abstract
Consortia of multicellular magnetotactic bacteria (MMB) are currently the only known example of bacteria without a unicellular stage in their life cycle. Because of their recalcitrance to cultivation, most previous studies of MMB have been limited to microscopic observations. To study the biology of these unique organisms in more detail, we use multiple culture-independent approaches to analyze the genomics and physiology of MMB consortia at single cell resolution. We separately sequenced the metagenomes of 22 individual MMB consortia, representing eight new species, and quantified the genetic diversity within each MMB consortium. This revealed that, counter to conventional views, cells within MMB consortia are not clonal. Single consortia metagenomes were then used to reconstruct the species-specific metabolic potential and infer the physiological capabilities of MMB. To validate genomic predictions, we performed stable isotope probing (SIP) experiments and interrogated MMB consortia using fluorescence in situ hybridization (FISH) combined with nano-scale secondary ion mass spectrometry (NanoSIMS). By coupling FISH with bioorthogonal non-canonical amino acid tagging (BONCAT) we explored their in situ activity as well as variation of protein synthesis within cells. We demonstrate that MMB consortia are mixotrophic sulfate reducers and that they exhibit metabolic differentiation between individual cells, suggesting that MMB consortia are more complex than previously thought. These findings expand our understanding of MMB diversity, ecology, genomics, and physiology, as well as offer insights into the mechanisms underpinning the multicellular nature of their unique lifestyle.
Affiliation(s)
- George A. Schaible
  - Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717
- Zackary J. Jay
  - Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717
- John Cliff
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99354
- Frederik Schulz
  - Department of Energy Joint Genome Institute, Berkeley, CA, 94720
- Colin Gauvin
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717
- Danielle Goudeau
  - Department of Energy Joint Genome Institute, Berkeley, CA, 94720
- Rex R. Malmstrom
  - Department of Energy Joint Genome Institute, Berkeley, CA, 94720
- S. Emil Ruff
  - Ecosystems Center and Bay Paul Center, Marine Biological Laboratory, Woods Hole, MA, 02543
- Roland Hatzenpichler
  - Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717
  - Department of Microbiology and Cell Biology, Montana State University, Bozeman, MT 59717
39
Hopkins BR, Angus-Henry A, Kim BY, Carlisle JA, Thompson A, Kopp A. Decoupled evolution of the Sex Peptide gene family and Sex Peptide Receptor in Drosophilidae. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.29.547128. [PMID: 37425821 PMCID: PMC10327216 DOI: 10.1101/2023.06.29.547128] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 07/11/2023]
Abstract
Across internally fertilising species, males transfer ejaculate proteins that trigger wide-ranging changes in female behaviour and physiology. Much theory has been developed to explore the drivers of ejaculate protein evolution. The accelerating availability of high-quality genomes now allows us to test how these proteins are evolving at fine taxonomic scales. Here, we use genomes from 264 species to chart the evolutionary history of Sex Peptide (SP), a potent regulator of female post-mating responses in Drosophila melanogaster. We infer that SP first evolved in the Drosophilinae subfamily and has followed markedly different evolutionary trajectories in different lineages. Outside of the Sophophora-Lordiphosa, SP exists largely as a single-copy gene with independent losses in several lineages. Within the Sophophora-Lordiphosa, the SP gene family has repeatedly and independently expanded. Up to seven copies, collectively displaying extensive sequence variation, are present in some species. Despite these changes, SP expression remains restricted to the male reproductive tract. Alongside, we document considerable interspecific variation in the presence and morphology of seminal microcarriers that, despite the critical role SP plays in microcarrier assembly in D. melanogaster, appear to be independent of changes in the presence/absence or sequence of SP. We end by providing evidence that SP's evolution is decoupled from that of its receptor, SPR, in which we detect no evidence of correlated diversifying selection. Collectively, our work describes the divergent evolutionary trajectories that a novel gene has taken following its origin and finds a surprisingly weak coevolutionary signal between a supposedly sexually antagonistic protein and its receptor.
Affiliation(s)
- Ben R. Hopkins
  - Department of Evolution and Ecology, University of California – Davis, CA, USA
- Aidan Angus-Henry
  - Department of Evolution and Ecology, University of California – Davis, CA, USA
- Jolie A. Carlisle
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Ammon Thompson
  - Department of Evolution and Ecology, University of California – Davis, CA, USA
- Artyom Kopp
  - Department of Evolution and Ecology, University of California – Davis, CA, USA
40
Goldberg EE, Lundgren EJ, Romero-Severson EO, Leitner T. Inferring viral transmission time from phylogenies for known transmission pairs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.12.557404. [PMID: 37745490 PMCID: PMC10515827 DOI: 10.1101/2023.09.12.557404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
When the time of an HIV transmission event is unknown, methods to identify it from virus genetic data can reveal the circumstances that enable transmission. We developed a single-parameter Markov model to infer transmission time from an HIV phylogeny constructed of multiple virus sequences from people in a transmission pair. Our method finds the statistical support for transmission occurring in different possible time slices. We compared our time-slice model results to previously described methods: a tree-based logical transmission interval, a simple parsimony-like rules-based method, and a more complex coalescent model. Across simulations with multiple transmitted lineages, different transmission times relative to the source's infection, and different sampling times relative to transmission, we found that overall our time-slice model provided accurate and narrower estimates of the time of transmission. We also identified situations when transmission time or direction was difficult to estimate by any method, particularly when transmission occurred long after the source was infected and when sampling occurred long after transmission. Applying our model to real HIV transmission pairs showed some agreement with facts known from the case investigations. We also found, however, that uncertainty on the inferred transmission time was driven more by uncertainty from time-calibration of the phylogeny than from the model inference itself. Encouragingly, comparable performance of the Markov time-slice model and the coalescent model (which make use of different information within a tree) suggests that a new method remains to be described that will make full use of the topology and node times for improved transmission time inference.
Affiliation(s)
- Emma E. Goldberg
  - Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos NM, USA
- Erik J. Lundgren
  - Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos NM, USA
- Thomas Leitner
  - Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos NM, USA
41
Ha AD, Aylward FO. Automated classification of giant virus genomes using a random forest model built on trademark protein families. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.10.566645. [PMID: 38014039 PMCID: PMC10680617 DOI: 10.1101/2023.11.10.566645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Viruses of the phylum Nucleocytoviricota, often referred to as "giant viruses," are prevalent in various environments around the globe and play significant roles in shaping eukaryotic diversity and activities in global ecosystems. Given the extensive phylogenetic diversity within this viral group and the highly complex composition of their genomes, taxonomic classification of giant viruses, particularly incomplete metagenome-assembled genomes (MAGs), can present a considerable challenge. Here we developed TIGTOG (Taxonomic Information of Giant viruses using Trademark Orthologous Groups), a machine learning-based approach to predict the taxonomic classification of novel giant virus MAGs based on profiles of protein family content. We applied a random forest algorithm to a training set of 1,531 quality-checked, phylogenetically diverse Nucleocytoviricota genomes using pre-selected sets of giant virus orthologous groups (GVOGs). The classification models were predictive of viral taxonomic assignments with a cross-validation accuracy of 99.6% to the order level and 97.3% to the family level. We found that no individual GVOGs or genome features significantly influenced the algorithm's performance or the models' predictions, indicating that classification predictions were based on a comprehensive genomic signature, which reduced the necessity of a fixed set of marker genes for taxonomic assignment. Our classification models were validated with an independent test set of 823 giant virus genomes with varied genomic completeness and taxonomy and demonstrated an accuracy of 98.6% and 95.9% to the order and family level, respectively. Our results indicate that protein family profiles can be used to accurately classify large DNA viruses at different taxonomic levels and provide a fast and accurate method for the classification of giant viruses. This approach could easily be adapted to other viral groups.
42
Jamali K, Käll L, Zhang R, Brown A, Kimanius D, Scheres SH. Automated model building and protein identification in cryo-EM maps. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.16.541002. [PMID: 37292681 PMCID: PMC10245678 DOI: 10.1101/2023.05.16.541002] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Indexed: 06/10/2023]
Abstract
Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high levels of expertise and labour-intensive manual intervention. We present ModelAngelo, a machine-learning approach for automated atomic model building in cryo-EM maps. By combining information from the cryo-EM map with information from protein sequence and structure in a single graph neural network, ModelAngelo builds atomic models for proteins that are of similar quality as those generated by human experts. For nucleotides, ModelAngelo builds backbones with similar accuracy as humans. By using its predicted amino acid probabilities for each residue in hidden Markov model sequence searches, ModelAngelo outperforms human experts in the identification of proteins with unknown sequences. ModelAngelo will thus remove bottlenecks and increase objectivity in cryo-EM structure determination.
Affiliation(s)
- Lukas Käll
  - Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
- Rui Zhang
  - Washington University in St. Louis, St. Louis, MO, USA
- Alan Brown
  - Blavatnik Institute, Harvard Medical School, Boston, MA, USA
43
Ashfaq F, Barkat MA, Ahmad T, Hassan MZ, Ahmad R, Barkat H, Idreesh Khan M, Saad Alhodieb F, Asiri YI, Siddiqui S. Phytocompound screening, antioxidant activity and molecular docking studies of pomegranate seed: a preventive approach for SARS-CoV-2 pathogenesis. Sci Rep 2023; 13:17069. [PMID: 37816760 PMCID: PMC10564957 DOI: 10.1038/s41598-023-43573-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 09/26/2023] [Indexed: 10/12/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19), which is spreading quickly, poses a global hazard to public health. Pomegranate is a strong source of antioxidants and has demonstrated a number of pharmacological characteristics. This work aimed to analyze the phytochemicals present in ethanolic pomegranate seed extract (PSE) and their in vitro antioxidant potential, and to evaluate in silico their antiviral potential against the crystal structures of two nucleocapsid proteins, i.e., the N-terminal RNA binding domain (NRBD) and the C-terminal domain (CTD) of SARS-CoV-2. The bioactive components of the ethanolic extract of PSE were assessed by gas chromatography-mass spectrometry (GC-MS). Free radical scavenging activity of PSE was determined using DPPH dye. Molecular docking was executed through the Glide module of Maestro software. Lipinski's rule of five was applied to assess drug-likeness using the cheminformatics software Molinspiration, while OSIRIS Data Warrior V5.5.0 was used to predict possible toxicological characteristics of the components. Thirty-two phytocomponents were detected in PSE by GC-MS. The free radical scavenging assay revealed the high antioxidant capacity of PSE. Docking analysis showed that twenty phytocomponents from PSE exhibited good binding affinity (Docking score ≥ -1.0 kcal/mol) towards the NRBD and CTD nucleocapsid proteins. This result increases the possibility that the top 20 hits could prevent the spread of SARS-CoV-2 by targeting both nucleocapsid proteins. Moreover, molecular dynamics (MD) simulation using GROMACS was used to check the binding efficacy and internal dynamics of the top complexes with the lowest docking scores. The metrics root mean square deviation (RMSD), root mean square fluctuation (RMSF), intermolecular hydrogen bonding (H-bonds) and radius of gyration (Rg) revealed that the lead phytochemicals form an energetically stable complex with the target protein. 
The majority of the phytoconstituents exhibited drug-likeness with non-tumorigenic properties. Thus, the PSE phytoconstituents could be a useful source for drug or nutraceutical development in SARS-CoV-2 pathogenesis.
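The drug-likeness screen mentioned in the abstract above can be sketched with the commonly stated thresholds of Lipinski's rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). This is a minimal sketch, not the Molinspiration workflow the authors used: the class, the function names, the tolerance of one violation, and the two example compounds with their property values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(c):
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(c, max_violations=1):
    """Assumption: a compound passes if it breaks at most one threshold."""
    return lipinski_violations(c) <= max_violations


# Hypothetical compounds with made-up property values, for illustration only.
small = Compound("compound_1", mol_weight=280.4, log_p=3.2,
                 h_bond_donors=1, h_bond_acceptors=3)
bulky = Compound("compound_2", mol_weight=812.9, log_p=6.7,
                 h_bond_donors=7, h_bond_acceptors=14)

for compound in (small, bulky):
    print(compound.name, lipinski_violations(compound), is_drug_like(compound))
```

The rule is a coarse filter on oral bioavailability rather than a measure of activity, so it complements, and does not replace, the docking and toxicity predictions described in the abstract.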
Affiliation(s)
- Fauzia Ashfaq
  - Clinical Nutrition Department, Applied Medical Sciences College, Jazan University, Jazan 82817, Saudi Arabia
- Md Abul Barkat
  - Department of Pharmaceutics, College of Pharmacy, University of Hafr Al-Batin, Al Jamiah, 39524, Hafr Al Batin, Saudi Arabia
- Tanvir Ahmad
  - Department of Biotechnology, Era's Lucknow Medical College and Hospital, Lucknow, 226003, India
- Mohd Zaheen Hassan
  - Department of Pharmaceutical Chemistry, College of Pharmacy, King Khalid University, Abha, Saudi Arabia
- Rumana Ahmad
  - Department of Biochemistry, Era's Lucknow Medical College and Hospital, Lucknow, 226003, India
- Harshita Barkat
  - Department of Pharmaceutics, College of Pharmacy, University of Hafr Al-Batin, Al Jamiah, 39524, Hafr Al Batin, Saudi Arabia
- Mohammad Idreesh Khan
  - Department of Clinical Nutrition, College of Applied Health Sciences in Ar Rass, Qassim University, Ar Rass 51921, Saudi Arabia
- Fahad Saad Alhodieb
  - Department of Clinical Nutrition, College of Applied Health Sciences in Ar Rass, Qassim University, Ar Rass 51921, Saudi Arabia
- Yahya I Asiri
  - Department of Pharmacology, College of Pharmacy, King Khalid University, Abha, Saudi Arabia
- Sahabjada Siddiqui
  - Department of Biotechnology, Era's Lucknow Medical College and Hospital, Lucknow, 226003, India
44
Liu WW, Pan P, Zhou NY. The presence of benzene ring activating CoA ligases for aromatics degradation in the ANaerobic MEthanotrophic (ANME) archaea. Microbiol Spectr 2023; 11:e0176623. [PMID: 37754561 PMCID: PMC10581246 DOI: 10.1128/spectrum.01766-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 08/03/2023] [Indexed: 09/28/2023] Open
Abstract
Petroleum-source and black carbon-source aromatic compounds are present in cold seep environments, where ANaerobic MEthanotrophic (ANME) archaea, as the dominant microbial community, mediate the anaerobic oxidation of methane to produce inorganic and organic carbon. Here, by predicting the aromatics catabolic pathways in ANME metagenome-assembled genomes, we provide genomic and biochemical evidence that ANME have the potential to metabolize aromatics by CoA activation of the benzene ring, using phenylacetic acid and benzoate as substrates. Two ring-activating enzymes, phenylacetate-CoA ligase (PaaKANME) and benzoate-CoA ligase (BadAANME), convert phenylacetate to phenylacetyl-CoA and benzoate to benzoyl-CoA in vitro, respectively. They are mesophilic and alkali-resistant, with broad substrate spectra and differing affinities for various substrates. An exploration of the relative gene abundance in ANME genomes and cold seep environments indicates that about 50% of ANME genomes contain PCL genes, and that various bacteria and archaea contain PCL and BCL genes. The results provide evidence for the capability of heterotrophic metabolism of aromatic compounds by ANME. This not only enhances our understanding of the nutrient range of ANME but also helps to explore the additional ecological and biogeochemical significance of these ubiquitous sedimentary archaea in the carbon flow in cold seep environments. IMPORTANCE ANaerobic MEthanotrophic (ANME) archaea are the dominant microbial community mediating the anaerobic oxidation of methane in cold seep environments, where aromatic compounds are present. It is therefore hypothesized that ANME may be involved in the metabolism of aromatics. Here, we provide genomic and biochemical evidence for the heterotrophic metabolism of aromatic compounds by ANME, enhancing our understanding of their nutrient range and also shedding light on the ecological and biogeochemical significance of these ubiquitous sedimentary archaea in carbon flow within cold seep environments. Overall, this study offers valuable insights into the metabolic capabilities of ANME and their potential contributions to the global carbon cycle.
Affiliation(s)
- Wei-Wei Liu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, and School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Piaopiao Pan
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, and School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Ning-Yi Zhou
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, and School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, China
|
45
|
Li J, Huang Y, Hutton GJ, Aparasu RR. Assessing treatment switch among patients with multiple sclerosis: A machine learning approach. EXPLORATORY RESEARCH IN CLINICAL AND SOCIAL PHARMACY 2023; 11:100307. [PMID: 37554927 PMCID: PMC10405092 DOI: 10.1016/j.rcsop.2023.100307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 07/08/2023] [Accepted: 07/09/2023] [Indexed: 08/10/2023] Open
Abstract
BACKGROUND Patients with multiple sclerosis (MS) frequently switch their Disease-Modifying Agents (DMA) because of effectiveness and safety concerns. This study aimed to develop and compare the random forest (RF) machine learning (ML) model with the logistic regression (LR) model for predicting DMA switching among MS patients. METHODS This retrospective longitudinal study used the TriNetX data from a federated electronic medical records (EMR) network. Between September 2010 and May 2017, adult (aged ≥18) MS patients with ≥1 DMA prescription were identified, and the earliest DMA date was assigned as the index date. Patients prescribed any DMA different from their index DMA were considered to have switched treatment. The RF and LR models were built with 72 baseline characteristics and trained with 70% of the randomly split data after up-sampling. Area under the curve (AUC), accuracy, recall, G-measure, and F-1 score were used to evaluate model performance. RESULTS In this study, 7258 MS patients with ≥1 DMA were identified. Within two years, 16% of MS patients switched to a different DMA. The RF model obtained significantly better discrimination than the LR model (AUC = 0.65 vs. 0.63, p < 0.0001); however, the RF model had a similar predictive performance to the LR model with respect to F- and G-measures (RF: 72% and 73% vs. LR: 72% and 73%, respectively). The most influential features identified from the RF model were age, type of index medication, and year of index. CONCLUSIONS Compared to the LR model, RF performed better in predicting DMA switch in MS patients based on AUC measures; however, judged by F- and G-measures, the RF model performed similarly to LR. Further research is needed to understand the role of ML techniques in predicting treatment outcomes for the decision-making process to achieve optimal treatment goals.
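The evaluation metrics named in this abstract can be computed from a binary confusion matrix, as in the sketch below. The abstract does not define the G-measure; the code assumes the geometric mean of precision and recall, although another common convention uses sensitivity and specificity. The function name and the example counts are hypothetical and are not the study's data.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, F-1 and G-measure from confusion counts.

    The G-measure is taken here to be sqrt(precision * recall); this is an
    assumption, since the source does not state which definition was used.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    g_measure = math.sqrt(precision * recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_measure": g_measure,
    }


# Hypothetical counts for a balanced toy test set.
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Because F-1 is the harmonic mean and this G-measure the geometric mean of the same two quantities, the two scores stay close whenever precision and recall are similar, which is one reason two models can differ in AUC yet look alike on these measures.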
Affiliation(s)
- Jieni Li
  - Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, TX, USA
- Yinan Huang
  - Department of Pharmacy Administration, College of Pharmacy, University of Mississippi, Oxford, MS, USA
- Rajender R Aparasu
  - Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, TX, USA
|
46
|
Vera-Siguenza E, Escribano-Gonzalez C, Serrano-Gonzalo I, Eskla KL, Spill F, Tennant D. Mathematical reconstruction of the metabolic network in an in-vitro multiple myeloma model. PLoS Comput Biol 2023; 19:e1011374. [PMID: 37713666 PMCID: PMC10503963 DOI: 10.1371/journal.pcbi.1011374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 07/19/2023] [Indexed: 09/17/2023] Open
Abstract
It is increasingly apparent that cancer cells, in addition to remodelling their metabolism to survive and proliferate, adapt and manipulate the metabolism of other cells. This property may be a telling sign that pre-clinical tumour metabolism studies exclusively utilising in-vitro mono-culture models could prove to be limited for uncovering novel metabolic targets able to translate into clinical therapies. Although this is increasingly recognised, and work towards addressing the issue is becoming routine, much remains poorly understood. For instance, knowledge regarding the biochemical mechanisms through which cancer cells manipulate non-cancerous cell metabolism, and the subsequent impact on their survival and proliferation, remains limited. Additionally, the variations in these processes across different cancer types and progression stages, and their implications for therapy, also remain largely unexplored. This study employs an interdisciplinary approach that leverages the predictive power of mathematical modelling to enrich experimental findings. We develop a functional multicellular in-silico model that facilitates the qualitative and quantitative analysis of the metabolic network spawned by an in-vitro co-culture model of bone marrow mesenchymal stem- and myeloma cell lines. To procure this model, we devised a bespoke human genome constraint-based reconstruction workflow that combines aspects from the legacy mCADRE & Metabotools algorithms, the novel redHuman algorithm, along with 13C-metabolic flux analysis. Our workflow transforms the latest human metabolic network matrix (Recon3D) into two cell-specific models coupled with a metabolic network spanning a shared growth medium. When cross-validating our in-silico model against the in-vitro model, we found that the in-silico model successfully reproduces vital metabolic behaviours of its in-vitro counterpart; results include cell growth predictions, respiration rates, as well as support for observations which suggest cross-shuttling of redox-active metabolites between cells.
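Constraint-based models of this kind are commonly analysed by flux balance analysis. The abstract does not state the exact optimisation problem the authors solve, so the following is only the textbook formulation, with all symbols introduced here for illustration: $S$ is the stoichiometric matrix (metabolites by reactions), $v$ the vector of reaction fluxes, $c$ a vector of objective weights (for example, selecting a biomass reaction), and $v^{\min}$, $v^{\max}$ the flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

The equality constraint encodes the steady-state assumption that every internal metabolite is produced as fast as it is consumed. In a two-cell setting such as the one described, one would expect a block-structured $S$ with exchange reactions linking each cell to the shared medium, but that structure is an assumption rather than a detail given in the abstract.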
Affiliation(s)
- Elias Vera-Siguenza
  - Institute of Metabolism and Systems Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
  - Watson School of Mathematics, University of Birmingham, Birmingham, United Kingdom
- Cristina Escribano-Gonzalez
  - Institute of Metabolism and Systems Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
- Irene Serrano-Gonzalo
  - Instituto de Investigación Sanitaria Aragón, Fundación Española para el Estudio y Terapéutica de la enfermedad de Gaucher y otras Lisosomales, Zaragoza, España
- Kattri-Liis Eskla
  - Institute of Metabolism and Systems Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
  - Department of Physiology, Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu, Estonia
- Fabian Spill
  - Watson School of Mathematics, University of Birmingham, Birmingham, United Kingdom
- Daniel Tennant
  - Institute of Metabolism and Systems Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
|
47
|
Urayama SI, Fukudome A, Hirai M, Okumura T, Nishimura Y, Takaki Y, Kurosawa N, Koonin EV, Krupovic M, Nunoura T. Distinct groups of RNA viruses associated with thermoacidophilic bacteria. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.02.547447. [PMID: 37790367 PMCID: PMC10542131 DOI: 10.1101/2023.07.02.547447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Recent massive metatranscriptome mining substantially expanded the diversity of the bacterial RNA virome, suggesting that additional groups of riboviruses infecting bacterial hosts remain to be discovered. We employed full-length double-stranded (ds) RNA sequencing to identify riboviruses associated with microbial consortia dominated by bacteria and archaea in acidic hot springs in Japan. Whole sequences of two groups of multisegmented ribovirus genomes were obtained. One group, which we denoted hot spring riboviruses (HsRV), consists of unusual viruses with distinct RNA-dependent RNA polymerases (RdRPs) that seem to be intermediates between typical ribovirus RdRPs and viral reverse transcriptases. We also identified viruses encoding HsRV-like RdRPs in moderate aquatic environments, including marine water, river sediments and salt marsh, indicating that this previously overlooked ribovirus group is not restricted to the extreme ecosystem. The HsRV-like viruses are candidates for a distinct phylum or even kingdom within the viral realm Riboviria. The second group, denoted hot spring partiti-like viruses (HsPV), is a distinct branch within the family Partitiviridae. All genome segments in both these groups of viruses display the organization typical of bacterial riboviruses, where multiple open reading frames encoding individual proteins are preceded by ribosome-binding sites. Together with the identification in bacteria-dominated habitats, this genome architecture indicates that riboviruses of these distinct groups infect thermoacidophilic bacterial hosts.
Affiliation(s)
- Syun-ichi Urayama
  - Department of Life and Environmental Sciences, Laboratory of Fungal Interaction and Molecular Biology, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
  - Microbiology Research Center for Sustainability (MiCS), University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
- Akihito Fukudome
  - Howard Hughes Medical Institute, Department of Biology and Department of Molecular and Cellular Biochemistry, Indiana University, Bloomington, IN, USA
- Miho Hirai
  - Super-cutting-edge Grand and Advanced Research (SUGAR) Program, Japan Agency for Marine Science and Technology (JAMSTEC), 2–15 Natsushima-cho, Yokosuka, Kanagawa 237–0061, Japan
- Tomoyo Okumura
  - Marine Core Research Institute, Kochi University, 200 Otsu, Monobe, Nankoku City, Kochi, 783-8502, Japan
- Yosuke Nishimura
  - Research Center for Bioscience and Nanoscience (CeBN), JAMSTEC, 2–15 Natsushima-cho, Yokosuka, Kanagawa 237–0061, Japan
- Yoshihiro Takaki
  - Super-cutting-edge Grand and Advanced Research (SUGAR) Program, Japan Agency for Marine Science and Technology (JAMSTEC), 2–15 Natsushima-cho, Yokosuka, Kanagawa 237–0061, Japan
- Norio Kurosawa
  - Department of Science and Engineering for Sustainable Innovation, Faculty of Science and Engineering, Soka University, Hachioji 192-8577, Japan
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- Mart Krupovic
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Archaeal Virology Unit, Paris, France
- Takuro Nunoura
  - Research Center for Bioscience and Nanoscience (CeBN), JAMSTEC, 2–15 Natsushima-cho, Yokosuka, Kanagawa 237–0061, Japan
|
48
|
Knobel P, Hwang I, Castro E, Sheffield P, Holaday L, Shi L, Amini H, Schwartz J, Sade MY. Socioeconomic and racial disparities in source-apportioned PM2.5 levels across urban areas in the contiguous US, 2010. ATMOSPHERIC ENVIRONMENT (OXFORD, ENGLAND : 1994) 2023; 303:119753. [PMID: 37215166 PMCID: PMC10194033 DOI: 10.1016/j.atmosenv.2023.119753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Fine particulate matter (PM2.5) air pollution exposure is associated with short- and long-term health effects. Several studies found differences in PM2.5 exposure associated with neighborhood racial and socioeconomic composition. However, most focused on total PM2.5 mass rather than its chemical components and their sources. In this study, we describe the ZIP code characteristics that drive the disparities in exposure to PM2.5 chemical components attributed to source categories, both nationally and regionally. We obtained annual mean predictions of PM2.5 and fourteen of its chemical components from spatiotemporal models, and socioeconomic and racial predictor variables from the 2010 US Census and the American Community Survey 5-year estimates. We used non-negative matrix factorization to attribute the chemical components to five source categories. We fit generalized nonlinear models to assess the associations between the neighborhood predictors and each PM2.5 source category in urban areas in the United States in 2010 (n=25,790 ZIP codes). We observed higher PM2.5 levels in ZIP codes with higher proportions of Black individuals and lower socioeconomic status. Racial exposure disparities were mainly attributed to Heavy Fuel Oil and Industrial, Metal Processing Industry and Agricultural, and Motor Vehicle sources. Economic disparities were mainly attributed to Soil and Crustal Dust, Heavy Fuel Oil and Industrial, Metal Processing Industry and Agricultural, and Motor Vehicle sources. On further analysis stratified by region within the United States, we found that the associations between ZIP code characteristics and source-attributed PM2.5 levels were generally greater in Western states. In conclusion, racial, socioeconomic, and geographic inequalities in exposure to PM2.5 and its components are driven by systematic differences in component sources that can inform air quality improvement strategies.
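The source-attribution step described in this abstract relies on non-negative matrix factorization (NMF), which approximates a non-negative matrix V (here, locations by chemical components) as a product W H of two non-negative factors (locations by sources, and sources by components). The abstract does not say which NMF algorithm or software was used; the sketch below assumes the classic multiplicative-update rule for the squared-error objective, written in plain Python, with hypothetical function names and parameters.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x rank) and H (rank x m), all non-negative.

    Uses multiplicative updates; eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h
```

In a setting like the one described, each row of H would be read as a source profile over the chemical components and each row of W as that location's contribution from each source; calling `nmf(v, rank=5)` on a locations-by-components matrix would mirror the five source categories, though the preprocessing and rank selection used in the study are not given in the abstract.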
Affiliation(s)
- Pablo Knobel
  - Icahn School of Medicine at Mount Sinai, Department of Environmental Medicine and Public Health, New York, NY, USA
- Inhye Hwang
  - Icahn School of Medicine at Mount Sinai, Department of Environmental Medicine and Public Health, New York, NY, USA
- Edgar Castro
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Perry Sheffield
  - Icahn School of Medicine at Mount Sinai, Department of Environmental Medicine and Public Health, New York, NY, USA
- Louisa Holaday
  - Division of General Internal Medicine, Department of Medicine, Mount Sinai School of Medicine, New York, New York, USA
- Liuhua Shi
  - Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Heresh Amini
  - Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Joel Schwartz
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Maayan Yitshak Sade
  - Icahn School of Medicine at Mount Sinai, Department of Environmental Medicine and Public Health, New York, NY, USA
|
49
|
Bowles KR, Pugh DA, Pedicone C, Oja L, Weitzman SA, Liu Y, Chen JL, Disney MD, Goate AM. Development of MAPT S305 mutation models exhibiting elevated 4R tau expression, resulting in altered neuronal and astrocytic function. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.02.543224. [PMID: 37333200 PMCID: PMC10274740 DOI: 10.1101/2023.06.02.543224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Despite the importance of 4R tau in the pathogenicity of primary tauopathies, it has been challenging to model these diseases in iPSC-derived neurons, which express very low levels of 4R tau. To address this problem, we have developed a panel of isogenic iPSC lines carrying the MAPT splice-site mutations S305S, S305I or S305N, derived from four different donors. All three mutations significantly increased the proportion of 4R tau expression in iPSC-neurons and astrocytes, with up to 80% 4R transcripts in S305N neurons from as early as 4 weeks of differentiation. Transcriptomic and functional analyses of S305 mutant neurons revealed shared disruption in glutamate signaling and synaptic maturity, but divergent effects on mitochondrial bioenergetics. In iPSC-astrocytes, S305 mutations induced lysosomal disruption and inflammation and exacerbated internalization of exogenous tau, which may be a precursor to the glial pathologies observed in many tauopathies. In conclusion, we present a novel panel of human iPSC lines that express unprecedented levels of 4R tau in neurons and astrocytes. These lines recapitulate previously characterized tauopathy-relevant phenotypes, but also highlight functional differences between the wild-type 4R and mutant 4R proteins. We also highlight the functional importance of MAPT expression in astrocytes. These lines will be highly beneficial to tauopathy researchers, enabling a more complete understanding of the pathogenic mechanisms underlying 4R tauopathies across different cell types.
Affiliation(s)
- KR Bowles
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
  - Ronald M. Loeb Center for Alzheimer’s disease, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
- DA Pugh
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
  - Ronald M. Loeb Center for Alzheimer’s disease, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
- C Pedicone
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
  - Ronald M. Loeb Center for Alzheimer’s disease, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
- L Oja
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
  - Ronald M. Loeb Center for Alzheimer’s disease, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
- SA Weitzman
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
  - Ronald M. Loeb Center for Alzheimer’s disease, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
- Y Liu
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
  - Ronald M. Loeb Center for Alzheimer’s disease, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
- JL Chen
  - Department of Chemistry, Scripps Research Institute, Jupiter, FL, United States of America
- MD Disney
  - Department of Chemistry, Scripps Research Institute, Jupiter, FL, United States of America
- AM Goate
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
  - Ronald M. Loeb Center for Alzheimer’s disease, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America
|
50
|
Ansari M, White AD. Learning Peptide Properties with Positive Examples Only. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.01.543289. [PMID: 37333233 PMCID: PMC10274696 DOI: 10.1101/2023.06.01.543289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Deep learning can create accurate predictive models by exploiting existing large-scale experimental data, and guide the design of molecules. However, a major barrier is the requirement of both positive and negative examples in classical supervised learning frameworks. Notably, most peptide databases come with missing information and a low number of observations on negative examples, as such sequences are hard to obtain using high-throughput screening methods. To address this challenge, we exploit only the limited known positive examples in a semi-supervised setting, and discover peptide sequences that are likely to map to certain antimicrobial properties via positive-unlabeled (PU) learning. In particular, we use two learning strategies, adapting the base classifier and reliable negative identification, to build deep learning models for inferring solubility, hemolysis, binding against SHP-2, and non-fouling activity of peptides, given their sequence. We evaluate the predictive performance of our PU learning method and show that, using only the positive data, it can achieve competitive performance when compared with the classical positive-negative (PN) classification approach, where there is access to both positive and negative examples.
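The "reliable negative identification" strategy named in this abstract is usually run in two steps: first select unlabeled examples that look least like the known positives, then train an ordinary classifier on positives versus those selected negatives. The sketch below illustrates that idea only. It assumes examples are already numeric feature vectors and swaps in a centroid-distance heuristic and a nearest-centroid classifier in place of the deep sequence models the authors describe; every function name and parameter is hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reliable_negatives(positives, unlabeled, fraction=0.3):
    """Step 1: keep the unlabeled points farthest from the positive centroid."""
    c = centroid(positives)
    ranked = sorted(unlabeled, key=lambda u: distance(u, c), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]


def train_nearest_centroid(positives, negatives):
    """Step 2: fit a two-class nearest-centroid classifier."""
    pos_c, neg_c = centroid(positives), centroid(negatives)

    def predict(x):
        # Label 1 (positive) when x is closer to the positive centroid.
        return 1 if distance(x, pos_c) < distance(x, neg_c) else 0

    return predict


# Toy data: positives cluster near (1, 1); unlabeled is a mix of both kinds.
positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
unlabeled = [[1.1, 1.0], [5.0, 5.0], [5.2, 4.8], [0.95, 1.05], [4.9, 5.1]]
negatives = reliable_negatives(positives, unlabeled, fraction=0.4)
predict = train_nearest_centroid(positives, negatives)
```

The `fraction` parameter trades purity against coverage: a small value keeps only the most confidently negative examples, at the cost of giving the second-step classifier fewer negatives to learn from.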
Affiliation(s)
- Mehrad Ansari
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, 14627, USA
- Andrew D. White
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, 14627, USA
|