1
|
Grant OA, Iacoangeli A, Zwamborn RAJ, van Rheenen W, Byrne R, Van Eijk KR, Kenna K, van Vugt JJFA, Cooper-Knock J, Kenna B, Vural A, Topp S, Campos Y, Weber M, Smith B, Dobson R, van Es MA, Vourc'h P, Corcia P, de Carvalho M, Gotkine M, Panades MP, Mora JS, Mill J, Garton F, McRae A, Wray NR, Shaw PJ, Landers JE, Glass JD, Shaw CE, Basak N, Hardiman O, Van Damme P, McLaughlin RL, van den Berg LH, Veldink JH, Al-Chalabi A, Al Khleifat A. Sex-specific DNA methylation differences in Amyotrophic lateral sclerosis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.22.624866. [PMID: 39651197 PMCID: PMC11623544 DOI: 10.1101/2024.11.22.624866] [Indexed: 12/11/2024]
Abstract
Sex is an important covariate in all genetic and epigenetic research due to its role in the incidence, progression and outcome of many phenotypic characteristics and human diseases. Amyotrophic lateral sclerosis (ALS) is a motor neuron disease with a sex bias towards higher incidence in males. Here, we report for the first time a blood-based epigenome-wide association study meta-analysis in 9274 individuals after stringent quality control (5529 males and 3975 females). We identified a total of 226 ALS saDMPs (sex-associated DMPs) annotated to a total of 159 unique genes. These ALS saDMPs were depleted at transposable elements yet significantly enriched at enhancers and slightly enriched at 3'UTRs. They were also enriched for transcription factor motifs such as ESR1 and REST. Moreover, we identified an additional 10 genes associated with ALS saDMPs through chromatin loop interactions, suggesting a potential regulatory role for these saDMPs on distant genes. Furthermore, we investigated the relationship between DNA methylation at specific CpG sites and overall survival in ALS using Cox proportional hazards models. We identified two ALS saDMPs, cg14380013 and cg06729676, that showed significant associations with survival. Overall, our study reports a reliable catalogue of ALS saDMPs and elucidates several characteristics of these sites using a large-scale dataset. This resource will benefit future studies aiming to investigate the role of sex in the incidence, progression and risk of ALS.
|
2
|
Czech E, Millar TR, Tyler W, White T, Jeffery B, Miles A, Tallman S, Wojdyla R, Zabad S, Hammerbacher J, Kelleher J. Analysis-ready VCF at Biobank scale using Zarr. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.11.598241. [PMID: 38915693 PMCID: PMC11195102 DOI: 10.1101/2024.06.11.598241] [Indexed: 06/26/2024]
Abstract
Background: Variant Call Format (VCF) is the standard file format for interchanging genetic variation data and associated quality control metrics. The usual row-wise encoding of the VCF data model (either as text or packed binary) emphasises efficient retrieval of all data for a given variant, but accessing data on a field or sample basis is inefficient. Biobank scale datasets currently available consist of hundreds of thousands of whole genomes and hundreds of terabytes of compressed VCF. Row-wise data storage is fundamentally unsuitable and a more scalable approach is needed.
Results: We present the VCF Zarr specification, an encoding of the VCF data model using Zarr which makes retrieving subsets of the data much more efficient. Zarr is a cloud-native format for storing multi-dimensional data, widely used in scientific computing. We show how this format is far more efficient than standard VCF based approaches, and competitive with specialised methods for storing genotype data in terms of compression ratios and calculation performance. We demonstrate the VCF Zarr format (and the vcf2zarr conversion utility) on a subset of the Genomics England aggV2 dataset comprising 78,195 samples and 59,880,903 variants, with a 5X reduction in storage and greater than 300X reduction in CPU usage in some representative benchmarks.
Conclusions: Large row-encoded VCF files are a major bottleneck for current research, and storing and processing these files incurs a substantial cost. The VCF Zarr specification, building on widely-used, open-source technologies, has the potential to greatly reduce these costs, and may enable a diverse ecosystem of next-generation tools for analysing genetic variation data directly from cloud-based object stores, while maintaining compatibility with existing file-oriented workflows.
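The contrast between row-wise and column-wise layouts described in this abstract can be illustrated with a toy sketch. Plain Python lists stand in for storage here; the field names and values are hypothetical, and nothing below uses or reproduces the VCF Zarr specification or the vcf2zarr utility.

```python
# Toy illustration only: in-memory lists stand in for on-disk storage.
# All names and values are hypothetical, not part of VCF Zarr.

# Row-wise layout: one record per variant, all fields stored together.
rows = [
    {"pos": 101, "qual": 50.0, "genotypes": [0, 1, 1]},
    {"pos": 205, "qual": 12.5, "genotypes": [0, 0, 1]},
    {"pos": 310, "qual": 99.0, "genotypes": [1, 1, 0]},
]

# Column-wise layout: one array per field, indexed by variant.
columns = {
    "pos": [r["pos"] for r in rows],
    "qual": [r["qual"] for r in rows],
    "genotypes": [r["genotypes"] for r in rows],
}


def mean_qual_rowwise(records):
    # Every full record is visited, including genotype data that is never used.
    return sum(r["qual"] for r in records) / len(records)


def mean_qual_columnwise(cols):
    # Only the "qual" array is touched.
    return sum(cols["qual"]) / len(cols["qual"])


def sample_genotypes(cols, sample_index):
    # Per-sample access: one value per variant from the genotype array.
    return [g[sample_index] for g in cols["genotypes"]]
```

Both mean functions return the same value; the difference is how much unrelated data each has to read, which is the cost the abstract attributes to row-wise encodings at biobank scale.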
Affiliation(s)
| | - Timothy R. Millar
- The New Zealand Institute for Plant & Food Research Ltd, Lincoln, New Zealand
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Will Tyler
- Independent researcher, University of Oxford, UK
| | - Tom White
- Tom White Consulting Ltd., University of Oxford, UK
| | - Ben Jeffery
- Big Data Institute, University of Oxford, UK
| | - Alistair Miles
- Wellcome Sanger Institute
| | - Sam Tallman
- Genomics England
| | | | - Shadi Zabad
- School of Computer Science, McGill University, Montreal, QC, Canada
| | | | | |
|
3
|
Firsanov D, Zacher M, Tian X, Sformo TL, Zhao Y, Tombline G, Lu JY, Zheng Z, Perelli L, Gurreri E, Zhang L, Guo J, Korotkov A, Volobaev V, Biashad SA, Zhang Z, Heid J, Maslov A, Sun S, Wu Z, Gigas J, Hillpot E, Martinez J, Lee M, Williams A, Gilman A, Hamilton N, Haseljic E, Patel A, Straight M, Miller N, Ablaeva J, Tam LM, Couderc C, Hoopman M, Moritz R, Fujii S, Hayman DJ, Liu H, Cai Y, Leung AKL, Simons MJP, Zhang Z, Nelson CB, Abegglen LM, Schiffman JD, Gladyshev VN, Modesti M, Genovese G, Vijg J, Seluanov A, Gorbunova V. DNA repair and anti-cancer mechanisms in the long-lived bowhead whale. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.05.07.539748. [PMID: 39574710 PMCID: PMC11580846 DOI: 10.1101/2023.05.07.539748] [Indexed: 11/29/2024]
Abstract
At over 200 years, the maximum lifespan of the bowhead whale exceeds that of all other mammals. The bowhead is also the second-largest animal on Earth, reaching over 80,000 kg [1]. Despite its very large number of cells and long lifespan, the bowhead is not highly cancer-prone, an incongruity termed Peto's Paradox [2]. This phenomenon has been explained by the evolution of additional tumor suppressor genes in other larger animals, supported by research on elephants demonstrating expansion of the p53 gene [3-5]. Here we show that bowhead whale fibroblasts undergo oncogenic transformation after disruption of fewer tumor suppressors than required for human fibroblasts. However, analysis of DNA repair revealed that bowhead cells repair double strand breaks (DSBs) and mismatches with uniquely high efficiency and accuracy compared to other mammals. The protein CIRBP, implicated in protection from genotoxic stress, was present in very high abundance in the bowhead whale relative to other mammals. We show that CIRBP and its downstream protein RPA2, also present at high levels in bowhead cells, increase the efficiency and fidelity of DNA repair in human cells. These results indicate that rather than possessing additional tumor suppressor genes as barriers to oncogenesis, the bowhead whale relies on more accurate and efficient DNA repair to preserve genome integrity. This strategy, which does not eliminate damaged cells but repairs them, may be critical for the long and cancer-free lifespan of the bowhead whale.
Affiliation(s)
- Denis Firsanov
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Max Zacher
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Xiao Tian
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Todd L. Sformo
- Department of Wildlife Management, North Slope Borough, Utqiaġvik (Barrow), AK 99723, USA
| | - Yang Zhao
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Greg Tombline
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - J. Yuyang Lu
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Zhizhong Zheng
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Luigi Perelli
- Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Enrico Gurreri
- Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Li Zhang
- Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Jing Guo
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Anatoly Korotkov
- Department of Biology, University of Rochester, Rochester, NY, USA
| | | | | | - Zhihui Zhang
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Johanna Heid
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Alex Maslov
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Shixiang Sun
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Zhuoer Wu
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Jonathan Gigas
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Eric Hillpot
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - John Martinez
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Minseon Lee
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Alyssa Williams
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Abbey Gilman
- Department of Biology, University of Rochester, Rochester, NY, USA
| | | | - Ena Haseljic
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Avnee Patel
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Maggie Straight
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Nalani Miller
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Julia Ablaeva
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Lok Ming Tam
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Chloé Couderc
- Department of Biology, University of Rochester, Rochester, NY, USA
| | | | | | - Shingo Fujii
- Cancer Research Center of Marseille, Department of Genome Integrity, CNRS UMR7258, Inserm U1068, Institut Paoli-Calmettes, Aix Marseille Univ, Marseille, France
| | | | - Hongrui Liu
- Department of Biochemistry and Molecular Biology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Cross-Disciplinary Graduate Program in Biomedical Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Yuxuan Cai
- Department of Biochemistry and Molecular Biology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Anthony K. L. Leung
- Department of Biochemistry and Molecular Biology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | | | - Zhengdong Zhang
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - C. Bradley Nelson
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
| | - Lisa M. Abegglen
- Department of Pediatrics & Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
- Peel Therapeutics, Inc., Salt Lake City, UT, USA
| | - Joshua D. Schiffman
- Department of Pediatrics & Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
- Peel Therapeutics, Inc., Salt Lake City, UT, USA
| | - Vadim N. Gladyshev
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Mauro Modesti
- Cancer Research Center of Marseille, Department of Genome Integrity, CNRS UMR7258, Inserm U1068, Institut Paoli-Calmettes, Aix Marseille Univ, Marseille, France
| | - Giannicola Genovese
- Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Jan Vijg
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Andrei Seluanov
- Department of Biology, University of Rochester, Rochester, NY, USA
- Department of Medicine, University of Rochester Medical Center, Rochester, NY, USA
| | - Vera Gorbunova
- Department of Biology, University of Rochester, Rochester, NY, USA
- Department of Medicine, University of Rochester Medical Center, Rochester, NY, USA
| |
|
4
|
Chuah J, Cordi C, Hahn J, Hurley J. Dual-Approach Co-expression Analysis Framework (D-CAF) Enables Identification of Novel Circadian Regulation From Multi-Omic Timeseries Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.10.617622. [PMID: 39463955 PMCID: PMC11507783 DOI: 10.1101/2024.10.10.617622] [Indexed: 10/29/2024]
Abstract
The circadian clock is a central driver of many biological and behavioral processes, regulating the levels of many genes and proteins, termed clock controlled genes and proteins (CCGs/CCPs), to impart biological timing at the molecular level. While transcriptomic and proteomic data have been analyzed to find potential CCGs and CCPs, multi-omic modeling of circadian data, which has the potential to enhance the understanding of circadian control of biological timing, remains relatively rare due to several methodological hurdles. To address this gap, a Dual-approach Co-expression Analysis Framework (D-CAF) was created to perform perturbation-robust co-expression analysis on time-series measurements of both transcripts and proteins. Applying D-CAF to previously gathered transcriptomic and proteomic data from mouse macrophages collected over circadian time, we identified small, highly significant clusters of oscillating transcripts and proteins in the unweighted similarity matrices and larger, less significant clusters of oscillating transcripts and proteins using the weighted similarity network. Functional enrichment analysis of these clusters identified novel immunological response pathways that appear to be under circadian control. Overall, our findings suggest that D-CAF is a tool that can be used by the circadian community to integrate multi-omic circadian data to improve our understanding of the mechanisms of circadian regulation of molecular processes.
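The distinction between unweighted and weighted similarity drawn in this abstract can be sketched generically. The code below is an assumption-laden illustration of ordinary co-expression analysis (absolute Pearson correlation, with a hard threshold for the unweighted case); it is not the D-CAF algorithm, and the series names, values, and the 0.8 threshold are hypothetical.

```python
import math


def pearson(x, y):
    # Pearson correlation of two equal-length series.
    # Assumes neither series is constant (otherwise the denominator is zero).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def similarity_matrices(series, threshold=0.8):
    # Weighted similarity: |correlation| between every pair of time series.
    # Unweighted similarity: 1 where the weighted value reaches the threshold.
    names = list(series)
    weighted = {
        a: {b: abs(pearson(series[a], series[b])) for b in names} for a in names
    }
    unweighted = {
        a: {b: int(weighted[a][b] >= threshold) for b in names} for a in names
    }
    return weighted, unweighted


# Hypothetical time series sampled at eight time points.
series = {
    "transcript_A": [1, 2, 3, 2, 1, 2, 3, 2],
    "protein_B": [2, 4, 6, 4, 2, 4, 6, 4],
    "transcript_C": [3, 3, 1, 1, 3, 3, 1, 1],
}
weighted, unweighted = similarity_matrices(series)
```

In this toy data, transcript_A and protein_B are linked in both matrices, while transcript_A and transcript_C share an edge only in the weighted one. That is one plausible reason a thresholded network would yield smaller, tighter clusters than a weighted network.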
Affiliation(s)
- Joshua Chuah
- Department of Electrical, Computer, and Biomedical Engineering, Union College, 807 Union St, 12308, NY, USA
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, 110 8th St, 12180, NY, USA
| | - Carmalena Cordi
- Department of Biological Sciences, Rensselaer Polytechnic Institute, 110 8th St, 12180, NY, USA
| | - Juergen Hahn
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, 110 8th St, 12180, NY, USA
- Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, 110 8th St, 12180, NY, USA
| | - Jennifer Hurley
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, 110 8th St, 12180, NY, USA
- Department of Biological Sciences, Rensselaer Polytechnic Institute, 110 8th St, 12180, NY, USA
| |
|
5
|
Jores T, Mueth NA, Tonnies J, Char SN, Liu B, Grillo-Alvarado V, Abbitt S, Anand A, Deschamps S, Diehn S, Gordon-Kamm B, Jiao S, Munkvold K, Snowgren H, Sardesai N, Fields S, Yang B, Cuperus JT, Queitsch C. Small DNA elements that act as both insulators and silencers in plants. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.13.612883. [PMID: 39345455 PMCID: PMC11429706 DOI: 10.1101/2024.09.13.612883] [Indexed: 10/01/2024]
Abstract
Insulators are cis-regulatory elements that separate transcriptional units, whereas silencers are elements that repress transcription regardless of their position. In plants, these elements remain largely uncharacterized. Here, we use the massively parallel reporter assay Plant STARR-seq with short fragments of eight large insulators to identify more than 100 fragments that block enhancer activity. The short fragments can be combined to generate more powerful insulators that abolish the capacity of the strong viral 35S enhancer to activate the 35S minimal promoter. Unexpectedly, when tested upstream of weak enhancers, these fragments act as silencers and repress transcription. Thus, these elements are capable of either insulating or repressing transcription, depending on regulatory context. We validate our findings in stable transgenic Arabidopsis, maize, and rice plants. The short elements identified here should be useful building blocks for plant biotechnology efforts.
Affiliation(s)
- Tobias Jores
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Institute of Synthetic Biology, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- CEPLAS – Cluster of Excellence on Plant Sciences, Düsseldorf, Germany
| | - Nicholas A. Mueth
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Jackson Tonnies
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Graduate Program in Biology, University of Washington, Seattle, WA, USA
| | - Si Nian Char
- Division of Plant Science and Technology, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Bo Liu
- Division of Plant Science and Technology, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Valentina Grillo-Alvarado
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Molecular & Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA
| | | | - Ajith Anand
- Corteva Agriscience, Johnston, IA, USA
- Present address: MyFloraDNA, Sacramento, CA, USA
| | | | | | | | | | - Kathy Munkvold
- Corteva Agriscience, Johnston, IA, USA
- Present address: Foundation for Food & Agriculture Research, Washington, DC, USA
| | | | | | - Stanley Fields
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - Bing Yang
- Division of Plant Science and Technology, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Donald Danforth Plant Science Center, St. Louis, MO, USA
| | - Josh T. Cuperus
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
| |
|
6
|
Rahman JF, Hoque H, Jubayer AA, Jewel NA, Hasan MN, Chowdhury AT, Prodhan SH. Alfin-like (AL) transcription factor family in Oryza sativa L.: Genome-wide analysis and expression profiling under different stresses. BIOTECHNOLOGY REPORTS (AMSTERDAM, NETHERLANDS) 2024; 43:e00845. [PMID: 38962072 PMCID: PMC11217604 DOI: 10.1016/j.btre.2024.e00845] [Received: 01/25/2024] [Revised: 04/24/2024] [Accepted: 05/29/2024] [Indexed: 07/05/2024]
Abstract
Oryza sativa L. is the world's most essential and economically important food crop. Climate change and ecological imbalances make rice plants vulnerable to abiotic and biotic stresses, threatening global food security. The Alfin-like (AL) transcription factor family plays a crucial role in plant development and stress responses. This study comprehensively analyzed this gene family and its expression profiles in rice, revealing nine AL genes, classifying them into three distinct groups based on phylogenetic analysis, and identifying four segmental duplication events. RNA-seq data analysis revealed high expression levels of OsALs in different tissues and growth stages, as well as their responsiveness to stresses. RT-qPCR data showed significant expression of OsALs under different abiotic stresses. Identification of potential cis-regulatory elements in promoter regions has also unveiled their involvement. Tertiary structures of the proteins were predicted. These findings lay the groundwork for future research to reveal their molecular mechanism in stress tolerance and plant development.
Affiliation(s)
- Jeba Faizah Rahman
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
| | - Hammadul Hoque
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
| | - Abdullah -Al- Jubayer
- Department of Biotechnology and Genetic Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, 8100, Bangladesh
| | - Nurnabi Azad Jewel
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
| | - Md. Nazmul Hasan
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
| | - Aniqua Tasnim Chowdhury
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
| | - Shamsul H. Prodhan
- Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
| |
|
7
|
Seo D, Lee CM, Apio C, Heo G, Timsina J, Kohlfeld P, Boada M, Orellana A, Fernandez MV, Ruiz A, Morris JC, Schindler SE, Park T, Cruchaga C, Sung YJ. Sex and aging signatures of proteomics in human cerebrospinal fluid identify distinct clusters linked to neurodegeneration. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.06.18.24309102. [PMID: 38947020 PMCID: PMC11213043 DOI: 10.1101/2024.06.18.24309102] [Indexed: 07/02/2024]
Abstract
Sex and age are major risk factors for chronic diseases. Recent studies examining age-related molecular changes in plasma provided insights into age-related disease biology. Cerebrospinal fluid (CSF) proteomics can provide additional insights into brain aging and neurodegeneration. By comprehensively examining 7,006 aptamers targeting 6,139 proteins in CSF obtained from 660 healthy individuals aged 43 to 91 years, we identified significant sex and aging effects on 5,097 aptamers in CSF. Many of these effects on CSF proteins differed in magnitude, or were even opposite in direction, from those on plasma proteins, indicating distinctive CSF-specific signatures. Network analysis of these CSF proteins revealed not only modules associated with healthy aging but also modules showing sex differences. Through subsequent analyses, several modules were highlighted for their proteins implicated in specific diseases. Modules 2 and 6 were enriched for many aging diseases, including those in the circulatory systems, immune mechanisms, and neurodegeneration. Together, our findings fill a gap in current aging research and provide mechanistic understanding of proteomic changes in CSF during a healthy lifespan and insights into brain aging and diseases.
|
8
|
Yadav NK, Saraswat M. A new feature selection approach with binary exponential henry gas solubility optimization and hybrid data transformation methods. MethodsX 2024; 12:102770. [PMID: 39677828 PMCID: PMC11639705 DOI: 10.1016/j.mex.2024.102770] [Received: 01/17/2024] [Accepted: 05/19/2024] [Indexed: 12/17/2024]
Abstract
In common classification practice, feature selection is an important aspect that strongly affects the computational efficacy of a model, particularly in complex computer vision tasks. Metaheuristic optimization algorithms have gained popularity for obtaining an optimal feature subset. However, feature selection using metaheuristics suffers from two common stability problems, namely premature convergence and a slow convergence rate. To handle these stability problems, this paper presents a fused dataset transformation approach that joins weighted Principal Component Analysis and Fast Independent Component Analysis techniques. The presented method addresses the stability issues by first transforming the original dataset; a newly proposed variant of Henry Gas Solubility Optimization is then employed to obtain a new feature subset. The proposed method was compared with other metaheuristic approaches across seven benchmark datasets, and it was observed to select a better feature set, which improves the accuracy and computational complexity of the model.
Affiliation(s)
- Nand Kishor Yadav
- Jaypee Institute of Information Technology Noida, Uttar Pradesh, India
| | - Mukesh Saraswat
- Jaypee Institute of Information Technology Noida, Uttar Pradesh, India
| |
|
9
|
Plubell DL, Huang E, Spencer SE, Poston K, Montine TJ, MacCoss MJ. Data Independent Acquisition to Inform the Development of Targeted Proteomics Assays Using a Triple Quadrupole Mass Spectrometer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.29.596554. [PMID: 38853953 PMCID: PMC11160738 DOI: 10.1101/2024.05.29.596554] [Indexed: 06/11/2024]
Abstract
Mass spectrometry-based targeted proteomics methods provide sensitive and high-throughput analysis of selected proteins. To develop a targeted bottom-up proteomics assay, peptides must be evaluated as proxies for the measurement of a protein or proteoform in a biological matrix. Candidate peptide selection typically relies on predetermined biochemical properties, data from semi-stochastic sampling, or empirical measurements. These strategies require extensive testing and method refinement due to the difficulties associated with prediction of peptide response in the biological matrix of interest. Gas-phase fractionated (GPF) narrow window data-independent acquisition (DIA) aids in the development of reproducible selected reaction monitoring (SRM) assays by providing matrix-specific information on peptide detectability and quantification by mass spectrometry. To demonstrate the suitability of DIA data for selecting peptide targets, we reimplement a portion of an existing assay to measure 98 Alzheimer's disease proteins in cerebrospinal fluid (CSF). Peptides were selected from GPF-DIA based on signal intensity and reproducibility. The resulting SRM assay exhibits similar quantitative precision to published data, despite the inclusion of different peptides between the assays. This workflow enables development of new assays without additional up-front data acquisition, demonstrated here through generation of a separate assay for an unrelated set of proteins in CSF from the same dataset.
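Selecting peptides "based on signal intensity and reproducibility" can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' workflow: reproducibility is approximated by the coefficient of variation across replicates, and the thresholds, protein names, and peptide names are all hypothetical.

```python
from statistics import mean, stdev


def select_peptides(intensities, min_mean=1e5, max_cv=0.2, per_protein=2):
    # intensities: {protein: {peptide: [replicate intensities]}}
    # Keeps peptides whose mean intensity is high enough and whose
    # coefficient of variation (stdev / mean) is low enough, then returns
    # up to `per_protein` peptides per protein, most intense first.
    selected = {}
    for protein, peptides in intensities.items():
        scored = []
        for peptide, values in peptides.items():
            m = mean(values)
            cv = stdev(values) / m if m > 0 else float("inf")
            if m >= min_mean and cv <= max_cv:
                scored.append((m, peptide))
        scored.sort(reverse=True)
        selected[protein] = [peptide for _, peptide in scored[:per_protein]]
    return selected


# Hypothetical replicate intensities for three peptides of one protein.
example = {
    "PROT1": {
        "PEPA": [2.0e5, 2.1e5, 1.9e5],  # intense and reproducible
        "PEPB": [5.0e5, 1.0e5, 9.0e5],  # intense but highly variable
        "PEPC": [5.0e4, 5.1e4, 4.9e4],  # reproducible but too weak
    }
}
chosen = select_peptides(example)
```

With these toy values only PEPA passes both filters. In practice the cutoffs would depend on the instrument and matrix, which is why they are exposed as parameters here.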
Affiliation(s)
- Deanna L Plubell
- University of Washington, Department of Genome Sciences, Seattle, WA, 98195, USA
| | - Eric Huang
- University of Washington, Department of Genome Sciences, Seattle, WA, 98195, USA
| | - Sandra E Spencer
- University of Washington, Department of Genome Sciences, Seattle, WA, 98195, USA
| | - Kathleen Poston
- Stanford University, Department of Neurology & Neurological Sciences, Stanford, CA, 94305, USA
| | - Thomas J Montine
- Stanford University, Department of Pathology, Stanford, CA, 94305, USA
| | - Michael J MacCoss
- University of Washington, Department of Genome Sciences, Seattle, WA, 98195, USA
| |
|
10
|
Gong G, Jia H, Tang Y, Pei H, Zhai L, Huang J. Genetic analysis and QTL mapping for pericarp thickness in maize (Zea mays L.). BMC PLANT BIOLOGY 2024; 24:338. [PMID: 38664642 PMCID: PMC11044598 DOI: 10.1186/s12870-024-05052-1] [Received: 02/23/2024] [Accepted: 04/19/2024] [Indexed: 04/29/2024]
Abstract
Proper pericarp thickness protects the maize kernel against pests and diseases; moreover, a thinner pericarp improves the eating quality of fresh corn. In this study, we aimed to investigate the dynamic changes in maize pericarp during kernel development and to identify the major quantitative trait loci (QTLs) for maize pericarp thickness. It was observed that maize pericarp thickness first increased and then decreased. During the growth and formation stages, the pericarp thickness gradually increased and reached the maximum, after which it gradually decreased and reached the minimum during maturity. To identify the QTLs for pericarp thickness, a BC4F4 population was constructed using maize inbred lines B73 (recurrent parent with thick pericarp) and Baimaya (donor parent with thin pericarp). In addition, a high-density genetic map was constructed using a maize 10 K SNP microarray. A total of 17 QTLs related to pericarp thickness were identified in combination with the phenotypic data. The results revealed that the heritability of the thickness of the upper germinal side of the pericarp (UG) was 0.63. The major QTL controlling UG was qPT1-1, which was located on chromosome 1 (212,215,145-212,948,882). The heritability of the thickness of the upper abgerminal side of the pericarp (UA) was 0.70. The major QTL controlling UA was qPT2-1, which was located on chromosome 2 (2,550,197-14,732,993). In addition, a combination of functional annotation, DNA sequencing analysis and quantitative real-time PCR (qPCR) screened two candidate genes, Zm00001d001964 and Zm00001d002283, that could potentially control maize pericarp thickness. This study provides valuable insights into the improvement of maize pericarp thickness during breeding.
Affiliation(s)
- Guantong Gong
  - Guangdong Provincial Key Laboratory of Plant Molecular Breeding, South China Agricultural University, Guangzhou, 510642, China
- Haitao Jia
  - Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan, 430064, China
- Yunqi Tang
  - Guangdong Provincial Key Laboratory of Plant Molecular Breeding, South China Agricultural University, Guangzhou, 510642, China
- Hu Pei
  - Guangdong Provincial Key Laboratory of Plant Molecular Breeding, South China Agricultural University, Guangzhou, 510642, China
- Lihong Zhai
  - Basic School of Medicine, Hubei University of Arts and Science, Xiangyang, 441053, China
- Jun Huang
  - Guangdong Provincial Key Laboratory of Plant Molecular Breeding, South China Agricultural University, Guangzhou, 510642, China
|
11
|
Mulder RH, Neumann A, Felix JF, Suderman M, Cecil CAM. What makes clocks tick? Characterizing developmental dynamics of adult epigenetic clock sites. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.12.584597. [PMID: 38559237 PMCID: PMC10979995 DOI: 10.1101/2024.03.12.584597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
DNA methylation (DNAm) at specific sites can be used to calculate 'epigenetic clocks', which in adulthood are used as indicators of age(ing). However, little is known about how these clock sites 'behave' during development and what factors influence their variability in early life. This knowledge could be used to optimize healthy aging well before the onset of age-related conditions. Here, we leveraged results from two longitudinal population-based cohorts (N=5,019 samples from 2,348 individuals) to characterize trajectories of adult clock sites from birth to early adulthood. We find that clock sites (i) diverge widely in their developmental trajectories, often showing non-linear change over time; (ii) are substantially more likely than non-clock sites to vary between individuals already from birth, differences that are predictive of DNAm variation at later ages; and (iii) show enrichment for genetic and prenatal environmental exposures, supporting an early-origins perspective to epigenetic aging.
Affiliation(s)
- Rosa H. Mulder
  - Department of Child and Adolescent Psychiatry / Psychology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
  - The Generation R Study Group, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Alexander Neumann
  - Department of Child and Adolescent Psychiatry / Psychology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
  - The Generation R Study Group, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Janine F. Felix
  - The Generation R Study Group, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
  - Department of Pediatrics, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Matthew Suderman
  - MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Charlotte A. M. Cecil
  - Department of Child and Adolescent Psychiatry / Psychology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
  - Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
|
12
|
Lu M, Yin R, Chen XS. Ensemble methods of rank-based trees for single sample classification with gene expression profiles. J Transl Med 2024; 22:140. [PMID: 38321494 PMCID: PMC10848444 DOI: 10.1186/s12967-024-04940-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 01/27/2024] [Indexed: 02/08/2024] Open
Abstract
Building Single Sample Predictors (SSPs) from gene expression profiles presents challenges, notably due to the lack of calibration across diverse gene expression measurement technologies. However, recent research indicates the viability of classifying phenotypes based on the order of expression of multiple genes. Existing SSP methods often rely on Top Scoring Pairs (TSP), which are platform-independent and easy to interpret through the concept of "relative expression reversals". Nevertheless, TSP methods face limitations in classifying complex patterns involving comparisons of more than two gene expressions. To overcome these constraints, we introduce a novel approach that extends TSP rules by constructing rank-based trees capable of encompassing extensive gene-gene comparisons. This method is bolstered by incorporating two ensemble strategies, boosting and random forest, to mitigate the risk of overfitting. Our implementation of ensemble rank-based trees employs boosting with LogitBoost cost and random forests, addressing both binary and multi-class classification problems. In a comparative analysis across 12 cancer gene expression datasets, our proposed methods demonstrate superior performance over both the k-TSP classifier and nearest template prediction methods. We have further refined our approach to facilitate variable selection and the generation of clear, precise decision rules from rank-based trees, enhancing interpretability. The cumulative evidence from our research underscores the significant potential of ensemble rank-based trees in advancing disease classification via gene expression data, offering a robust, interpretable, and scalable solution. Our software is available at https://CRAN.R-project.org/package=ranktreeEnsemble .
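To make the idea of a "relative expression reversal" concrete, here is a minimal sketch of a single top scoring pair rule in Python. It is not the ranktreeEnsemble implementation; the function names and the toy expression values are assumptions chosen for illustration. The rule only compares two genes within one sample, so rescaling a sample does not change its prediction.

```python
from itertools import combinations

def pair_order_probs(samples, labels, i, j):
    """Fraction of samples in class 0 and class 1 where gene i < gene j."""
    probs = []
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        probs.append(sum(1 for r in rows if r[i] < r[j]) / len(rows))
    return probs

def fit_top_scoring_pair(samples, labels):
    """Pick the gene pair whose within-sample ordering differs most between classes."""
    best = None
    for i, j in combinations(range(len(samples[0])), 2):
        p0, p1 = pair_order_probs(samples, labels, i, j)
        score = abs(p0 - p1)
        if best is None or score > best[0]:
            # Class predicted when gene i < gene j: the class where that ordering is more common.
            best = (score, i, j, 0 if p0 > p1 else 1)
    return best[1:]

def predict(rule, sample):
    """Classify one sample from the ordering of the two selected genes only."""
    i, j, class_if_less = rule
    return class_if_less if sample[i] < sample[j] else 1 - class_if_less

# Hypothetical toy data: 4 samples x 3 genes, two classes.
train = [
    [5.0, 1.0, 3.0],
    [6.0, 2.0, 2.5],
    [1.0, 4.0, 3.0],
    [0.5, 3.0, 2.0],
]
labels = [0, 0, 1, 1]
rule = fit_top_scoring_pair(train, labels)
```

Because `predict` depends only on the within-sample ordering, a sample measured on a different scale, such as `[1000.0, 4000.0, 3000.0]`, is classified the same way as `[1.0, 4.0, 3.0]`. Rank-based trees, as described in the abstract, chain several such comparisons instead of relying on one.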
Affiliation(s)
- Min Lu
  - Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, 1120 NW 14th Street, Miami, FL, 33136, USA
- Ruijie Yin
  - Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, 1120 NW 14th Street, Miami, FL, 33136, USA
- X Steven Chen
  - Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, 1120 NW 14th Street, Miami, FL, 33136, USA
  - Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, 1475 NW 12th Ave, Miami, FL, 33136, USA
|
13
|
Gao S, Liu XY, Ni R, Fu J, Tan H, Cheng AX, Lou HX. Molecular cloning and functional analysis of 4-coumarate: CoA ligases from Marchantia paleacea and their roles in lignin and flavanone biosynthesis. PLoS One 2024; 19:e0296079. [PMID: 38190396 PMCID: PMC10773943 DOI: 10.1371/journal.pone.0296079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 12/05/2023] [Indexed: 01/10/2024] Open
Abstract
Phenylpropanoids play important roles in plant physiology and the enzyme 4-coumarate: coenzyme A ligase (4CL) catalyzes the formation of thioesters. Despite extensive characterization in various plants, the functions of 4CLs in the liverwort Marchantia paleacea remain unknown. Here, four 4CLs from M. paleacea were isolated and functionally analyzed. Heterologous expression in Escherichia coli indicated the presence of different enzymatic activities in the four enzymes. Mp4CL1 and Mp4CL2 were able to convert caffeic, p-coumaric, cinnamic, ferulic, dihydro-p-coumaric, and 5-hydroxyferulic acids to their corresponding CoA esters, while Mp4CL3 and Mp4CL4 catalyzed none. Mp4CL1 transcription was induced when M. paleacea thalli were treated with methyl jasmonate (MeJA). The overexpression of Mp4CL1 increased the levels of lignin in transgenic Arabidopsis. In addition, we reconstructed the flavanone biosynthetic pathway in E. coli. The pathway comprised Mp4CL1, co-expressed with chalcone synthase (CHS) from different plant species, and the efficiency of biosynthesis was optimal when both the 4CL and CHS were obtained from the same species M. paleacea.
Affiliation(s)
- Shuai Gao
  - Key Laboratory of Chemical Biology of Natural Products, Ministry of Education, School of Pharmaceutical Sciences, Shandong University, Jinan, Shandong, China
- Xin-Yan Liu
  - Key Laboratory of Chemical Biology of Natural Products, Ministry of Education, School of Pharmaceutical Sciences, Shandong University, Jinan, Shandong, China
- Rong Ni
  - Key Laboratory of Chemical Biology of Natural Products, Ministry of Education, School of Pharmaceutical Sciences, Shandong University, Jinan, Shandong, China
- Jie Fu
  - Key Laboratory of Chemical Biology of Natural Products, Ministry of Education, School of Pharmaceutical Sciences, Shandong University, Jinan, Shandong, China
- Hui Tan
  - Key Laboratory of Chemical Biology of Natural Products, Ministry of Education, School of Pharmaceutical Sciences, Shandong University, Jinan, Shandong, China
- Ai-Xia Cheng
  - Key Laboratory of Chemical Biology of Natural Products, Ministry of Education, School of Pharmaceutical Sciences, Shandong University, Jinan, Shandong, China
- Hong-Xiang Lou
  - Key Laboratory of Chemical Biology of Natural Products, Ministry of Education, School of Pharmaceutical Sciences, Shandong University, Jinan, Shandong, China
  - Shandong Provincial Clinical Research Center for Emergency and Critical Care Medicine, Jinan, Shandong, China
|
14
|
Kappagoda CN, Senevirathne R, Jayasundara D, Warnasekara Y, Srimantha L, De Silva L, Agampodi SB. The human Toll-like receptor 2 (TLR2) response during pathogenic Leptospira infection. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.16.567338. [PMID: 38014008 PMCID: PMC10680769 DOI: 10.1101/2023.11.16.567338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Background: Human innate immune responses are triggered through the interaction of human pattern recognition receptors and pathogen-associated molecular patterns. The role of toll-like receptor 2 (TLR2) in the murine innate immune response to leptospirosis is well established, while human studies are limited. The present study aimed to determine the TLR2 response among confirmed cases of leptospirosis.
Methodology/Principal findings: The study has two components. Clinically suspected cases of leptospirosis were confirmed using a previously validated qPCR assay. Total RNA was extracted from patients' RNA-stabilized whole blood samples. Human TLR2 gene expression (RT-qPCR) analysis was carried out using an exon-exon spanning primer pair and CFX Maestro™ software. The first set of patient samples was used to calculate the Relative Normalized Expression (ΔΔCq value) of the TLR2 gene in comparison to a healthy control sample, normalized by the reference gene GAPDH (glyceraldehyde-3-phosphate dehydrogenase). Secondly, recruited patient samples were subjected to TLR2 gene expression analysis, compared to healthy controls and normalized by the reference genes beta-2-microglobulin (B2M) and hypoxanthine phosphoribosyltransferase 1 (HPRT1). In the initial cohort of 64 confirmed leptospirosis cases, 18 were selected for human TLR2 gene expression analysis based on criteria of leptospiremia and RNA yield. Within this group, one individual exhibited down-regulation of the TLR2 gene (Expression/ΔΔCq = 0.01352), whereas the remaining subjects presented no significant change in gene expression. In a subsequent cohort of 23 confirmed cases, 13 were chosen for similar analysis. Among these, three patients demonstrated down-regulation of TLR2 gene expression, with Expression/ΔΔCq values of 0.86574, 0.47200, and 0.28579, respectively. No TLR2 gene expression was noted in the other patients within this second group.
Conclusions: Our investigation into the acute phase of leptospirosis using human clinical samples has revealed a downregulation of TLR2 gene expression. This observation contrasts with the upregulation commonly reported in the majority of in vitro and in vivo studies of Leptospira infection. These preliminary findings prompt a need for further research to explore the mechanisms underlying TLR2's role in the pathogenesis of leptospirosis, which may differ in clinical settings compared to laboratory models.
Author Summary: The human immune system employs pattern recognition receptors like toll-like receptor 2 (TLR2) to detect and combat infections such as leptospirosis. While TLR2's role is well documented in mice, its function in the human response to leptospirosis remains unclear. Our study evaluated TLR2 activity in patients with confirmed leptospirosis. We conducted a genetic analysis of blood samples from these patients, comparing TLR2 gene activity against healthy individuals, with standard reference genes for accuracy. Contrary to expectations and existing laboratory data, we observed a decrease in TLR2 activity in some patients. This suggests that human TLR2 responses in actual infections may diverge from established laboratory models. These findings indicate a need for further study to understand the human immune response to leptospirosis, which may significantly differ from that observed in controlled experimental settings.
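The relative expression values quoted above can be read against the standard ΔΔCq calculation. The sketch below is an assumption-laden illustration, not the authors' pipeline: it uses a single reference gene and assumes 100% amplification efficiency, whereas the software named in the abstract may apply efficiency corrections and combine several reference genes. The Cq values are hypothetical.

```python
def relative_expression(cq_target_sample, cq_ref_sample,
                        cq_target_control, cq_ref_control):
    """Fold change of a target gene by the 2^(-ddCq) method (assumes 100% efficiency)."""
    # Normalize the target gene to the reference gene within each sample.
    delta_sample = cq_target_sample - cq_ref_sample
    delta_control = cq_target_control - cq_ref_control
    # Compare the patient sample to the healthy control.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)

# Hypothetical Cq values: the target amplifies one cycle later in the patient
# than in the control after normalization, i.e. half the expression.
fold_change = relative_expression(26.0, 20.0, 25.0, 20.0)
```

Under these assumptions, a value below 1 indicates lower expression in the patient than in the control, and a value of 1 indicates no change.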
|
15
|
Oostrom M, Muniak MA, Eichler West RM, Akers S, Pande P, Obiri M, Wang W, Bowyer K, Wu Z, Bramer LM, Mao T, Webb-Robertson BJ. Fine-tuning TrailMap: The utility of transfer learning to improve the performance of deep learning in axon segmentation of light-sheet microscopy images. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.23.563546. [PMID: 37961439 PMCID: PMC10634742 DOI: 10.1101/2023.10.23.563546] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2023]
Abstract
Light-sheet microscopy has made possible the 3D imaging of both fixed and live biological tissue, with samples as large as the entire mouse brain. However, segmentation and quantification of those data remain a time-consuming manual undertaking. Machine learning methods promise the possibility of automating this process. This study seeks to advance the performance of prior models by optimizing transfer learning. We fine-tuned the existing TrailMap model using expert-labeled data from noradrenergic axonal structures in the mouse brain. By fine-tuning the final two layers of the TrailMap neural network at a lower learning rate, we demonstrate improved recall and an occasionally improved adjusted F1-score on our test dataset compared with the originally trained TrailMap model.
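For readers unfamiliar with the metrics named above, the following sketch computes voxel-wise precision, recall, and the plain F1-score for a binary segmentation. The "adjusted" F1-score used in the study is not defined in the abstract and is not reproduced here; the function name and the flattened toy masks are assumptions for illustration.

```python
def segmentation_scores(predicted, truth):
    """Voxel-wise precision, recall and F1 for flattened binary masks (1 = axon)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical masks: the second prediction finds more of the true axon voxels
# (higher recall) at the cost of one false positive.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
baseline = segmentation_scores([1, 1, 0, 0, 0, 0, 0, 0], truth)
fine_tuned = segmentation_scores([1, 1, 1, 0, 1, 0, 0, 0], truth)
```

In this toy case recall rises from 0.5 to 0.75 while precision drops from 1.0 to 0.75, which shows why recall can improve consistently while an F1-type score improves only some of the time.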
Affiliation(s)
- Marjolein Oostrom
  - AI & Data Analytics Division, Pacific Northwest National Laboratory, Richland, WA USA
- Michael A. Muniak
  - Vollum Institute, Oregon Health & Science University, Portland, OR USA
- Sarah Akers
  - AI & Data Analytics Division, Pacific Northwest National Laboratory, Richland, WA USA
- Paritosh Pande
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA USA
- Moses Obiri
  - AI & Data Analytics Division, Pacific Northwest National Laboratory, Richland, WA USA
- Wei Wang
  - Appel Alzheimer’s Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY USA
- Kasey Bowyer
  - Appel Alzheimer’s Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY USA
- Zhuhao Wu
  - Appel Alzheimer’s Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY USA
- Lisa M. Bramer
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA USA
- Tianyi Mao
  - Vollum Institute, Oregon Health & Science University, Portland, OR USA
|
16
|
Nishida AH, Ochman H. Origins and Evolution of Novel Bacteroides in Captive Apes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.20.563286. [PMID: 37961372 PMCID: PMC10634691 DOI: 10.1101/2023.10.20.563286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Bacterial strains evolve in response to the gut environment of their hosts, with genomic changes that influence their interactions with hosts as well as with other members of the gut community. Great apes in captivity have acquired strains of Bacteroides xylanisolvens, which are common within the gut microbiome of humans but not typically found in other apes, thereby enabling characterization of strain evolution following colonization. Here, we isolate, sequence and reconstruct the history of gene gain and loss events in numerous captive-ape-associated strains since their divergence from their closest human-associated strains. We show that multiple captive-ape-associated B. xylanisolvens lineages have independently acquired gene complexes that encode functions related to host mucin metabolism. Our results support the finding of high genome fluidity in Bacteroides, in that several strains, in moving from humans to captive apes, have rapidly gained large genomic regions that augment metabolic properties not previously present in their relatives.
Affiliation(s)
- Alexandra H. Nishida
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712 USA
- Howard Ochman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712 USA
|
17
|
Venkataraghavan S, Pankow JS, Boerwinkle E, Fornage M, Selvin E, Ray D. Epigenome-wide association study of incident type 2 diabetes in Black and White participants from the Atherosclerosis Risk in Communities Study. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.08.09.23293896. [PMID: 37609313 PMCID: PMC10441493 DOI: 10.1101/2023.08.09.23293896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
DNA methylation studies of incident type 2 diabetes in US populations are limited, and to our knowledge none included individuals of African descent living in the US. We performed an epigenome-wide association analysis of blood-based methylation levels at CpG sites with incident type 2 diabetes using Cox regression in 2,091 Black and 1,029 White individuals from the Atherosclerosis Risk in Communities study. At an epigenome-wide significance threshold of 10^-7, we detected 7 novel diabetes-associated CpG sites in C1orf151 (cg05380846: HR = 0.89, p = 8.4 × 10^-12), ZNF2 (cg01585592: HR = 0.88, p = 1.6 × 10^-9), JPH3 (cg16696007: HR = 0.87, p = 7.8 × 10^-9), GPX6 (cg02793507: HR = 0.85, p = 2.7 × 10^-8 and cg00647063: HR = 1.20, p = 2.5 × 10^-8), chr17q25 (cg16865890: HR = 0.8, p = 6.9 × 10^-8), and chr11p15 (cg13738793: HR = 1.11, p = 7.7 × 10^-8). The CpG sites at C1orf151, ZNF2, JPH3 and GPX6 were identified in Black adults, chr17q25 was identified in White adults, and chr11p15 was identified upon meta-analyzing the two groups. The CpG sites at JPH3 and GPX6 were likely associated with incident type 2 diabetes independent of BMI. All the CpG sites, except at JPH3, were likely consequences of elevated glucose at baseline. We additionally replicated known type 2 diabetes-associated CpG sites including cg19693031 at TXNIP, cg00574958 at CPT1A, cg16567056 at PLBC2, cg11024682 at SREBF1, cg08857797 at VPS25, and cg06500161 at ABCG1, 3 of which were replicated in Black adults at the epigenome-wide threshold. We observed a modest increase in type 2 diabetes variance explained upon addition of the significantly associated CpG sites to a Cox model that included traditional type 2 diabetes risk factors and fasting glucose (increase from 26.2% to 30.5% in Black adults; increase from 36.9% to 39.4% in White adults).
We examined if groups of proximal CpG sites were associated with incident type 2 diabetes using a gene-region specific and a gene-region agnostic differentially methylated region (DMR) analysis. Our DMR analyses revealed several clusters of significant CpG sites, including a DMR consisting of a previously discovered CpG site at ADCY7 and promoter regions of TP63 which were differentially methylated across all race groups. This study illustrates improved discovery of CpG sites/regions by leveraging both individual CpG site and DMR analyses in an unexplored population. Our findings include genes linked to diabetes in experimental studies (e.g., GPX6, JPH3, and TP63), and future gene-specific methylation studies could elucidate the link between genes, environment, and methylation in the pathogenesis of type 2 diabetes.
Affiliation(s)
- Sowmya Venkataraghavan
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- James S. Pankow
  - Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Eric Boerwinkle
  - The UTHealth School of Public Health, Houston, Texas, United States of America
- Myriam Fornage
  - Brown Foundation Institute for Molecular Medicine, The University of Texas Health Science Center, Houston, Texas, United States of America
- Elizabeth Selvin
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Welch Center for Prevention, Epidemiology, & Clinical Research, Johns Hopkins University, Baltimore, Maryland, United States of America
- Debashree Ray
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
|
18
|
Cai M, Zhou J, McKennan C, Wang J. scMD: cell type deconvolution using single-cell DNA methylation references. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.03.551733. [PMID: 37577715 PMCID: PMC10418231 DOI: 10.1101/2023.08.03.551733] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 08/15/2023]
Abstract
The proliferation of single-cell RNA sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent development in single-cell DNA methylation (scDNAm), new avenues have been opened for deconvolving bulk DNAm data, particularly for solid tissues like the brain that lack cell-type references. Due to technical limitations, current scDNAm sequences represent a small proportion of the whole genome for each single cell, and those detected regions differ across cells. This makes scDNAm data ultra-high dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework to reliably estimate cell type fractions from tissue-level DNAm data. To analyze large-scale complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell cluster level, identify cell-type marker DNAm sites, and create a precise cell-type signature matrix that surpasses state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking in several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. With scMD-estimated cellular fractions, we identify cell type fractions and cell type-specific differentially methylated cytosines associated with Alzheimer's disease.
Affiliation(s)
- Manqi Cai
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Jingtian Zhou
  - Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA
- Chris McKennan
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Jiebiao Wang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
|
19
|
Curd EE, Gal L, Gallego R, Nielsen S, Gold Z. rCRUX: A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.31.543005. [PMID: 37397980 PMCID: PMC10312559 DOI: 10.1101/2023.05.31.543005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/04/2023]
Abstract
Key to making accurate taxonomic assignments are curated, comprehensive reference barcode databases. However, the generation and curation of such databases has remained challenging given the large and continuously growing volumes of DNA sequence data and novel reference barcode targets. Monitoring and research applications require a greater diversity of specialized gene regions and targeted taxa to meet taxonomic classification goals than are currently curated by professional staff. Thus, there is a growing need for an easy-to-implement tool that can generate comprehensive metabarcoding reference libraries for any bespoke locus. We address this need by reimagining CRUX from the Anacapa Toolkit and present the rCRUX package in R. The typical workflow involves searching for plausible seed amplicons (get_seeds_local() or get_seeds_remote()) by simulating in silico PCR to acquire seed sequences containing a user-defined primer set. Next, these seeds are used to iteratively BLAST-search seed sequences against a local NCBI-formatted database using a taxonomic-rank-based stratified random sampling approach (blast_seeds()) that results in a comprehensive set of sequence matches. This database is dereplicated and cleaned (derep_and_clean_db()) by identifying identical reference sequences and collapsing the taxonomic path to the lowest taxonomic agreement across all matching reads. This results in a curated, comprehensive database of primer-specific reference barcode sequences from NCBI. We demonstrate that rCRUX provides more comprehensive reference databases for the MiFish Universal Teleost 12S, Taberlet trnL, and fungal ITS loci than CRABS, METACURATOR, RESCRIPt, and ECOPCR reference databases. We then further demonstrate the utility of rCRUX by generating 16 reference databases for metabarcoding loci that lack dedicated reference database curation efforts.
The rCRUX package provides a simple-to-use tool for the generation of curated, comprehensive reference databases for user-defined loci, facilitating accurate and effective taxonomic classification of metabarcoding and DNA sequence efforts broadly.
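The dereplication step described above, collapsing identical sequences to their lowest taxonomic agreement, can be sketched as follows. This is a Python illustration of the idea only, not the rCRUX code (which is an R package); the function names, sequences, and taxon labels are hypothetical placeholders.

```python
from collections import defaultdict

def lowest_common_path(paths):
    """Longest shared prefix of taxonomic paths ordered from highest to lowest rank."""
    shared = []
    for ranks in zip(*paths):
        if len(set(ranks)) != 1:
            break  # first rank where the records disagree
        shared.append(ranks[0])
    return shared

def dereplicate(records):
    """Keep one entry per distinct sequence, labeled with the agreed taxonomic path."""
    by_sequence = defaultdict(list)
    for sequence, path in records:
        by_sequence[sequence].append(path)
    return {seq: lowest_common_path(paths) for seq, paths in by_sequence.items()}

# Hypothetical records: two species share an identical barcode sequence.
records = [
    ("ACGTACGT", ["ClassA", "OrderB", "FamilyC", "GenusD", "GenusD sp1"]),
    ("ACGTACGT", ["ClassA", "OrderB", "FamilyC", "GenusD", "GenusD sp2"]),
    ("TTGACCAA", ["ClassA", "OrderB", "FamilyE", "GenusF", "GenusF sp1"]),
]
database = dereplicate(records)
```

Here the shared sequence can only be resolved to genus level, so it is stored once with a genus-level path, while the unique sequence keeps its full species-level path.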
Affiliation(s)
- Emily E. Curd
  - Vermont Biomedical Research Network, University of Vermont, VT, USA
- Luna Gal
  - Landmark College, VT, USA
  - California Cooperative Oceanic Fisheries Investigations (CalCOFI), Scripps Institution of Oceanography, University of California San Diego (UCSD), La Jolla, CA, USA
- Ramon Gallego
  - Universidad Autónoma de Madrid - Unidad de Genética, Spain
- Zachary Gold
  - California Cooperative Oceanic Fisheries Investigations (CalCOFI), Scripps Institution of Oceanography, University of California San Diego (UCSD), La Jolla, CA, USA
  - NOAA Pacific Marine Environmental Laboratory, Seattle, WA, USA
|
20
|
Tian H, Xiao S, Jiang X, Tao P. PASSerRank: Prediction of Allosteric Sites with Learning to Rank. ARXIV 2023:arXiv:2302.01117v2. [PMID: 36776818 PMCID: PMC9915737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
Allostery plays a crucial role in regulating protein activity, making it a highly sought-after target in drug development. One of the major challenges in allosteric drug research is the identification of allosteric sites. In recent years, many computational models have been developed for accurate allosteric site prediction. Most of these models focus on designing a general rule that can be applied to pockets of proteins from various families. In this study, we present a new approach using the concept of Learning to Rank (LTR). The LTR model ranks pockets based on their relevance to allosteric sites, i.e., how well a pocket meets the characteristics of known allosteric sites. The model outperforms other common machine learning models, with a higher F1 score and Matthews correlation coefficient. After training and validation on two datasets, the Allosteric Database (ASD) and CASBench, the LTR model was able to rank an allosteric pocket in the top 3 positions for 83.6% and 80.5% of test proteins, respectively. The trained model is available on the PASSer platform (https://passer.smu.edu) to aid in drug discovery research.
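The "top 3" figures above are a per-protein hit rate: a protein counts as a success if any known allosteric pocket appears among its three highest-ranked pockets. A minimal sketch of that evaluation follows, assuming each pocket is represented as a (score, is_allosteric) pair; the function name and the toy scores are hypothetical and are not taken from the PASSer code.

```python
def top_k_hit_rate(proteins, k=3):
    """Fraction of proteins with at least one allosteric pocket in the top-k ranked pockets."""
    hits = 0
    for pockets in proteins:
        # Rank this protein's pockets by model score, best first.
        ranked = sorted(pockets, key=lambda pocket: pocket[0], reverse=True)
        if any(is_allosteric for _, is_allosteric in ranked[:k]):
            hits += 1
    return hits / len(proteins)

# Hypothetical scores for two proteins.
proteins = [
    [(0.9, False), (0.8, True), (0.1, False)],                # allosteric pocket ranked 2nd
    [(0.7, False), (0.6, False), (0.5, False), (0.4, True)],  # allosteric pocket ranked 4th
]
rate_top3 = top_k_hit_rate(proteins, k=3)
rate_top4 = top_k_hit_rate(proteins, k=4)
```

Because pockets are ranked within each protein, the metric reflects what a user would see when inspecting a short list of candidate sites, which is the setting a ranking model targets.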
Affiliation(s)
- Hao Tian
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, United States of America
- Sian Xiao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, United States of America
- Xi Jiang
  - Department of Statistics, Southern Methodist University, Dallas, Texas, United States of America
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, United States of America
|
21
|
Mao Y, Harvey WT, Porubsky D, Munson KM, Hoekzema K, Lewis AP, Audano PA, Rozanski A, Yang X, Zhang S, Gordon DS, Wei X, Logsdon GA, Haukness M, Dishuck PC, Jeong H, Del Rosario R, Bauer VL, Fattor WT, Wilkerson GK, Lu Q, Paten B, Feng G, Sawyer SL, Warren WC, Carbone L, Eichler EE. Structurally divergent and recurrently mutated regions of primate genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.07.531415. [PMID: 36945442 PMCID: PMC10028934 DOI: 10.1101/2023.03.07.531415] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 03/10/2023]
Abstract
To better understand the pattern of primate genome structural variation, we sequenced and assembled using multiple long-read sequencing technologies the genomes of eight nonhuman primate species, including New World monkeys (owl monkey and marmoset), Old World monkey (macaque), Asian apes (orangutan and gibbon), and African ape lineages (gorilla, bonobo, and chimpanzee). Compared to the human genome, we identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. Across 50 million years of primate evolution, we estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs based on analysis of these primate lineages. We identify 1,607 structurally divergent regions (SDRs) wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (CARDs, ABCD7, OLAH) and new lineage-specific genes are generated (e.g., CKAP2, NEK5) and have become targets of rapid chromosomal diversification and positive selection (e.g., RGPDs). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species for the first time.
Affiliation(s)
- Yafei Mao
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Peter A Audano
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Allison Rozanski
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Xiangyu Yang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Shilong Zhang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- David S Gordon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Xiaoxi Wei
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ricardo Del Rosario
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vanessa L Bauer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Will T Fattor
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Gregory K Wilkerson
- Department of Veterinary Sciences, Michale E. Keeling Center for Comparative Medicine and Research, The University of Texas MD Anderson Cancer Center, Bastrop, TX, USA
- Department of Clinical Sciences, North Carolina State University, Raleigh, NC, USA
| | - Qing Lu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Guoping Feng
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Sara L Sawyer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Wesley C Warren
- Department of Animal Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Department of Surgery, School of Medicine, University of Missouri, Columbia, MO, USA
- Institute of Data Science and Informatics, University of Missouri, Columbia, MO, USA
| | - Lucia Carbone
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
- Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
22
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.16.528728. [PMID: 36824945 PMCID: PMC9949123 DOI: 10.1101/2023.02.16.528728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key post-translational modification involved in physiology and disease. The ability to robustly and rapidly predict protease substrate specificity would also enable targeted proteolytic cleavage - editing - of a target protein by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally-derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the three-dimensional structure and energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically-grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases: the NS3/4 protease from the Hepatitis C virus (HCV) and the Tobacco Etch Virus (TEV) proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pre-trained PGCN model to guide the design of TEV protease libraries for cleaving two non-canonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. 
The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Joseph H. Lubin
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | | | - Guanyang Wang
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| |
|
23
|
Zhang J, Singh R. Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.24.525447. [PMID: 36747724 PMCID: PMC9900775 DOI: 10.1101/2023.01.24.525447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
With the rapid advance of single-cell RNA sequencing (scRNA-seq) technology, understanding biological processes at a more refined single-cell level is becoming possible. Gene co-expression estimation is an essential step in this direction. It can annotate functionalities of unknown genes or construct the basis of gene regulatory network inference. This study thoroughly tests the existing gene co-expression estimation methods on simulation datasets with known ground truth co-expression networks. We generate these novel datasets using two simulation processes that use the parameters learned from the experimental data. We demonstrate that these simulations better capture the underlying properties of the real-world single-cell datasets than previously tested simulations for the task. Our performance results on tens of simulated and eight experimental datasets show that all methods produce estimations with a high false discovery rate potentially caused by high-sparsity levels in the data. Finally, we find that commonly used pre-processing approaches, such as normalization and imputation, do not improve the co-expression estimation. Overall, our benchmark setup contributes to the co-expression estimator development, and our study provides valuable insights for the community of single-cell data analyses.
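The false discovery rate mentioned in this abstract can be made concrete with a small sketch. It is not the authors' benchmark: absolute Pearson correlation stands in for a co-expression estimator, and the function names, the threshold, and the toy expression values are all illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def predict_pairs(expr, threshold):
    """Return gene pairs whose |correlation| meets the (assumed) threshold.

    expr maps a gene name to its list of per-cell expression values.
    """
    return {
        frozenset((g, h))
        for g, h in combinations(sorted(expr), 2)
        if abs(pearson(expr[g], expr[h])) >= threshold
    }


def false_discovery_rate(predicted, truth):
    """Fraction of predicted pairs that are absent from the ground-truth network."""
    if not predicted:
        return 0.0
    return len(predicted - truth) / len(predicted)


# Toy data (hypothetical): only A and B are truly co-expressed.
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [5, 1, 4, 2]}
truth = {frozenset(("A", "B"))}
strict = predict_pairs(expr, 0.9)
loose = predict_pairs(expr, 0.4)
```

With a ground-truth network available, as in simulated data, lowering the threshold admits more spurious pairs and the computed rate rises accordingly.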
Affiliation(s)
- Jiaqi Zhang
- Department of Computer Science, Brown University
| | - Ritambhara Singh
- Department of Computer Science, Center for Computational Molecular Biology, Brown University
| |
|
24
|
Gupta R, Kanai M, Durham TJ, Tsuo K, McCoy JG, Chinnery PF, Karczewski KJ, Calvo SE, Neale BM, Mootha VK. Nuclear genetic control of mtDNA copy number and heteroplasmy in humans. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.01.19.23284696. [PMID: 36711677 PMCID: PMC9882621 DOI: 10.1101/2023.01.19.23284696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
Human mitochondria contain a high copy number, maternally transmitted genome (mtDNA) that encodes 13 proteins required for oxidative phosphorylation. Heteroplasmy arises when multiple mtDNA variants co-exist in an individual and can exhibit complex dynamics in disease and in aging. As all proteins involved in mtDNA replication and maintenance are nuclear-encoded, heteroplasmy levels can, in principle, be under nuclear genetic control; however, this has never been shown in humans. Here, we develop algorithms to quantify mtDNA copy number (mtCN) and heteroplasmy levels using blood-derived whole genome sequences from 274,832 individuals of diverse ancestry and perform GWAS to identify nuclear loci controlling these traits. After careful correction for blood cell composition, we observe that mtCN declines linearly with age and is associated with 92 independent nuclear genetic loci. We find that nearly every individual carries heteroplasmic variants that obey two key patterns: (1) heteroplasmic single nucleotide variants are somatic mutations that accumulate sharply after age 70, while (2) heteroplasmic indels are maternally transmitted as mtDNA mixtures with resulting levels influenced by 42 independent nuclear loci involved in mtDNA replication, maintenance, and novel pathways. These nuclear loci do not appear to act by mtDNA mutagenesis, but rather likely act by conferring a replicative advantage to specific mtDNA molecules. As an illustrative example, the most common heteroplasmy we identify is a length variant carried by >50% of humans at position m.302 within a G-quadruplex known to serve as a replication switch. We find that this heteroplasmic variant exerts cis-acting genetic control over mtDNA abundance and is itself under trans-acting genetic control of nuclear loci encoding protein components of this regulatory switch. 
Our study showcases how nuclear haplotype can privilege the replication of specific mtDNA molecules to shape mtCN and heteroplasmy dynamics in the human population.
Affiliation(s)
- Rahul Gupta
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, United States
- Broad Institute of MIT and Harvard, United States
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, United States
| | - Masahiro Kanai
- Broad Institute of MIT and Harvard, United States
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, United States
| | - Timothy J Durham
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, United States
- Broad Institute of MIT and Harvard, United States
| | - Kristin Tsuo
- Broad Institute of MIT and Harvard, United States
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, United States
| | - Jason G McCoy
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, United States
- Broad Institute of MIT and Harvard, United States
| | - Patrick F Chinnery
- Department of Clinical Neurosciences & MRC Mitochondrial Biology Unit, University of Cambridge, United Kingdom
| | - Konrad J Karczewski
- Broad Institute of MIT and Harvard, United States
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, United States
| | - Sarah E Calvo
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, United States
- Broad Institute of MIT and Harvard, United States
| | - Benjamin M Neale
- Broad Institute of MIT and Harvard, United States
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, United States
| | - Vamsi K Mootha
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, United States
- Broad Institute of MIT and Harvard, United States
- Department of Systems Biology, Harvard Medical School, United States
| |
|
25
|
Newcastle disease burden in Nepal and efficacy of Tablet I2 vaccine in commercial and backyard poultry production. PLoS One 2023; 18:e0280688. [PMID: 36897867 PMCID: PMC10004539 DOI: 10.1371/journal.pone.0280688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 01/05/2023] [Indexed: 03/11/2023] Open
Abstract
Poultry (Gallus domesticus) farming plays an important role as an income-generating enterprise in a developing country like Nepal, contributing more than 4% to the national Gross Domestic Product (GDP). Newcastle Disease (ND) is a major poultry disease affecting both commercial and backyard poultry production worldwide. There were more than 90 reported ND outbreaks in Nepal in 2018 with over 74,986 birds being affected. ND is responsible for over 7% of total poultry mortality in the country. Recent outbreaks of ND in 2021 affected many farms throughout Nepal and caused massive losses in poultry production. ND is caused by a single-stranded ribonucleic acid (RNA) virus that presents clinical symptoms very similar to those of Influenza A (commonly known as bird flu), adding much complexity to clinical disease identification and intervention. We conducted a nationwide ND and Influenza A (IA) prevalence study, collecting samples from representative commercial and backyard poultry farms from across the major poultry production hubs of Nepal. We used both serological and molecular assessments to determine disease exposure history and to identify strains of ND Virus (NDV). Of the 40 commercial farms tested, both NDV (n = 28, 70%) and IAV (n = 11, 27.5%) antibodies were detected in the majority of the samples. In the backyard farms (n = 36), sero-prevalence of NDV and IAV was 17.5% (n = 7) and 7.5% (n = 3), respectively. Genotype II NDV was present in most of the commercial farms, which was likely due to live vaccine usage. We detected previously unreported Genotype I NDV in two backyard farm samples. Our investigation into the 2021 ND outbreak implicated the Genotype VII.2 NDV strain as the causative pathogen. Additionally, we developed a tablet formulation of the thermostable I2-NDV vaccine (Ranigoldunga™) and assessed its efficacy on various (mixed) breeds of chicken (Gallus domesticus). Ranigoldunga™ demonstrated an overall efficacy >85% with a stability of 30 days at room temperature (25°C). 
The intraocularly administered vaccine was highly effective in preventing ND, including the Genotype VII.2 NDV strain.
|
26
|
Kerachian MA, Azghandi M. Identification of long non-coding RNA using single nucleotide epimutation analysis: a novel gene discovery approach. Cancer Cell Int 2022; 22:337. [PMID: 36333783 PMCID: PMC9636742 DOI: 10.1186/s12935-022-02752-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 10/12/2022] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) are involved in a variety of mechanisms related to tumorigenesis by functioning as oncogenes or tumor-suppressors, or even by harboring both oncogenic and tumor-suppressing effects, and they represent a new class of cancer biomarkers and therapeutic targets. It is predicted that more than 35,000 ncRNAs, especially lncRNAs, are positioned in the intergenic regions of the human genome. Emerging research indicates that one of the key pathways controlling lncRNA expression and tissue specificity is epigenetic regulation. METHODS In the current article, a novel approach for lncRNA discovery is proposed, based on the intergenic position of most lncRNAs and on the methylation level of a single CpG site as a representation of epigenetic characteristics. RESULTS Using this method, a novel antisense lncRNA named LINC02892 was found, presenting three transcripts without protein-coding capacity and exhibiting nuclear, cytoplasmic, and exosome distributions. CONCLUSION The current discovery strategy could be applied to identify novel non-coding RNAs influenced by methylation aberrations.
Affiliation(s)
- Mohammad Amin Kerachian
- Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
- Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad, Iran.
- Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, ON, Canada.
| | - Marjan Azghandi
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad, Iran
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
| |
|
27
|
Lutteropp S, Scornavacca C, Kozlov AM, Morel B, Stamatakis A. NetRAX: accurate and fast maximum likelihood phylogenetic network inference. BIOINFORMATICS (OXFORD, ENGLAND) 2022; 38:3725-3733. [PMID: 35713506 DOI: 10.1101/2021.08.30.458194] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/16/2021] [Revised: 05/11/2022] [Accepted: 06/14/2022] [Indexed: 05/26/2023]
Abstract
MOTIVATION Phylogenetic networks can represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity and current tools can only analyze small datasets. RESULTS We present NetRAX, a tool for maximum likelihood (ML) inference of phylogenetic networks in the absence of ILS. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of 'displayed trees'. NetRAX can infer ML phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in Bayesian Information Criterion (BIC) score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8000 sites, 30 taxa and 3 reticulations completes within a few minutes on a standard laptop. AVAILABILITY AND IMPLEMENTATION Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
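For readers unfamiliar with the BIC score used in this evaluation, the standard definition is BIC = k ln(n) - 2 ln(L), where lower values indicate a better trade-off between fit and model size. The sketch below is a generic illustration, not NetRAX code: treating the number of alignment sites as the sample size n, and the particular form of `relative_difference`, are assumptions, and the numbers are made up.

```python
from math import log


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * log(n_samples) - 2.0 * log_likelihood


def relative_difference(inferred, reference):
    """Signed difference relative to the reference score (assumed definition)."""
    return (inferred - reference) / abs(reference)


# Hypothetical values: an inferred network vs. the true simulated network,
# both scored on an alignment with 8000 sites.
true_bic = bic(log_likelihood=-50000.0, n_params=60, n_samples=8000)
inferred_bic = bic(log_likelihood=-50010.0, n_params=58, n_samples=8000)
gap = relative_difference(inferred_bic, true_bic)
```

Because each extra reticulation adds parameters, the k ln(n) penalty is what keeps a likelihood-based search from adding reticulations indefinitely.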
Affiliation(s)
- Sarah Lutteropp
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
| | - Céline Scornavacca
- Institut des Sciences de l'Évolution Université de Montpellier, CNRS, IRD, EPHE Place Eugène Bataillon, 34095 Montpellier Cedex 05, France
| | - Alexey M Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
| | - Benoit Morel
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany
| | - Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany
| |
|
28
|
Understanding and Modulating Antibody Fine Specificity: Lessons from Combinatorial Biology. Antibodies (Basel) 2022; 11:antib11030048. [PMID: 35892708 PMCID: PMC9326607 DOI: 10.3390/antib11030048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Revised: 07/08/2022] [Accepted: 07/11/2022] [Indexed: 02/01/2023] Open
Abstract
Combinatorial biology methods such as phage and yeast display, suitable for the generation and screening of huge numbers of protein fragments and mutated variants, have been useful when dissecting the molecular details of the interactions between antibodies and their target antigens (mainly those of protein nature). The relevance of these studies goes far beyond the mere description of binding interfaces, as the information obtained has implications for the understanding of the chemistry of antibody–antigen binding reactions and the biological effects of antibodies. Further modification of the interactions through combinatorial methods to manipulate the key properties of antibodies (affinity and fine specificity) can result in the emergence of novel research tools and optimized therapeutics.
|
29
|
Genomic, morphological, and biochemical analyses of a multi-metal resistant but multi-drug susceptible strain of Bordetella petrii from hospital soil. Sci Rep 2022; 12:8439. [PMID: 35589928 PMCID: PMC9120033 DOI: 10.1038/s41598-022-12435-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/28/2022] [Accepted: 05/04/2022] [Indexed: 01/08/2023] Open
Abstract
Contamination of soil by antibiotics and heavy metals originating from hospital facilities has emerged as a major cause for the development of resistant microbes. We collected soil samples surrounding a hospital effluent and measured the resistance of bacterial isolates against multiple antibiotics and heavy metals. One strain, BMCSI 3, was found to be sensitive to all tested antibiotics. However, it was resistant to many heavy metals and metalloids, such as cadmium, chromium, copper, mercury, arsenic, and others. This strain was motile and potentially spore-forming. Whole-genome shotgun assembly of BMCSI 3 produced a 4.95 Mb genome with 4,638 protein-coding genes. Taxonomic and phylogenetic analysis revealed it to be a Bordetella petrii strain. Multiple genomic islands carrying mobile genetic elements and coding for heavy metal resistance genes, response regulators or transcription factors, transporters, and multi-drug efflux pumps were identified in the genome. A comparative genomic analysis of BMCSI 3 with annotated genomes of other free-living B. petrii revealed the presence of multiple transposable elements and several genes involved in stress response and metabolism. This study provides insights into how genomic reorganization and plasticity result in the evolution of heavy metal resistance through the acquisition of genes from the natural environment.
|
30
|
Matos GM, Lewis MD, Talavera-López C, Yeo M, Grisard EC, Messenger LA, Miles MA, Andersson B. Microevolution of Trypanosoma cruzi reveals hybridization and clonal mechanisms driving rapid genome diversification. eLife 2022; 11:75237. [PMID: 35535495 PMCID: PMC9098224 DOI: 10.7554/elife.75237] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/03/2021] [Accepted: 04/22/2022] [Indexed: 12/11/2022] Open
Abstract
Protozoa and fungi are known to have extraordinarily diverse mechanisms of genetic exchange. However, the presence and epidemiological relevance of genetic exchange in Trypanosoma cruzi, the agent of Chagas disease, has been controversial and debated for many years. Field studies have identified both predominantly clonal and sexually recombining natural populations. Two of six natural T. cruzi lineages (TcV and TcVI) show hybrid mosaicism, based on analysis of single-gene locus markers. The formation of hybrid strains in vitro has been achieved and this provides a framework to study the mechanisms and adaptive significance of genetic exchange. Using whole genome sequencing of a set of experimental hybrid strains, we have confirmed that hybrid formation initially results in tetraploid parasites. The hybrid progeny showed novel mutations that were not attributable to either (diploid) parent, with an increase in amino acid changes. In long-term culture, up to 800 generations, there was a variable but gradual erosion of progeny genomes towards triploidy, yet retention of elevated copy number was observed at several core housekeeping loci. Our findings indicate hybrid formation by fusion of diploid T. cruzi, followed by sporadic genome erosion, but with substantial potential for adaptive evolution, as has been described as a genetic feature of other organisms, such as some fungi.
Affiliation(s)
- Gabriel Machado Matos
- Departamento de Biologia Celular, Embriologia e Genética, Universidade Federal de Santa Catarina, Florianopolis, Brazil.,Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
| | - Michael D Lewis
- Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Carlos Talavera-López
- Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden.,Institute of Computational Biology, Computational Health Centre, Helmholtz Munich, Munich, Germany
| | - Matthew Yeo
- Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Edmundo C Grisard
- Departamento de Microbiologia, Imunologia e Parasitologia, Universidade Federal de Santa Catarina, Florianopolis, Brazil
| | - Louisa A Messenger
- Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Michael A Miles
- Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Björn Andersson
- Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
| |
|
31
|
Pashova S, Balabanski L, Elmadjian G, Savov A, Stoyanova E, Shivarov V, Petrov P, Pashov A. Restriction of the Global IgM Repertoire in Antiphospholipid Syndrome. Front Immunol 2022; 13:865232. [PMID: 35493489 PMCID: PMC9043687 DOI: 10.3389/fimmu.2022.865232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 03/21/2022] [Indexed: 11/22/2022] Open
Abstract
The typical anti-phospholipid antibodies (APLA) in the anti-phospholipid syndrome (APS) are reactive with the phospholipid-binding protein β2GPI as well as a growing list of other protein targets. The relation of APLA to natural antibodies and the fuzzy set of autoantigens involved provoked us to study the changes in the IgM repertoire in APS. To this end, peptides selected by serum IgM from a 7-residue linear peptide phage display library (PDL) were deep sequenced. The analysis was aided by a novel formal representation of the Igome (the mimotope set reflecting the IgM specificities) in the form of a sequence graph. The study involved women with APLA and habitual abortions (n=24) compared to age-matched clinically healthy pregnant women (n=20). Their pooled Igomes (297 028 mimotope sequences) were compared also to the global public repertoire Igome of pooled donor plasma IgM (n=2 796 484) and a set of 7-mer sequences found in the J regions of human immunoglobulins (n=4 433 252). The pooled Igome was represented as a graph connecting the sequences as similar as the mimotopes of the same monoclonal antibody. The criterion was based on previously published data. In the resulting graph, identifiable clusters of vertices were considered related to the footprints of overlapping antibody cross-reactivities. A subgraph based on the clusters with a significant differential expression of APS patients' mimotopes contained predominantly specificities underrepresented in APS. The differentially expressed IgM footprints showed also an increased cross-reactivity with immunoglobulin J regions. The specificities underexpressed in APS had a higher correlation with public specificities than those overexpressed. The APS associated specificities were strongly related also to the human peptidome with 1 072 mimotope sequences found in 7 519 human proteins. These regions were characterized by low complexity. 
Thus, the IgM repertoire of the APS patients was found to be characterized by a significant reduction of certain public specificities found in the healthy controls with targets representing low complexity linear self-epitopes homologous to human antibody J regions.
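The sequence-graph representation described in this abstract could be sketched roughly as follows. The abstract does not state the similarity criterion (it refers only to previously published data), so this sketch assumes Hamming distance with a hypothetical `max_mismatches` cutoff, and it treats connected components as a simple stand-in for the "identifiable clusters of vertices". The peptide strings are invented for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length peptides."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(seqs, max_mismatches):
    """Adjacency sets linking peptides within the assumed mismatch cutoff."""
    graph = {s: set() for s in seqs}
    for a, b in combinations(seqs, 2):
        if hamming(a, b) <= max_mismatches:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Connected components, found with an iterative depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Invented 7-mers: the first three chain together, the last is isolated.
peptides = ["ACDEFGH", "ACDEFGK", "ACDEFWK", "WWWWWWW"]
graph = build_graph(peptides, max_mismatches=1)
found = clusters(graph)
```

The all-pairs comparison is quadratic in the number of sequences, so a repertoire of the size reported here (hundreds of thousands of mimotopes) would need an indexed neighbor search rather than this direct loop.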
Affiliation(s)
- Shina Pashova
- Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, Sofia, Bulgaria
| | - Lubomir Balabanski
- Department of Medical Genetics, Medical University-Sofia, Sofia, Bulgaria
- Genomics Laboratory, Hospital “Malinov”, Sofia, Bulgaria
| | - Gabriel Elmadjian
- Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, Sofia, Bulgaria
| | - Alexey Savov
- Department of Medical Genetics, Medical University-Sofia, Sofia, Bulgaria
| | - Elena Stoyanova
- Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, Sofia, Bulgaria
| | | | - Peter Petrov
- Institute Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria
| | - Anastas Pashov
- Institute of Microbiology, Bulgarian Academy of Sciences, Sofia, Bulgaria
| |
Collapse
|
32
|
Shambhu S, Koundal D, Das P, Hoang VT, Tran-Trung K, Turabieh H. Computational Methods for Automated Analysis of Malaria Parasite Using Blood Smear Images: Recent Advances. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:3626726. [PMID: 35449742 PMCID: PMC9017520 DOI: 10.1155/2022/3626726] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/14/2022] [Accepted: 03/26/2022] [Indexed: 11/18/2022]
Abstract
Malaria is among the most dangerous diseases in many countries and a primary cause of casualties across the world. It is presently rated as a significant cause of the high mortality rate worldwide compared with other diseases, a rate that can be reduced significantly by earlier detection. Therefore, to facilitate the early detection and diagnosis of malaria and reduce the mortality rate, an automated computational method with a high accuracy rate is required. This study is a solid starting point for researchers who want to look into automated blood smear analysis to detect malaria. In this paper, a comprehensive review of different computer-assisted techniques is outlined as follows: (i) acquisition of the image dataset, (ii) preprocessing, (iii) segmentation of RBC, (iv) feature extraction and selection, and (v) classification for the detection of malaria parasites using blood smear images. This study will be helpful for (i) researchers, who can inspect and improve the existing computational methods for early diagnosis of malaria with a high accuracy rate, which may further reduce interobserver and intraobserver variations, and (ii) microbiologists, who can take a second opinion from the automated computational methods for effective diagnosis of malaria. Finally, several issues that remain to be addressed, and future work, are also discussed.
Affiliation(s)
- Shankar Shambhu
  - Chitkara University School of Computer Applications, Chitkara University, Himachal Pradesh, India
- Deepika Koundal
  - School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
- Prasenjit Das
  - Chitkara University School of Computer Applications, Chitkara University, Himachal Pradesh, India
- Vinh Truong Hoang
  - Faculty of Computer Science, Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
- Kiet Tran-Trung
  - Faculty of Computer Science, Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
- Hamza Turabieh
  - Department of Information Technology, College of Computing and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
|
33
|
Furrer L, Cornelius J, Rinaldi F. Parallel sequence tagging for concept recognition. BMC Bioinformatics 2022; 22:623. [PMID: 35331131 PMCID: PMC8943923 DOI: 10.1186/s12859-021-04511-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2021] [Accepted: 12/01/2021] [Indexed: 11/23/2022]
Abstract
BACKGROUND Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are modeled as a sequence-labeling task, operating directly on the source text. We examine different harmonisation strategies for merging the predictions of the two classifiers into a single output sequence. RESULTS We test our approach on the recent Version 4 of the CRAFT corpus. In all 20 annotation sets of the concept-annotation task, our system outperforms the pipeline system reported as a baseline in the CRAFT shared task, a competition of the BioNLP Open Shared Tasks 2019. We further refine the systems from the shared task by optimising the harmonisation strategy separately for each annotation set. CONCLUSIONS Our analysis shows that the strengths of the two classifiers can be combined in a fruitful way. However, prediction harmonisation requires individual calibration on a development set for each annotation set. This allows achieving a good trade-off between established knowledge (training set) and novel information (unseen concepts).
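To make the idea of harmonising two parallel taggers concrete, here is a minimal sketch. It is not the authors' system: the strategy names, the `"UNKNOWN"` fallback label, the tag scheme, and the placeholder concept IDs are all assumptions chosen for illustration.

```python
def harmonise(span_tags, id_tags, strategy="ids-first"):
    """Merge per-token outputs of a span tagger and a concept-ID tagger.

    span_tags: "I" (inside a mention) or "O" (outside), one per token.
    id_tags:   a concept ID string or "O", one per token.
    Returns one label per token: a concept ID, "UNKNOWN", or "O".
    """
    if len(span_tags) != len(id_tags):
        raise ValueError("both taggers must label the same tokens")
    merged = []
    for span, concept in zip(span_tags, id_tags):
        if strategy == "ids-only":
            # Ignore the span tagger entirely.
            merged.append(concept)
        elif strategy == "spans-first":
            # The span tagger decides what is a mention; the ID tagger names it.
            if span == "O":
                merged.append("O")
            else:
                merged.append(concept if concept != "O" else "UNKNOWN")
        elif strategy == "ids-first":
            # Prefer a predicted ID; fall back on the span tagger otherwise.
            if concept != "O":
                merged.append(concept)
            else:
                merged.append("UNKNOWN" if span != "O" else "O")
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return merged


# Hypothetical predictions for five tokens; the IDs are placeholders.
span_tags = ["O", "I", "I", "O", "I"]
id_tags = ["O", "ID:1", "O", "ID:2", "O"]
ids_first = harmonise(span_tags, id_tags, "ids-first")
spans_first = harmonise(span_tags, id_tags, "spans-first")
```

The two strategies disagree only on the fourth token, where the ID tagger predicts a concept that the span tagger did not mark as a mention. Choosing between such strategies per annotation set is the kind of calibration the abstract says must be done on a development set.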
Affiliation(s)
- Lenz Furrer
  - Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Zurich, Switzerland
- Joseph Cornelius
  - Dalle Molle Institute for Artificial Intelligence Research (IDSIA USI/SUPSI), Lugano, Switzerland
  - Swiss Institute of Bioinformatics, Zurich, Switzerland
- Fabio Rinaldi
  - Dalle Molle Institute for Artificial Intelligence Research (IDSIA USI/SUPSI), Lugano, Switzerland
  - Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Zurich, Switzerland
  - Fondazione Bruno Kessler, Trento, Italy
|
34
|
Kim DR, Jeon CW, Cho G, Thomashow LS, Weller DM, Paik MJ, Lee YB, Kwak YS. Glutamic acid reshapes the plant microbiota to protect plants against pathogens. MICROBIOME 2021; 9:244. [PMID: 34930485 PMCID: PMC8691028 DOI: 10.1186/s40168-021-01186-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 09/12/2021] [Accepted: 10/27/2021] [Indexed: 05/26/2023]
Abstract
BACKGROUND Plants in nature interact with other species, among which are mutualistic microorganisms that affect plant health. The co-existence of microbial symbionts with the host contributes to host fitness in a natural context. In turn, the composition of the plant microbiota responds to the environment and the state of the host, raising the possibility that it can be engineered to benefit the plant. However, technology for engineering the structure of the plant microbiome is not yet available. RESULTS The loss of diversity and reduction in population density of Streptomyces globisporus SP6C4, a core microbe, was observed coincident with the aging of strawberry plants. Here, we show that glutamic acid reshapes the plant microbial community and enriches populations of Streptomyces, a functional core microbe in the strawberry anthosphere. Similarly, in the tomato rhizosphere, treatment with glutamic acid increased the population sizes of Streptomyces as well as those of Bacillaceae and Burkholderiaceae. At the same time, diseases caused by species of Botrytis and Fusarium were significantly reduced in both habitats. We suggest that glutamic acid directly modulates the composition of the microbiome community. CONCLUSIONS Much is known about the structure of plant-associated microbial communities, but less is understood about how the community composition and complexity are controlled. Our results demonstrate that the intrinsic level of glutamic acid in planta is associated with the composition of the microbiota, which can be modulated by an external supply of a biostimulant.
Affiliation(s)
- Da-Ran Kim
  - RILS, Gyeongsang National University, Jinju, 52828, Republic of Korea
- Chang-Wook Jeon
  - Division of Applied Life Science (BK 21 plus) and IALS, Gyeongsang National University, Jinju, 52828, Republic of Korea
- Gyeongjun Cho
  - Division of Applied Life Science (BK 21 plus) and IALS, Gyeongsang National University, Jinju, 52828, Republic of Korea
- Linda S Thomashow
  - US Department of Agriculture, Agricultural Research Service, Wheat Health, Genetics and Quality Research Unit, Pullman, WA, 99164-6430, USA
- David M Weller
  - US Department of Agriculture, Agricultural Research Service, Wheat Health, Genetics and Quality Research Unit, Pullman, WA, 99164-6430, USA
- Man-Jeong Paik
  - College of Pharmacy, Sunchon National University, Suncheon, 65980, Republic of Korea
- Yong Bok Lee
  - Division of Applied Life Science (BK 21 plus) and IALS, Gyeongsang National University, Jinju, 52828, Republic of Korea
- Youn-Sig Kwak
  - RILS, Gyeongsang National University, Jinju, 52828, Republic of Korea
  - Division of Applied Life Science (BK 21 plus) and IALS, Gyeongsang National University, Jinju, 52828, Republic of Korea
  - Department of Plant Medicine, Gyeongsang National University, Jinju, 52828, Republic of Korea
|
35
|
Riccio-Rengifo C, Finke J, Rocha C. Identifying stress responsive genes using overlapping communities in co-expression networks. BMC Bioinformatics 2021; 22:541. [PMID: 34743699 PMCID: PMC8574028 DOI: 10.1186/s12859-021-04462-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/17/2020] [Accepted: 10/26/2021] [Indexed: 11/17/2022]
Abstract
Background This paper proposes a workflow to identify genes that respond to specific treatments in plants. The workflow takes as input the RNA sequencing read counts and phenotypical data of different genotypes, measured under control and treatment conditions. It outputs a reduced group of genes marked as relevant for treatment response. Technically, the proposed approach is both a generalization and an extension of WGCNA. It aims to identify specific modules of overlapping communities underlying the co-expression network of genes. Module detection is achieved by using Hierarchical Link Clustering. The overlapping nature of the systems’ regulatory domains that generate co-expression can be identified by such modules. LASSO regression is employed to analyze phenotypic responses of modules to treatment. Results The workflow is applied to rice (Oryza sativa), a major food source known to be highly sensitive to salt stress. The workflow identifies 19 rice genes that seem relevant in the response to salt stress. They are distributed across 6 modules: 3 modules, each grouping together 3 genes, are associated to shoot K content; 2 modules of 3 genes are associated to shoot biomass; and 1 module of 4 genes is associated to root biomass. These genes represent target genes for the improvement of salinity tolerance in rice. Conclusions A more effective framework to reduce the search-space for target genes that respond to a specific treatment is introduced. It facilitates experimental validation by restraining efforts to a smaller subset of genes of high potential relevance.
Affiliation(s)
- Camila Riccio-Rengifo
  - Department of Natural Sciences and Mathematics, Pontificia Universidad Javeriana, Cali, Colombia
- Jorge Finke
  - Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Cali, Colombia
- Camilo Rocha
  - Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Cali, Colombia
|
36
|
Filho JAF, Rosolen RR, Almeida DA, de Azevedo PHC, Motta MLL, Aono AH, dos Santos CA, Horta MAC, de Souza AP. Trends in biological data integration for the selection of enzymes and transcription factors related to cellulose and hemicellulose degradation in fungi. 3 Biotech 2021; 11:475. [PMID: 34777932 PMCID: PMC8548487 DOI: 10.1007/s13205-021-03032-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/11/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022]
Abstract
Fungi are key players in biotechnological applications. Although several studies focusing on fungal diversity and genetics have been performed, many details of fungal biology remain unknown, including how cellulolytic enzymes are modulated within these organisms to allow changes in main plant cell wall compounds, cellulose and hemicellulose, and subsequent biomass conversion. With the advent and consolidation of DNA/RNA sequencing technology, different types of information can be generated at the genomic, structural and functional levels, including the gene expression profiles and regulatory mechanisms of these organisms, during degradation-induced conditions. This increase in data generation made rapid computational development necessary to deal with the large amounts of data generated. In this context, the origination of bioinformatics, a hybrid science integrating biological data with various techniques for information storage, distribution and analysis, was a fundamental step toward the current state-of-the-art in the postgenomic era. The possibility of integrating biological big data has facilitated exciting discoveries, including identifying novel mechanisms and more efficient enzymes, increasing yields, reducing costs and expanding opportunities in the bioprocess field. In this review, we summarize the current status and trends of the integration of different types of biological data through bioinformatics approaches for biological data analysis and enzyme selection.
Affiliation(s)
- Jaire A. Ferreira Filho
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Rafaela R. Rosolen
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Deborah A. Almeida
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Paulo Henrique C. de Azevedo
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Maria Lorenza L. Motta
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Alexandre H. Aono
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Clelton A. dos Santos
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
  - Brazilian Biorenewables National Laboratory (LNBR), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, SP Brazil
- Maria Augusta C. Horta
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
  - Faculty of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP Brazil
- Anete P. de Souza
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
  - Department of Plant Biology, Institute of Biology, UNICAMP, Universidade Estadual de Campinas, Campinas, SP 13083-875 Brazil
|
37
|
Emenecker RJ, Griffith D, Holehouse AS. Metapredict: a fast, accurate, and easy-to-use predictor of consensus disorder and structure. Biophys J 2021; 120:4312-4319. [PMID: 34480923 DOI: 10.1101/2021.05.30.446349] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/27/2021] [Revised: 08/08/2021] [Accepted: 08/30/2021] [Indexed: 05/28/2023]
Abstract
Intrinsically disordered proteins and protein regions make up a substantial fraction of many proteomes in which they play a wide variety of essential roles. A critical first step in understanding the role of disordered protein regions in biological function is to identify those disordered regions correctly. Computational methods for disorder prediction have emerged as a core set of tools to guide experiments, interpret results, and develop hypotheses. Given the multiple different predictors available, consensus scores have emerged as a popular approach to mitigate biases or limitations of any single method. Consensus scores integrate the outcome of multiple independent disorder predictors and provide a per-residue value that reflects the number of tools that predict a residue to be disordered. Although consensus scores help mitigate the inherent problems of using any single disorder predictor, they are computationally expensive to generate. They also necessitate the installation of multiple different software tools, which can be prohibitively difficult. To address this challenge, we developed a deep-learning-based predictor of consensus disorder scores. Our predictor, metapredict, utilizes a bidirectional recurrent neural network trained on the consensus disorder scores from 12 proteomes. By benchmarking metapredict using two orthogonal approaches, we found that metapredict is among the most accurate disorder predictors currently available. Metapredict is also remarkably fast, enabling proteome-scale disorder prediction in minutes. Importantly, metapredict is fully open source and is distributed as a Python package, a collection of command-line tools, and a web server, maximizing the potential practical utility of the predictor. We believe metapredict offers a convenient, accessible, accurate, and high-performance predictor for single proteins and proteomes alike.
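The consensus score described in this abstract can be sketched in a few lines. This illustrates the scoring idea only, not metapredict itself or its API: normalising the count to a fraction, the 0.5 cutoff, and the toy predictor calls are assumptions.

```python
def consensus_scores(calls_per_predictor):
    """Per-residue fraction of predictors that call the residue disordered.

    calls_per_predictor: one 0/1 list per predictor, all of the same length.
    """
    if not calls_per_predictor:
        raise ValueError("need at least one predictor")
    length = len(calls_per_predictor[0])
    if any(len(calls) != length for calls in calls_per_predictor):
        raise ValueError("all predictors must cover the same residues")
    n = len(calls_per_predictor)
    # zip(*...) walks the sequence residue by residue across predictors.
    return [sum(column) / n for column in zip(*calls_per_predictor)]


# Hypothetical binary calls from four predictors on a five-residue sequence.
calls = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
]
scores = consensus_scores(calls)
disordered = [s >= 0.5 for s in scores]  # 0.5 is an arbitrary cutoff
```

Producing the inputs is the expensive part, since each row requires running a separate tool, which is the cost the abstract says a single trained network avoids.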
Affiliation(s)
- Ryan J Emenecker
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri
  - Center for Science and Engineering Living Systems (CSELS), St. Louis, Missouri
  - Center for Engineering Mechanobiology, Washington University, St. Louis, Missouri
- Daniel Griffith
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri
  - Center for Science and Engineering Living Systems (CSELS), St. Louis, Missouri
- Alex S Holehouse
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri
  - Center for Science and Engineering Living Systems (CSELS), St. Louis, Missouri
|
38
|
Feldman J, Bals J, Altomare CG, St Denis K, Lam EC, Hauser BM, Ronsard L, Sangesland M, Moreno TB, Okonkwo V, Hartojo N, Balazs AB, Bajic G, Lingwood D, Schmidt AG. Naive human B cells engage the receptor binding domain of SARS-CoV-2, variants of concern, and related sarbecoviruses. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021. [PMID: 33594359 PMCID: PMC7885909 DOI: 10.1101/2021.02.02.429458] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 02/07/2023]
Abstract
Exposure to a pathogen elicits an adaptive immune response aimed at controlling and eradicating it. Interrogating the abundance and specificity of the naive B cell repertoire contributes to understanding how to potentially elicit protective responses. Here, we isolated naive B cells targeting the SARS-CoV-2 receptor-binding domain (RBD) from 8 seronegative human donors. Single B cell analysis showed diverse gene usage with no restriction in complementarity-determining region lengths. We show that recombinant antibodies engage SARS-CoV-2 RBD, circulating variants, and pre-emergent coronaviruses. Representative antibodies signal in a B cell activation assay and can be affinity matured through directed evolution. Structural analysis of a naive antibody in complex with spike shows a conserved mode of recognition shared with infection-induced antibodies. Lastly, both naive and affinity-matured antibodies can neutralize SARS-CoV-2. Understanding the naive repertoire may inform potential responses recognizing variants or emerging coronaviruses, enabling the development of pan-coronavirus vaccines aimed at engaging germline responses. Isolation of antibody germline precursors targeting the receptor binding domain of coronaviruses.
Affiliation(s)
- Jared Feldman
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Julia Bals
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Clara G Altomare
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Kerri St Denis
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Evan C Lam
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Blake M Hauser
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Larance Ronsard
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Maya Sangesland
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Vintus Okonkwo
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Nathania Hartojo
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Goran Bajic
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Daniel Lingwood
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
- Aaron G Schmidt
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, 02139, USA
  - Department of Microbiology, Harvard Medical School, Boston, MA 02115, USA
|
39
|
Richens JL, Bramble JP, Spencer HL, Cantlay F, Butler M, O'Shea P. Towards defining the Mechanisms of Alzheimer's disease based on a contextual analysis of molecular pathways. AIMS GENETICS 2021. [DOI: 10.3934/genet.2016.1.25] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Alzheimer's disease (AD) is posing an increasingly profound problem to society. Our genuine understanding of the pathogenesis of AD is inadequate and, as a consequence, diagnostic and therapeutic strategies are currently insufficient. The understandable focus of many studies is the identification of molecules with high diagnostic utility; however, the opportunity to obtain a further understanding of the mechanistic origins of the disease from such putative biomarkers is often overlooked. This study examines the involvement of biomarkers in AD to shed light on potential mechanisms and pathways through which they are implicated in the pathology of this devastating neurodegenerative disorder. The computational tools required to analyse ever-growing datasets in the context of AD are also discussed.
Affiliation(s)
- Joanna L. Richens
  - Cell Biophysics Group, School of Life Sciences, University of Nottingham, University Park, Nottingham, United Kingdom
- Jonathan P. Bramble
  - Cell Biophysics Group, School of Life Sciences, University of Nottingham, University Park, Nottingham, United Kingdom
- Hannah L. Spencer
  - Cell Biophysics Group, School of Life Sciences, University of Nottingham, University Park, Nottingham, United Kingdom
- Fiona Cantlay
  - Cell Biophysics Group, School of Life Sciences, University of Nottingham, University Park, Nottingham, United Kingdom
- Molly Butler
  - Cell Biophysics Group, School of Life Sciences, University of Nottingham, University Park, Nottingham, United Kingdom
- Paul O'Shea
  - Cell Biophysics Group, School of Life Sciences, University of Nottingham, University Park, Nottingham, United Kingdom
  - Address as of 1st July 2016: Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, Canada
|
40
|
Mirsadeghi L, Haji Hosseini R, Banaei-Moghaddam AM, Kavousi K. EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer. BMC Med Genomics 2021; 14:122. [PMID: 33962648 PMCID: PMC8105935 DOI: 10.1186/s12920-021-00974-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/17/2020] [Accepted: 04/27/2021] [Indexed: 12/27/2022]
Abstract
BACKGROUND Many markers exist today for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggression is limited. METHODS In this work, we study somatic mutation data from 450 metastatic breast tumor samples obtained from the cBio Cancer Genomics Portal. We use four software tools to extract features from these data. Then, an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) is proposed to evaluate plausible driver genes for metastatic breast cancer (MBCA). The decision-making strategy of the proposed ensemble is based on aggregating the predicted scores obtained from the individual classifiers in order to prioritize Homo sapiens genes annotated as protein-coding in NCBI. RESULTS This study focuses on findings in several aspects of MBCA prognosis and diagnosis. First, drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences from the predictions are discussed based on gene set enrichment analysis. Third, statistical validation and comparison of all learning methods are performed using several evaluation metrics. Finally, pathway enrichment analysis (PEA) using the ReactomeFIViz tool (FDR < 0.03) for the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare the results for MBCA with those for 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from The Cancer Genome Atlas (TCGA). The comparison shows that ROC-AUC reaches 99.24% using EARN for MBCA and 99.79% for BRCA. In each case, this result is better than that of each of the three individual classifiers. CONCLUSIONS This research, using an integrative approach, helps precision oncologists design compact targeted panels that eliminate the need for whole-genome/exome sequencing. The schematic representation of the proposed model is presented as the Graphic abstract.
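The score-aggregation step can be illustrated with a minimal sketch. The abstract does not say how EARN combines the scores of its three classifiers, so the unweighted mean used here is an assumption, and the gene names and score values are placeholders rather than results from the paper.

```python
def aggregate_scores(scores_by_model):
    """Average each gene's score over all models (unweighted mean)."""
    models = list(scores_by_model.values())
    if not models:
        raise ValueError("need at least one model")
    genes = set(models[0])
    if any(set(model) != genes for model in models):
        raise ValueError("every model must score the same genes")
    return {g: sum(model[g] for model in models) / len(models) for g in genes}


def rank_genes(mean_scores):
    """Genes sorted by descending mean score; ties broken alphabetically."""
    return sorted(mean_scores, key=lambda g: (-mean_scores[g], g))


# Hypothetical driver-likelihood scores from three classifiers.
scores_by_model = {
    "ann": {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.6},
    "rf": {"GENE_A": 0.8, "GENE_B": 0.4, "GENE_C": 0.5},
    "svm": {"GENE_A": 0.7, "GENE_B": 0.3, "GENE_C": 0.9},
}
mean_scores = aggregate_scores(scores_by_model)
ranking = rank_genes(mean_scores)
```

In this toy input the SVM ranks `GENE_C` first, but the averaged ranking puts `GENE_A` on top, showing how aggregation can override a single classifier's outlier score.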
Affiliation(s)
- Leila Mirsadeghi
  - Department of Biology, Faculty of Science, Payame Noor University, Tehran, Iran
- Reza Haji Hosseini
  - Department of Biology, Faculty of Science, Payame Noor University, Tehran, Iran
- Ali Mohammad Banaei-Moghaddam
  - Laboratory of Genomics and Epigenomics (LGE), Department of Biochemistry, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Kaveh Kavousi
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
|
41
|
Takakusagi Y, Takakusagi K, Sakaguchi K, Sugawara F. Phage display technology for target determination of small-molecule therapeutics: an update. Expert Opin Drug Discov 2020; 15:1199-1211. [DOI: 10.1080/17460441.2020.1790523] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/11/2022]
Affiliation(s)
- Yoichi Takakusagi
  - Department of Applied Biological Science, Faculty of Science and Technology, Tokyo University of Science, Chiba, Japan
  - Institute of Quantum Life Science (iQLS), National Institutes of Quantum and Radiological Science and Technology (QST), Chiba, Japan
- Kaori Takakusagi
  - Department of Applied Biological Science, Faculty of Science and Technology, Tokyo University of Science, Chiba, Japan
  - Institute of Quantum Life Science (iQLS), National Institutes of Quantum and Radiological Science and Technology (QST), Chiba, Japan
- Kengo Sakaguchi
  - Department of Applied Biological Science, Faculty of Science and Technology, Tokyo University of Science, Chiba, Japan
- Fumio Sugawara
  - Department of Applied Biological Science, Faculty of Science and Technology, Tokyo University of Science, Chiba, Japan
|
42
|
Solihah B, Azhari A, Musdholifah A. Enhancement of conformational B-cell epitope prediction using CluSMOTE. PeerJ Comput Sci 2020; 6:e275. [PMID: 33816926 PMCID: PMC7924438 DOI: 10.7717/peerj-cs.275] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/12/2019] [Accepted: 04/15/2020] [Indexed: 06/12/2023]
Abstract
BACKGROUND A conformational B-cell epitope is one of the main components of vaccine design. It contains separate segments in its sequence, which are spatially close in the antigen chain. The availability of Ag-Ab complex data in the Protein Data Bank allows for the development of predictive methods. Several epitope prediction models have been developed, including learning-based methods. However, the performance of these models is still not optimal. The main problem in learning-based prediction models is class imbalance. METHODS This study proposes CluSMOTE, which is a combination of a cluster-based undersampling method and the Synthetic Minority Oversampling Technique. The approach is used to generate additional sample data to ensure that the dataset of the conformational epitope is balanced. The Hierarchical DBSCAN algorithm is used to identify the clusters in the majority class. A random subset of the data is taken from each cluster, taking the oversampling degree into account, and combined with the minority class data. The balanced data are used as the training dataset to develop a conformational epitope prediction model. Furthermore, two binary classification methods, Support Vector Machine and Decision Tree, are separately used to develop prediction models and to evaluate the performance of CluSMOTE in predicting conformational B-cell epitopes. The experiment is focused on determining the best parameters for optimal CluSMOTE. Two independent datasets are used to compare the proposed prediction model with state-of-the-art methods. The first and the second datasets represent the general protein and the glycoprotein antigens, respectively. RESULTS The experimental results show that the CluSMOTE Decision Tree outperformed the Support Vector Machine in terms of AUC and G-mean as performance measurements. The mean AUC of the CluSMOTE Decision Tree on the Kringelum and the SEPPA 3 test sets is 0.83 and 0.766, respectively. This shows that the CluSMOTE Decision Tree is better than other methods on the general protein antigen, though comparable with SEPPA 3 on the glycoprotein antigen.
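The two balancing steps named in the abstract, cluster-based undersampling of the majority class and SMOTE-style oversampling of the minority class, can be sketched as follows. This is a simplified illustration, not the authors' CluSMOTE: cluster labels are taken as given rather than computed with Hierarchical DBSCAN, the interpolation uses only the single nearest neighbour, and the data points are made up.

```python
import random


def undersample_by_cluster(majority, cluster_ids, per_cluster, rng):
    """Keep up to `per_cluster` randomly chosen majority samples per cluster."""
    by_cluster = {}
    for sample, cid in zip(majority, cluster_ids):
        by_cluster.setdefault(cid, []).append(sample)
    kept = []
    for cid in sorted(by_cluster):
        members = by_cluster[cid]
        kept.extend(rng.sample(members, min(per_cluster, len(members))))
    return kept


def smote(minority, n_new, rng):
    """Create `n_new` synthetic points, each on the segment between a random
    minority sample and its nearest minority neighbour (k = 1 for brevity)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbour = min(others, key=lambda m: dist2(base, m))
        t = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Made-up 2-D data: six majority points in two clusters, two minority points.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
cluster_ids = [0, 0, 0, 1, 1, 1]  # assumed output of a clustering step
minority = [(2.0, 2.0), (2.5, 2.0)]

rng = random.Random(0)
kept = undersample_by_cluster(majority, cluster_ids, per_cluster=2, rng=rng)
synthetic = smote(minority, n_new=2, rng=rng)
balanced_minority = minority + synthetic
```

After both steps the toy training set holds four majority and four minority samples. Sampling within clusters rather than from the whole majority class keeps every region of the majority distribution represented after undersampling.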
Affiliation(s)
- Binti Solihah
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
  - Department of Informatics Engineering, Universitas Trisakti, Grogol, Jakarta Barat, Indonesia
- Azhari Azhari
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Aina Musdholifah
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
|
43
|
Stafford P, Johnston SA, Kantarci OH, Zare-Shahabadi A, Warrington A, Rodriguez M. Antibody characterization using immunosignatures. PLoS One 2020; 15:e0229080. [PMID: 32196507 PMCID: PMC7083272 DOI: 10.1371/journal.pone.0229080] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/15/2019] [Accepted: 01/29/2020] [Indexed: 12/02/2022]
Abstract
Therapeutic monoclonal antibodies have the potential to work as biological therapeutics. OKT3, Herceptin, Keytruda and others have positively impacted healthcare. Antibodies evolved naturally to provide high specificity and high affinity once mature. These characteristics can make them useful as therapeutics. However, we may be missing characteristics that are not obvious. We present a means of measuring antibodies in an unbiased manner that may highlight therapeutic activity. We propose using a microarray of random peptides to assess antibody properties. We tested twenty-four different commercial antibodies to gain some perspective on how much information can be derived from binding antibodies to random peptide libraries. Some monoclonals preferred to bind shorter peptides, some longer; some preferred motifs closer to the C-terminus, some nearer the N-terminus. We also tested some antibodies with clinical activity whose function was blinded to us at the time. We were provided with twenty-one different monoclonal antibodies, thirteen mouse and eight human IgM. These antibodies produced a variety of binding patterns on the random peptide arrays. When unblinded, the antibodies with polyspecific binding were the ones with the greatest therapeutic activity. The protein target of these therapeutic monoclonals is still unknown, but using common sequence motifs from the peptides we predicted several human and mouse proteins. The same five top-ranked proteins appeared in both the mouse and human lists.
Affiliation(s)
- Phillip Stafford
  - Department of Bioinformatics, Caris Life Sciences, Phoenix, Arizona, United States of America
- Stephen Albert Johnston
  - Center for Innovations in Medicine, Biodesign Institute, Arizona State University, Tempe, Arizona, United States of America
- Orhun H. Kantarci
  - Department of Neurology, Mayo Clinic, Rochester, Minnesota, United States of America
  - * E-mail:
- Ameneh Zare-Shahabadi
  - Department of Neurology, Mayo Clinic, Rochester, Minnesota, United States of America
- Arthur Warrington
  - Department of Neurology, Mayo Clinic, Rochester, Minnesota, United States of America
- Moses Rodriguez
  - Department of Neurology, Mayo Clinic, Rochester, Minnesota, United States of America
|
44
|
He B, Dzisoo AM, Derda R, Huang J. Development and Application of Computational Methods in Phage Display Technology. Curr Med Chem 2020; 26:7672-7693. [PMID: 29956612 DOI: 10.2174/0929867325666180629123117] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/06/2017] [Revised: 02/08/2018] [Accepted: 03/20/2018] [Indexed: 12/12/2022]
Abstract
BACKGROUND Phage display is a powerful and versatile technology for the identification of peptide ligands binding to multiple targets, which has been successfully employed in various fields, such as diagnostics and therapeutics, drug delivery and material science. The integration of next-generation sequencing technology with phage display makes this methodology more productive. With the widespread use of this technique and the fast accumulation of phage display data, databases for these data and computational methods have become an indispensable part of this community. This review aims to summarize and discuss recent progress in the development and application of computational methods in the field of phage display. METHODS We undertook a comprehensive search of bioinformatics resources and computational methods for phage display data via Google Scholar and PubMed. The methods and tools were further divided into different categories according to their uses. RESULTS We described seven special or relevant databases for phage display data, which provide an evidence-based source for phage display researchers to clean their biopanning results. These databases can identify and report possible target-unrelated peptides (TUPs), thereby excluding false-positive data from peptides obtained from phage display screening experiments. More than 20 computational methods for analyzing biopanning data were also reviewed. These methods were classified into computational methods for reporting TUPs, for predicting epitopes and for analyzing next-generation phage display data. CONCLUSION The current bioinformatics archives, methods and tools reviewed here have benefitted the biopanning community. Some promising directions for developing better or new computational tools are also discussed.
Affiliation(s)
- Bifang He
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
  - School of Medicine, Guizhou University, Guiyang 550025, China
- Anthony Mackitz Dzisoo
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Ratmir Derda
  - Department of Chemistry, University of Alberta, Edmonton T6G 2G2, Alberta, Canada
- Jian Huang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
|
45
|
Paull ML, Johnston T, Ibsen KN, Bozekowski JD, Daugherty PS. A general approach for predicting protein epitopes targeted by antibody repertoires using whole proteomes. PLoS One 2019; 14:e0217668. [PMID: 31490930 PMCID: PMC6730857 DOI: 10.1371/journal.pone.0217668] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/14/2019] [Accepted: 08/22/2019] [Indexed: 12/23/2022]
Abstract
Antibodies are essential to functional immunity, yet the epitopes targeted by antibody repertoires remain largely uncharacterized. To aid in characterization, we developed a generalizable strategy to predict antibody-binding epitopes within individual proteins and entire proteomes. Specifically, we selected antibody-binding peptides for 273 distinct sera out of a random library and identified the peptides using next-generation sequencing. To predict antibody-binding epitopes and the antigens from which these epitopes were derived, we tiled the sequences of candidate antigens into short overlapping subsequences of length k (k-mers). We used the enrichment over background of these k-mers in the antibody-binding peptide dataset to predict antibody-binding epitopes. As a positive control, we used this approach, termed K-mer Tiling of Protein Epitopes (K-TOPE), to predict epitopes targeted by monoclonal and polyclonal antibodies of well-characterized specificity, accurately recovering their known epitopes. K-TOPE characterized a commonly targeted antigen from Rhinovirus A, predicting four epitopes recognized by antibodies present in 87% of sera (n = 250). An analysis of 2,908 proteins from 400 viral taxa that infect humans predicted seven enterovirus epitopes and five Epstein-Barr virus epitopes recognized by >30% of specimens. Analysis of Staphylococcus and Streptococcus proteomes similarly predicted 22 epitopes recognized by >30% of specimens. Twelve of these common viral and bacterial epitopes agreed with previously mapped epitopes with p-values < 0.05. Additionally, we predicted 30 HSV2-specific epitopes that were 100% specific against HSV1 in novel and previously reported antigens. Experimentally validating these candidate epitopes could help identify diagnostic biomarkers, vaccine components, and therapeutic targets. The K-TOPE approach thus provides a powerful new tool to elucidate the organisms, antigens, and epitopes targeted by human antibody repertoires.
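The k-mer tiling step described in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: the enrichment score (a pseudocount-smoothed frequency ratio against a background peptide set), the per-residue scoring rule, the function names, and the toy sequences are all hypothetical and are not taken from the K-TOPE paper, whose actual scoring is not specified here.

```python
from collections import Counter


def kmers(sequence, k):
    """Tile a sequence into overlapping subsequences of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_enrichment(peptides, background, k, pseudocount=1.0):
    """Frequency of each k-mer in the selected peptides relative to background."""
    observed = Counter(km for p in peptides for km in kmers(p, k))
    expected = Counter(km for p in background for km in kmers(p, k))
    obs_total = sum(observed.values()) + pseudocount
    bg_total = sum(expected.values()) + pseudocount
    return {
        km: ((observed[km] + pseudocount) / obs_total)
        / ((expected[km] + pseudocount) / bg_total)
        for km in observed
    }


def score_positions(antigen, enrichment, k):
    """Give each residue the best enrichment of any k-mer that covers it."""
    scores = [0.0] * len(antigen)
    for start, km in enumerate(kmers(antigen, k)):
        value = enrichment.get(km, 0.0)  # k-mers never selected score zero
        for pos in range(start, start + k):
            scores[pos] = max(scores[pos], value)
    return scores


# Toy example (hypothetical sequences): the motif DPLW recurs in the
# antibody-binding peptides but not in the background set.
selected = ["AKDPLW", "GDPLWT", "DPLWSA"]
background = ["AKGTSA", "GTWLPD", "SATKGA"]
antigen = "MKTDPLWQR"

enrichment = kmer_enrichment(selected, background, k=4)
scores = score_positions(antigen, enrichment, k=4)
predicted_epitope = [i for i, s in enumerate(scores) if s > 1.0]
```

In this toy case only residues 3 to 6 of the antigen (the stretch DPLW) receive a score above 1, which is the kind of contiguous high-scoring region that would be reported as a candidate epitope.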
Affiliation(s)
- Michael L. Paull
  - Department of Chemical Engineering, University of California Santa Barbara, California, United States of America
  - * E-mail: (MLP); (PSD)
- Tim Johnston
  - Department of Chemical Engineering, University of California Santa Barbara, California, United States of America
- Kelly N. Ibsen
  - Department of Chemical Engineering, University of California Santa Barbara, California, United States of America
- Joel D. Bozekowski
  - Department of Chemical Engineering, University of California Santa Barbara, California, United States of America
- Patrick S. Daugherty
  - Department of Chemical Engineering, University of California Santa Barbara, California, United States of America
  - * E-mail: (MLP); (PSD)
|
46
|
Liang W, Zheng Y, Zhang J, Sun X. Multiscale modeling reveals angiogenesis-induced drug resistance in brain tumors and predicts a synergistic drug combination targeting EGFR and VEGFR pathways. BMC Bioinformatics 2019; 20:203. [PMID: 31074391 PMCID: PMC6509865 DOI: 10.1186/s12859-019-2737-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/03/2023]
Abstract
BACKGROUND Experimental studies have demonstrated that both the extracellular vasculature or microenvironment and intracellular molecular network (e.g., epidermal growth factor receptor (EGFR) signaling pathway) are important for brain tumor growth. Additionally, some drugs have been developed to inhibit EGFR signaling pathways. However, how angiogenesis affects the response of tumor cells to drug treatment has rarely been mechanistically studied. Therefore, a multiscale model is required to investigate such complex biological systems that contain interactions and feedback among multiple levels. RESULTS In this study, we developed a single cell-based multiscale spatiotemporal model to simulate vascular tumor growth and the drug response based on the vascular endothelial growth factor receptor (VEGFR) signaling pathway, the EGFR signaling pathway and the cell cycle as well as several microenvironmental factors that determine cell fate switches in a temporal and spatial context. By incorporating the EGFRI treatment effect, the model showed an interesting phenomenon in which the survival rate of tumor cells decreased in the early stage but rebounded in a later stage, revealing the emergence of drug resistance. Moreover, we revealed the critical role of angiogenesis in acquired drug resistance, since inhibiting blood vessel growth using a VEGFR inhibitor prevented the recovery of the survival rate of tumor cells in the later stage. We further investigated the optimal timing of combining VEGFR inhibition with EGFR inhibition and predicted that the drug combination targeting both the EGFR pathway and VEGFR pathway has a synergistic effect. The experimental data validated the prediction of drug synergy, confirming the effectiveness of our model. In addition, the combination of EGFR and VEGFR genes showed clinical relevance in glioma patients. 
CONCLUSIONS The developed multiscale model revealed angiogenesis-induced drug resistance mechanisms of brain tumors to EGFRI treatment and predicted a synergistic drug combination targeting both EGFR and VEGFR pathways with optimal combination timing. This study explored the mechanistic and functional roles of angiogenesis in tumor growth and drug resistance, which advances our understanding of novel mechanisms of drug resistance and has implications for designing more effective cancer therapies.
Affiliation(s)
- Weishan Liang
  - Zhong-shan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Chinese Ministry of Education, Guangzhou, 510080, China
  - School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
- Yongjiang Zheng
  - Department of Hematology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Ji Zhang
  - Department of Neurosurgery, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Collaborative Innovation Center for Cancer Medicine, Guangzhou, 510275, China
- Xiaoqiang Sun
  - Zhong-shan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Chinese Ministry of Education, Guangzhou, 510080, China
  - School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
|
47
|
Wuyun Q, Zheng W, Peng Z, Yang J. A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 2019; 19:219-230. [PMID: 27802931 DOI: 10.1093/bib/bbw106] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/19/2016] [Indexed: 11/14/2022]
Abstract
Sequence-based prediction of residue-residue contacts in proteins is becoming increasingly important for improving protein structure prediction in the big data era. In this study, we performed a large-scale comparative assessment of 15 locally installed contact predictors. To assess these methods, we collected a large data set consisting of 680 nonredundant proteins covering different structural classes and target difficulties. We investigated a wide range of factors that may influence the precision of contact prediction, including target difficulty, structural class, alignment depth and the distribution of contact pairs in a protein structure. We found that: (1) the machine learning-based methods outperform the direct-coupling-based methods for short-range contact prediction, while the latter are significantly better for long-range contact prediction. The consensus-based methods, which combine machine learning and direct-coupling methods, perform the best. (2) The target difficulty does not have a clear influence on the machine learning-based methods, while it does affect the direct-coupling and consensus-based methods significantly. (3) The alignment depth has a relatively weak effect on the machine learning-based methods. However, for the direct-coupling-based and consensus-based methods, the predicted contacts for targets with deeper alignments tend to be more accurate. (4) All methods perform relatively better on β and α + β proteins than on α proteins. (5) Residues buried in the core of a protein structure are more prone to be in contact than residues on the surface (22 versus 6%). We believe these results are useful for guiding the future development of new approaches to contact prediction.
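The precision measure that such assessments rely on can be sketched as follows. This is a minimal illustration, not the evaluation code used in the study: the function name, the `top_divisor` parameter, the default sequence-separation cutoff of 24 residues for long-range contacts (a common convention in contact-prediction benchmarks, assumed here), and the toy predictions are all hypothetical.

```python
def contact_precision(predicted, true_contacts, seq_len, min_sep=24, top_divisor=1):
    """Precision of the top L/top_divisor predictions with separation >= min_sep.

    predicted      list of (i, j, score) tuples, higher score = more confident
    true_contacts  set of (i, j) residue pairs with i < j
    seq_len        protein length L
    """
    eligible = [(i, j, s) for i, j, s in predicted if abs(i - j) >= min_sep]
    eligible.sort(key=lambda item: item[2], reverse=True)
    top = eligible[:max(1, seq_len // top_divisor)]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in true_contacts for i, j, _ in top)
    return hits / len(top)


# Toy example (hypothetical): a 50-residue protein, evaluated on the top L/10
# long-range predictions.
predicted = [
    (1, 30, 0.90), (2, 40, 0.80), (5, 10, 0.95),  # (5, 10) is short-range
    (3, 45, 0.70), (4, 35, 0.60), (6, 48, 0.50), (7, 33, 0.40),
]
true_contacts = {(1, 30), (2, 40), (4, 35), (7, 33)}

long_range_precision = contact_precision(
    predicted, true_contacts, seq_len=50, min_sep=24, top_divisor=10
)
```

Here the five highest-scoring long-range pairs contain three true contacts, giving a precision of 0.6; the short-range pair (5, 10) is excluded despite its high score because its sequence separation is below the cutoff.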
Affiliation(s)
- Qiqige Wuyun
  - School of Mathematical Sciences, Nankai University, Tianjin, China
- Wei Zheng
  - School of Mathematical Sciences, Nankai University, Tianjin, China
- Zhenling Peng
  - Center for Applied Mathematics, Tianjin University, Tianjin, China
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Tianjin, China
|
48
|
Liu W, Cheng C, Chen F, Ni S, Lin Y, Lai Z. High-throughput sequencing of small RNAs revealed the diversified cold-responsive pathways during cold stress in the wild banana (Musa itinerans). BMC PLANT BIOLOGY 2018; 18:308. [PMID: 30486778 PMCID: PMC6263057 DOI: 10.1186/s12870-018-1483-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/19/2017] [Accepted: 10/15/2018] [Indexed: 05/16/2023]
Abstract
BACKGROUND Cold stress is one of the most severe abiotic stresses affecting banana production. Although some miRNAs have been identified, little is known about the role of miRNAs in the response to cold stress in banana, and to date there is no report on the role of miRNAs in the cold stress response of cultivated or wild banana plants. RESULT Here, a cold-resistant wild banana line (Musa itinerans) from China was used to profile cold-responsive miRNAs by RNA-seq during cold stress. In total, 265 known mature miRNAs and 41 novel miRNAs were obtained. Cluster analysis of differentially expressed (DE) miRNAs indicated that some miRNAs were specific to the chilling or 0 °C treatment responses, and most of them have been reported to be cold-responsive; however, some have seldom been reported to be cold-responsive, e.g., miR395, miR408 and miR172, suggesting that they may play key roles in the response to cold stress. GO and KEGG pathway enrichment analysis of the DE miRNA targets indicated that diversified cold-responsive pathways exist, and miR172 was found likely to play a central coordinating role in the response to cold stress, especially in the regulation of CK2 and the circadian rhythm. Finally, qPCR assays indicated that the related targets were negatively regulated by the tested DE miRNAs during cold stress in the wild banana. CONCLUSIONS In this study, the profiling of miRNAs by RNA-seq in response to cold stress in wild banana plants (Musa itinerans) was reported for the first time. The results showed that diversified cold-responsive pathways exist, which provides insight into the roles of miRNAs during cold stress and would be helpful for alleviating cold stress and for cold-resistance breeding in bananas.
Affiliation(s)
- Weihua Liu
  - Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 China
  - Chongqing Normal University, Daxuecheng Middle Rd, Chongqing, Shapingba Qu China
- Chunzhen Cheng
  - Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 China
- Fanglan Chen
  - Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 China
- Shanshan Ni
  - Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 China
- Yuling Lin
  - Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 China
- Zhongxiong Lai
  - Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 China
|
49
|
Madi MK, Karameh FN. Adaptive optimal input design and parametric estimation of nonlinear dynamical systems: application to neuronal modeling. J Neural Eng 2018; 15:046028. [PMID: 29749350 DOI: 10.1088/1741-2552/aac3f7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
OBJECTIVE Many physical models of biological processes, including neural systems, are characterized by parametric nonlinear dynamical relations between driving inputs, internal states, and measured outputs of the process. Fitting such models using experimental data (data assimilation) is a challenging task since the physical process often operates in a noisy, possibly non-stationary environment; moreover, conducting multiple experiments under controlled and repeatable conditions can be impractical, time consuming or costly. The accuracy of model identification, therefore, is dictated principally by the quality and dynamic richness of the data collected over a single or a few experimental sessions. Accordingly, it is highly desirable to design efficient experiments that, by exciting the physical process with smart inputs, yield fast convergence and increased accuracy of the model. APPROACH We herein introduce an adaptive framework in which optimal input design is integrated with square root cubature Kalman filters (OID-SCKF) to develop an online estimation procedure that, first, converges significantly quicker, thereby permitting model fitting over shorter time windows, and, second, enhances model accuracy when only a few process outputs are accessible. The methodology is demonstrated on common nonlinear models and on a four-area neural mass model with noisy and limited measurements. Estimation quality (speed and accuracy) is benchmarked against high-performance SCKF-based methods that commonly employ dynamically rich informed inputs for accurate model identification. MAIN RESULTS For all the tested models, simulated single-trial and ensemble averages showed that OID-SCKF exhibited (i) faster convergence of parameter estimates and (ii) lower dependence on inter-trial noise variability with gains up to around 1000 ms in speed and 81% increase in variability for the neural mass models. 
In terms of accuracy, OID-SCKF estimation was superior, and exhibited considerably less variability across experiments, in identifying model parameters of (a) systems with challenging model inversion dynamics and (b) systems with fewer measurable outputs that directly relate to the underlying processes. SIGNIFICANCE Fast and accurate identification therefore carries particular promise for modeling of transient (short-lived) neuronal network dynamics using a spatially under-sampled set of noisy measurements, as is commonly encountered in neural engineering applications.
Affiliation(s)
- Mahmoud K Madi
  - Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon
|
50
|
Shang RP, Wang W. Investigating Dysregulated Pathways in Dilated Cardiomyopathy from Pathway Interaction Network. RUSS J GENET+ 2018. [DOI: 10.1134/s1022795418020151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|