1
Chou L, Zhang S, Luo W, Zhu W, Guo J, Tu K, Tan H, Wang C, Wei S, Yu H, Zhang X, Shi W. Identification of Key Toxic Substances Considering Metabolic Activation: A Combination of Transcriptome and Nontarget Analysis. Environ Sci Technol 2024; 58:14831-14842. [PMID: 39120612 DOI: 10.1021/acs.est.4c03683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/10/2024]
Abstract
Numerous studies have used effect-directed analysis (EDA) to identify key toxic substances in source and drinking water, but none has considered the effects of metabolic activation. This study developed a comprehensive method comprising a pretreatment process based on an in vitro metabolic activation system, a biological effect evaluation based on the concentration-dependent transcriptome (CDT), and chemical feature identification based on nontarget chemical analysis (NTA), to evaluate how toxic effects and chemical composition change after metabolism. Models for matching metabolites to their precursors, together with data-driven identification methods, were further constructed to identify toxic metabolites and key toxic precursor substances in drinking water samples from the Yangtze River. After metabolism, the samples showed a general trend of reduced toxicity in terms of overall biological potency (mean: 3.2-fold). However, metabolic activation increased some types of toxic effects, involving pathways such as excision repair, mismatch repair, protein processing in the endoplasmic reticulum, nucleotide excision repair, and DNA replication. Meanwhile, the number of peaks and the average peak area decreased after metabolism (17.8%), while overall polarity, hydrophilicity, and average molecular weight increased slightly (10.3%). Using the metabolite-precursor matching models and the data-driven identification methods, 32 chemicals were efficiently identified as key toxic substances and main contributors to the different transcriptomic biological effects (e.g., those related to cellular components, development, and DNA damage), including 15 industrial compounds, 7 PPCPs, 6 pesticides, and 4 natural products.
This study avoids structure elucidation of toxic metabolites and instead traces them directly to their precursors on the basis of MS spectra, providing a new approach to identifying key toxic pollutants among metabolites.
Affiliation(s)
- Liben Chou
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Shaoqing Zhang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Wenrui Luo
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Wenxuan Zhu
  - Department of Mathematics, Statistics, and Computer Science, Macalester College, Saint Paul, Minnesota 55105, United States
- Jing Guo
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Keng Tu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Haoyue Tan
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Chang Wang
  - Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, Institute of Environment and Health, Jianghan University, Wuhan 430056, China
- Si Wei
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
  - Jiangsu Province Ecology and Environment Protection Key Laboratory of Chemical Safety and Health Risk, Nanjing 210023, China
- Hongxia Yu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
  - Jiangsu Province Ecology and Environment Protection Key Laboratory of Chemical Safety and Health Risk, Nanjing 210023, China
- Xiaowei Zhang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
  - Jiangsu Province Ecology and Environment Protection Key Laboratory of Chemical Safety and Health Risk, Nanjing 210023, China
- Wei Shi
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
  - Jiangsu Province Ecology and Environment Protection Key Laboratory of Chemical Safety and Health Risk, Nanjing 210023, China
2
Lin J, Chen X, Liu Y, Wang Y, Shuai J, Chen M. Fe/Mn (oxyhydr)oxides reductive dissolution promoted by cyanobacterial algal bloom-derived dissolved organic matter caused sediment W release during an algal bloom in Taihu Lake. Water Res 2024; 260:121899. [PMID: 38908314 DOI: 10.1016/j.watres.2024.121899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 05/29/2024] [Accepted: 06/05/2024] [Indexed: 06/24/2024]
Abstract
Tungsten (W) can be toxic to aquatic organisms. However, the spatiotemporal characteristics and controlling factors of W mobility during harmful algal blooms (HABs) have rarely been investigated. In this study, simultaneous changes in soluble W, iron (Fe), manganese (Mn), and ultraviolet absorbance at 254 nm (UV254) at the sediment-water interface (SWI) were measured monthly using high-resolution peeper (HR-Peeper) devices. Laboratory experiments were conducted to verify the effects of environmental factors on W release. From May to October 2021, the concentration and flux of soluble W were higher than in other months. In addition, from May to October, DMAX (the depth at which the maximum concentration occurs in each profile) was 30-50 mm below the SWI rather than at the maximum profile depth. Principal component analysis (PCA) also divided the year into two periods: a W-stable period (December 2020 and January, March, April, and November 2021, with low soluble W concentrations) and a W-active period (May to October 2021, with high soluble W concentrations). Laboratory experiments showed that both warming and anoxic conditions caused simultaneous release of soluble W, Fe(II), Mn, and dissolved organic matter (DOM), with strong correlations among soluble W, Fe(II), and Mn. Partial least squares path modeling (PLS-PM) and a random forest model showed that, under warming and anaerobic conditions, DOM affected W release either directly or indirectly by promoting the reduction of ferromanganese (oxyhydr)oxides. The field investigation showed that in the W-stable period, with low temperature (T), high dissolved oxygen (DO), and an oxic SWI, the concentrations of soluble W, Fe, Mn, and DOM were low. Redundancy analysis (RDA) showed that these months were mainly affected by water DO. The significant, strong positive correlations among soluble W, Fe, and Mn indicated that soluble W was probably scavenged by Fe/Mn (oxyhydr)oxides in the oxic water during the W-stable period.
The W-active period corresponded to the outbreak of cyanobacterial HABs (cyanoHABs), with higher T, lower DO, and a more anoxic SWI. During this period, the concentrations of soluble W, Fe, Mn, and DOM were high and their correlations were stronger. RDA showed that these months were mainly affected by T, UV254, and soluble Fe and Mn. These results indicate that soluble W release was mainly caused by reductive dissolution of Fe/Mn (oxyhydr)oxides driven by DOM generated during the W-active period, especially cyanoHAB-derived DOM. They reveal a coupling between cyanoHABs and W release and emphasize the need to prevent and control heavy metal release in eutrophic lakes.
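The PCA step that separated W-stable from W-active months can be illustrated with a minimal sketch that computes a first principal component by power iteration. Everything here is an assumption for illustration. The four variables and their monthly values are hypothetical, not the study's measurements. The data are standardized before PCA, and the function names are invented. The abstract does not state which software or preprocessing the authors used.

```python
import math
import random

def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of column-centered data (rows = observations)."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)] for i in range(p)]

def first_pc(rows, iters=500, seed=0):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    cov = covariance(rows)
    p = len(cov)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # random positive start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvectors are defined only up to sign; make the largest loading positive.
    k = max(range(p), key=lambda i: abs(v[i]))
    return v if v[k] >= 0 else [-x for x in v]

def scores(rows, v):
    """Project each observation onto the component (its PC1 score)."""
    return [sum(x * w for x, w in zip(r, v)) for r in rows]

# Hypothetical monthly means: [soluble W, Fe, Mn, temperature]
months = {
    "Jan": [0.2, 1.0, 0.5, 5.0],
    "Mar": [0.3, 1.2, 0.6, 9.0],
    "Jun": [1.1, 4.0, 2.1, 26.0],
    "Aug": [1.3, 4.5, 2.4, 30.0],
}
Z = standardize(list(months.values()))
pc1 = first_pc(Z)
for name, s in zip(months, scores(Z, pc1)):
    print(name, round(s, 2))
```

All four hypothetical variables rise together in the warm months. As a result, the first component has same-signed loadings and the month scores split by sign. This is the kind of two-period separation the abstract describes.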
Affiliation(s)
- Juan Lin
  - School of Geographic Science, Nantong University, Nantong, 226000, China
- Xiang Chen
  - Nanjing Institute of Environmental Sciences, Ministry of Ecology and Environment, Nanjing, 210010, China
- Yvlu Liu
  - School of Geographic Science, Nantong University, Nantong, 226000, China
- Yibo Wang
  - School of Geographic Science, Nantong University, Nantong, 226000, China
- Jinxia Shuai
  - School of Geographic Science, Nantong University, Nantong, 226000, China
- Musong Chen
  - State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, 210008, China
3
Jilani M, Degras D, Haspel N. Elucidating Cancer Subtypes by Using the Relationship between DNA Methylation and Gene Expression. Genes (Basel) 2024; 15:631. [PMID: 38790260 PMCID: PMC11121157 DOI: 10.3390/genes15050631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 05/10/2024] [Accepted: 05/14/2024] [Indexed: 05/26/2024] [Open access]
Abstract
Advances in next-generation sequencing (NGS) have generated vast amounts of data for the same set of subjects. The challenge is how to combine and reconcile results from different omics studies, such as the epigenome and transcriptome, to improve the classification of disease subtypes. In this study, we introduce sCClust (sparse canonical correlation analysis with clustering), a technique that combines high-dimensional omics data using sparse canonical correlation analysis (sCCA) so that the correlation between datasets is maximized. This stage is followed by clustering the integrated data in a lower-dimensional space. We apply sCClust to gene expression and DNA methylation data from three cancer genomics datasets from The Cancer Genome Atlas (TCGA) to distinguish the underlying subtypes. We evaluate the identified subtypes using Kaplan-Meier plots and hazard ratio analysis on the three cancer types: GBM (glioblastoma multiforme), lung cancer, and colon cancer. Comparison with subtypes identified by both single- and multi-omics studies implies improved clinical association. We also perform pathway over-representation analysis to identify up-regulated and down-regulated genes as tentative drug targets. The main goal of the paper is twofold: integrating epigenomic and transcriptomic datasets, and then elucidating subtypes in the latent space. The significance of this study lies in the enhanced categorization of cancer data, which is crucial to precision medicine.
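The sCCA stage of sCClust can be sketched as a rank-1 penalized matrix decomposition of the cross-product matrix between two standardized data blocks. The two weight vectors are updated alternately, each with soft-thresholding. This is a simplified sketch, not the authors' implementation, and it makes two assumptions. First, it treats each block's within-block covariance as diagonal. Second, instead of tuning a sparsity constraint, it sets the soft-threshold as a fixed fraction of the largest absolute entry, controlled by the hypothetical parameter `frac`. The toy blocks are hypothetical stand-ins for methylation (X) and expression (Y) features.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def standardize(cols):
    """Center each feature column and scale it to unit Euclidean norm."""
    out = []
    for c in cols:
        m = sum(c) / len(c)
        d = [x - m for x in c]
        s = math.sqrt(dot(d, d)) or 1.0  # constant columns stay all-zero
        out.append([x / s for x in d])
    return out

def soft(z, t):
    """Elementwise soft-thresholding: shrink toward zero by t, zeroing small entries."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in z]

def unit(z):
    s = math.sqrt(dot(z, z))
    return [x / s for x in z] if s > 0 else z

def sparse_cca(xcols, ycols, frac=0.3, iters=100):
    """Rank-1 sparse CCA via a penalized decomposition of C = X^T Y.

    xcols, ycols: lists of feature columns, all with the same number of samples.
    frac: hypothetical tuning knob; threshold = frac * max |entry| at each update.
    """
    X, Y = standardize(xcols), standardize(ycols)
    C = [[dot(xi, yj) for yj in Y] for xi in X]  # p x q cross-products
    p, q = len(X), len(Y)
    v = unit([1.0] * q)
    u = [0.0] * p
    for _ in range(iters):
        cv = [dot(row, v) for row in C]
        u = unit(soft(cv, frac * max(abs(x) for x in cv)))
        ctu = [sum(C[i][j] * u[i] for i in range(p)) for j in range(q)]
        v = unit(soft(ctu, frac * max(abs(x) for x in ctu)))
    n = len(X[0])
    a = [sum(u[i] * X[i][k] for i in range(p)) for k in range(n)]  # X-side sample scores
    b = [sum(v[j] * Y[j][k] for j in range(q)) for k in range(n)]  # Y-side sample scores
    denom = math.sqrt(dot(a, a) * dot(b, b))
    rho = dot(a, b) / denom if denom > 0 else 0.0  # canonical correlation of the scores
    return u, v, rho, a, b

# Hypothetical toy data: X feature 0 and Y feature 1 share the latent signal z.
z = [-5, -3, -1, 1, 3, 5]
w = [1, -1, -2, 2, 1, -1]
x_cols = [z, [1, -1, 0, 0, -1, 1], [1, 1, -2, -2, 1, 1]]
y_cols = [w, [zi + wi for zi, wi in zip(z, w)]]
u, v, rho, a, b = sparse_cca(x_cols, y_cols)
print([round(t, 3) for t in u], [round(t, 3) for t in v], round(rho, 3))
```

The per-sample scores `a` and `b` are the low-dimensional coordinates that a clustering step, for example k-means, would then group into candidate subtypes. That clustering step is not shown.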
Affiliation(s)
- Muneeba Jilani
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- David Degras
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA
- Nurit Haspel
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
4
Wróbel S, Turek C, Stępień E, Piwowar M. Data integration through canonical correlation analysis and its application to OMICs research. J Biomed Inform 2024; 151:104575. [PMID: 38086443 DOI: 10.1016/j.jbi.2023.104575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/04/2023] [Accepted: 12/08/2023] [Indexed: 02/23/2024]
Abstract
This paper reviews a multidimensional data analysis method, canonical correlation analysis in its various variants, and its use in omics data research. The rapid development of high-throughput methods, and with it the availability of large and constantly growing data resources, is driving the development of new analytical approaches. These approaches allow the processes under study to be examined using data from various levels of the organization of living organisms. The multidimensional perspective allows a phenomenon to be assessed more realistically, as it generally takes into account much more data (including OMICs data). Without ignoring the complexity of an organism, the method simplifies the multidimensional view and yields results from which the researcher can draw practical conclusions. This is particularly important in the medical sciences, where the study of pathological processes is usually aimed at developing treatment regimens. One of the primary methods for studying biomedical processes in a multidimensional approach is canonical correlation analysis (CCA) and its variants. CCA methodologies for the simultaneous analysis of multiset biomolecular data open new avenues for studying previously undiscovered processes and interdependencies, e.g., intercellular communication in the tumor microenvironment (TME). Because of the huge and still untapped potential of canonical correlation, this review presents the available implementations of CCA techniques, with particular emphasis on the use of canonical correlation analysis for OMICs data.
Affiliation(s)
- Sonia Wróbel
  - Department of Medical Physics, Jagiellonian University, Marian Smoluchowski Institute of Physics, Krakow, Poland
- Cezary Turek
  - Department of Bioinformatics and Telemedicine, Jagiellonian University-Medical College, Krakow, Poland
- Ewa Stępień
  - Department of Medical Physics, Jagiellonian University, Marian Smoluchowski Institute of Physics, Krakow, Poland
  - Center for Theranostics, Jagiellonian University, ul. Kopernika 40, 31-034 Kraków, Poland
  - Total-Body Jagiellonian-PET Laboratory, Jagiellonian University, Kraków, Poland
- Monika Piwowar
  - Department of Bioinformatics and Telemedicine, Jagiellonian University-Medical College, Krakow, Poland
5
Palarea-Albaladejo J, McNeilly TN, Nisbet AJ. A curated multivariate approach to study efficacy and optimisation of a prototype vaccine against teladorsagiasis in sheep. Vet Res Commun 2024; 48:367-379. [PMID: 37707655 PMCID: PMC10810991 DOI: 10.1007/s11259-023-10208-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 08/25/2023] [Indexed: 09/15/2023]
Abstract
This work discusses and demonstrates the novel use of multivariate analysis and data dimensionality reduction techniques to handle the variety and complexity of data generated in efficacy trials of a prototype vaccine to protect sheep against the nematode Teladorsagia circumcincta. A curated collection of dimension reduction and visualisation techniques, combined with sensible statistical modelling and testing that explicitly model key features of the data, offers a synthetic view of the relationships among the multiple biological parameters measured. This yields new biological insight into the patterns and associations involving antigen-specific antibody levels, antibody avidity, and parasitological parameters of efficacy, insight that standard statistical practice in the field cannot provide. The approach can therefore guide vaccine refinement and simplification by identifying the most immunologically relevant antigens, and it can be implemented analogously in similar studies in other areas. To facilitate this, the associated data and the computer code, written in R, the open system for statistical computing, are freely available.
Affiliation(s)
- Javier Palarea-Albaladejo
  - Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain
  - Biomathematics and Statistics Scotland, JCMB, The King's Buildings, Peter Guthrie Tait Road, Edinburgh, Scotland, UK
- Tom N McNeilly
  - Moredun Research Institute, Pentlands Science Park, Bush Loan, Penicuik, Scotland, UK
- Alasdair J Nisbet
  - Moredun Research Institute, Pentlands Science Park, Bush Loan, Penicuik, Scotland, UK
6
Li Q, Zhang L, Shan H, Yu J, Dai Y, He H, Li WG, Langley C, Sahakian BJ, Yao Y, Luo Q, Li F. The immuno-behavioural covariation associated with the treatment response to bumetanide in young children with autism spectrum disorder. Transl Psychiatry 2022; 12:228. [PMID: 35660740 PMCID: PMC9166783 DOI: 10.1038/s41398-022-01987-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/20/2021] [Revised: 05/21/2022] [Accepted: 05/25/2022] [Indexed: 11/09/2022] [Open access]
Abstract
Bumetanide, a drug being studied in autism spectrum disorder (ASD), may act to restore gamma-aminobutyric acid (GABA) function, which may be modulated by the immune system. However, the interaction between bumetanide and the immune system remains unclear. Seventy-nine children with ASD from a longitudinal sample who received a 3-month bumetanide treatment were analysed. The covariation between symptom improvements and cytokine changes was calculated and validated by sparse canonical correlation analysis. Response patterns to bumetanide were revealed by clustering analysis. Five classifiers were used to test, on an independent test sample, whether including baseline cytokine information could improve prediction of the response patterns. An immuno-behavioural covariation was identified between symptom improvements on the Childhood Autism Rating Scale (CARS) and changes in the cytokines interferon (IFN)-γ, monokine induced by gamma interferon, and IFN-α2. Using this covariation, three groups with distinct response patterns to bumetanide were detected: the best (21.5%, n = 17; Hedges' g of improvement in CARS = 2.16), the least (22.8%, n = 18; g = 1.02), and the medium (55.7%, n = 44; g = 1.42) responding groups. Including cytokine levels significantly improved pre-treatment prediction of the best responding group (best area under the curve, AUC = 0.832) compared with the model without cytokine levels (95% confidence interval of the improvement in AUC: [0.287, 0.319]). Cytokine measurements can help identify possible responders to bumetanide among children with ASD, suggesting that immune responses may interact with the mechanism of action of bumetanide to enhance GABA function in ASD.
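The effect sizes (Hedges' g) and the AUC reported above are standard quantities, and the sketch below shows their textbook forms so the numbers can be read concretely. For g, it assumes two independent groups and uses the pooled standard deviation with the usual small-sample correction J = 1 - 3/(4(n1 + n2) - 9). AUC is computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a negative one, with ties counted as one half. The abstract does not say which variant of g the authors used for the within-group improvement in CARS, so this is an illustration rather than a reproduction. The input values are hypothetical.

```python
import math
from statistics import mean, variance

def hedges_g(group1, group2):
    """Hedges' g for two independent groups: Cohen's d (pooled SD) times correction J."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    d = (mean(group1) - mean(group2)) / math.sqrt(pooled_var)
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction
    return j * d

def auc(pos_scores, neg_scores):
    """AUC as the Mann-Whitney probability P(pos > neg), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical values, not the study's data.
print(round(hedges_g([2, 4, 6], [1, 2, 3]), 3))         # 1.012
print(round(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]), 3))  # 0.889
```

The pairwise AUC loop is quadratic in sample size. That is fine at the scale of a few dozen children, but a rank-based formula is preferable for large samples.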
Affiliation(s)
- Qingyang Li
  - Department of Computational Biology, School of Life Sciences, Fudan University, 200438, Shanghai, China
- Lingli Zhang
  - Department of Developmental and Behavioural Pediatric & Child Primary Care, Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE-Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, 200092, Shanghai, China
- Haidi Shan
  - Department of Developmental and Behavioural Pediatric & Child Primary Care, Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE-Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, 200092, Shanghai, China
- Juehua Yu
  - Department of Developmental and Behavioural Pediatric & Child Primary Care, Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE-Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, 200092, Shanghai, China
  - Center for Experimental Studies and Research, The First Affiliated Hospital of Kunming Medical University, 650032, Kunming, China
- Yuan Dai
  - Department of Developmental and Behavioural Pediatric & Child Primary Care, Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE-Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, 200092, Shanghai, China
- Hua He
  - Department of Developmental and Behavioural Pediatric & Child Primary Care, Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE-Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, 200092, Shanghai, China
- Wei-Guang Li
  - Collaborative Innovation Center for Brain Science, Department of Anatomy and Physiology, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China
- Christelle Langley
  - Department of Psychiatry and the Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, CB21TN, UK
- Barbara J Sahakian
  - Department of Developmental and Behavioural Pediatric & Child Primary Care, Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE-Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, 200092, Shanghai, China
  - Department of Psychiatry and the Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, CB21TN, UK
  - National Clinical Research Center for Aging and Medicine at Huashan Hospital, State Key Laboratory of Medical Neurobiology and Ministry of Education Frontiers Center for Brain Science, Institutes of Brain Science and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, 200433, Shanghai, China
- Yin Yao
  - Department of Computational Biology, School of Life Sciences, Fudan University, 200438, Shanghai, China
  - Human Phenome Institute, Fudan University, 201203, Shanghai, China
- Qiang Luo
  - National Clinical Research Center for Aging and Medicine at Huashan Hospital, State Key Laboratory of Medical Neurobiology and Ministry of Education Frontiers Center for Brain Science, Institutes of Brain Science and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, 200433, Shanghai, China
  - Human Phenome Institute, Fudan University, 201203, Shanghai, China
  - Center for Computational Psychiatry, Ministry of Education-Key Laboratory of Computational Neuroscience and Brain-Inspired, Research Institute of Intelligent Complex Systems, Fudan University, 200040, Shanghai, China
- Fei Li
  - Department of Developmental and Behavioural Pediatric & Child Primary Care, Brain and Behavioural Research Unit of Shanghai Institute for Pediatric Research and MOE-Shanghai Key Laboratory for Children's Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, 200092, Shanghai, China
7
Capblancq T, Forester BR. Redundancy analysis: A Swiss Army Knife for landscape genomics. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13722] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/18/2022]
8
Guo X, Song Y, Liu S, Gao M, Qi Y, Shang X. Linking genotype to phenotype in multi-omics data of small sample. BMC Genomics 2021; 22:537. [PMID: 34256701 PMCID: PMC8278664 DOI: 10.1186/s12864-021-07867-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/24/2021] [Accepted: 06/30/2021] [Indexed: 12/03/2022] [Open access]
Abstract
BACKGROUND: Genome-wide association studies (GWAS) that link genotype to phenotype are an effective means of associating an individual's genetic background with a disease or trait. However, single-omics data provide only limited information on biological mechanisms, and integrating multi-omics data is necessary to improve the accuracy of predicting the biological association between genotype and phenotype. Typically, gene expression data are integrated to analyze the effect of single nucleotide polymorphisms (SNPs) on phenotype. Such multi-omics integration mainly follows two approaches, multi-staged analysis and meta-dimensional analysis, which ignore intra-omics and inter-omics associations, respectively. Moreover, both approaches require omics data from a single sample set, and the large SNP feature set requires a large sample size for model building, yet it is difficult to obtain multi-omics data from a single large sample set.
RESULTS: To address this problem, we propose a genotype-phenotype association method based on small-sample multi-omics data. The workflow proceeds in six steps. It clusters genes using a protein-protein interaction network and gene expression data, then screens the gene clusters with group lasso. Next, it obtains the SNP clusters corresponding to the selected gene clusters through expression quantitative trait locus (eQTL) data and integrates the SNP clusters, their corresponding gene clusters, and the phenotypes into three-layer network blocks. Finally, it analyzes and predicts with each block and obtains the final prediction by averaging the block predictions.
CONCLUSIONS: We compared this method with others on two datasets and found that it performed better in both cases. The method can effectively address prediction in small-sample multi-omics data and provides a valuable resource for further studies on fusing more omics data.
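Group lasso, used above to screen gene clusters, penalizes the Euclidean norm of each group of coefficients, so an entire cluster is either kept or set to zero together. Below is a minimal proximal-gradient sketch for a least-squares loss. The objective scaling, the sqrt(group size) weights, the step-size bound, and all names are assumptions for illustration, not the authors' implementation. In their workflow each group would be a gene cluster; here the groups and data are a hypothetical toy example.

```python
import math

def group_soft_threshold(z, threshold):
    """Proximal operator of threshold * ||z||_2: shrink the whole block, or zero it."""
    norm = math.sqrt(sum(x * x for x in z))
    if norm <= threshold:
        return [0.0] * len(z)
    scale = 1.0 - threshold / norm
    return [scale * x for x in z]

def group_lasso(X, y, groups, lam, iters=500):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2 by proximal gradient.

    X: list of n rows; groups: lists of column indices that partition the columns.
    """
    n, p = len(X), len(X[0])
    # Step 1/L with L = ||X||_F^2 / n, a safe upper bound on the gradient's Lipschitz constant.
    L = sum(v * v for row in X for v in row) / n
    t = 1.0 / L
    b = [0.0] * p
    for _ in range(iters):
        resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        step = [b[j] - t * grad[j] for j in range(p)]  # plain gradient step
        for g in groups:  # then the group-wise proximal (shrinkage) step
            shrunk = group_soft_threshold([step[j] for j in g], t * lam * math.sqrt(len(g)))
            for j, val in zip(g, shrunk):
                b[j] = val
    return b

# Hypothetical toy data with orthogonal columns (X^T X / n = I).
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [7.5, -0.5, 6.5, -1.5]
groups = [[0, 1], [2]]  # e.g. two "gene clusters"
b = group_lasso(X, y, groups, lam=1.0)
print([round(x, 3) for x in b])  # [2.151, 2.869, 0.0]
```

With these orthogonal columns the solution has a closed form, because each group's least-squares block is simply shrunk by its threshold. The first group is therefore shrunk but kept, while the second group's signal falls below its threshold and the whole group is removed.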
Affiliation(s)
- Xinpeng Guo
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
  - School of Air and Missile Defense, Air Force Engineering University, Xi'an, 710051, People's Republic of China
- Yafei Song
  - School of Air and Missile Defense, Air Force Engineering University, Xi'an, 710051, People's Republic of China
- Shuhui Liu
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
- Meihong Gao
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
- Yang Qi
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
9
Kobak D, Bernaerts Y, Weis MA, Scala F, Tolias AS, Berens P. Sparse reduced-rank regression for exploratory visualisation of paired multivariate data. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12494] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Dmitry Kobak
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Yves Bernaerts
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
  - International Max Planck Research School for Intelligent Systems, Germany
- Marissa A. Weis
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Federico Scala
  - Department of Neuroscience, Baylor College of Medicine, Houston, Texas, USA
- Andreas S. Tolias
  - Department of Neuroscience, Baylor College of Medicine, Houston, Texas, USA
- Philipp Berens
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
  - Department of Computer Science, University of Tübingen, Tübingen, Germany
10
Baekelandt S, Cornet V, Mandiki SNM, Lambert J, Dubois M, Kestemont P. Ex vivo approach supports both direct and indirect actions of melatonin on immunity in pike-perch Sander lucioperca. Fish Shellfish Immunol 2021; 112:143-150. [PMID: 33741521 DOI: 10.1016/j.fsi.2021.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2020] [Revised: 03/01/2021] [Accepted: 03/12/2021] [Indexed: 06/12/2023]
Abstract
The hormone melatonin, a multifunctional molecule in vertebrates, has been shown to exert complex actions on the immune system of mammals. In teleosts, its immunomodulatory capacity has seldom been investigated. In the present experiment, we exposed spleen and head kidney tissues of pike-perch ex vivo to melatonin (Mel) and cortisol (Cort). We applied three concentrations of each hormone, alone and in combination: (1) Mel (10, 100, or 1000 pg mL-1); (2) Cort (50, 500, or 5000 ng mL-1); (3) Mel + Cort (10 pg mL-1 + 50 ng mL-1, 100 pg mL-1 + 500 ng mL-1, or 1000 pg mL-1 + 5000 ng mL-1). Pure medium without Mel or Cort served as the control. After 15 h of incubation, we assessed the expression of a set of immunity-related genes. These included genes encoding pro-inflammatory proteins (il-1β, cxcl8, and tnf-α), acute-phase proteins (fgl2, fth1, hepc, hp, and saa1), and key factors of the adaptive immune system (fkbp4 and tcrg). Both Mel and Cort, alone or combined at physiological concentrations, significantly influenced immune gene expression in ways that may lead to global immune stimulation. Our results support both an indirect action of Mel on the immune system through the regulation of intermediates such as Cort and a direct action on immune targets through specific receptors.
Affiliation(s)
- Sébastien Baekelandt
  - Research Unit in Environmental and Evolutionary Biology (URBE), Institute of Life, Earth & Environment, University of Namur, Rue de Bruxelles 61, B-5000, Belgium
- Valérie Cornet
  - Research Unit in Environmental and Evolutionary Biology (URBE), Institute of Life, Earth & Environment, University of Namur, Rue de Bruxelles 61, B-5000, Belgium
- Syaghalirwa N M Mandiki
  - Research Unit in Environmental and Evolutionary Biology (URBE), Institute of Life, Earth & Environment, University of Namur, Rue de Bruxelles 61, B-5000, Belgium
- Jérôme Lambert
  - Research Unit in Environmental and Evolutionary Biology (URBE), Institute of Life, Earth & Environment, University of Namur, Rue de Bruxelles 61, B-5000, Belgium
- Mickaël Dubois
  - Research Unit in Environmental and Evolutionary Biology (URBE), Institute of Life, Earth & Environment, University of Namur, Rue de Bruxelles 61, B-5000, Belgium
- Patrick Kestemont
  - Research Unit in Environmental and Evolutionary Biology (URBE), Institute of Life, Earth & Environment, University of Namur, Rue de Bruxelles 61, B-5000, Belgium
11
Tozzo V, Azencott CA, Fiorini S, Fava E, Trucco A, Barla A. Where Do We Stand in Regularization for Life Science Studies? J Comput Biol 2021; 29:213-232. [PMID: 33926217 PMCID: PMC8968832 DOI: 10.1089/cmb.2019.0371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022] [Open access]
Abstract
More and more biologists and bioinformaticians are turning to machine learning to analyze large amounts of data. In this context, it is crucial to understand which data analysis pipeline is most suitable for achieving reliable results. This choice can be challenging because of a variety of factors, the most crucial being the data type and the general goal of the analysis (e.g., exploratory or predictive). Life science data sets require further consideration, as they often contain measurements with a low signal-to-noise ratio, high-dimensional observations, and relatively few samples. In this complex setting, regularization, which can be defined as the introduction of additional information to solve an ill-posed problem, is the tool of choice for obtaining robust models. Different regularization practices may be used depending on the characteristics of both the data and the question asked, and different choices may lead to different results. In this article, we provide a comprehensive description of the impact and importance of regularization techniques in life science studies. In particular, we provide an intuition of what regularization is and of the different ways it can be implemented and exploited. We propose four general life science problems in which regularization is fundamental and should be exploited for robustness. For each of these broad families of problems, we enumerate different techniques as well as examples and case studies. Lastly, we provide a unified view of how to approach each data type with various regularization techniques.
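The definition of regularization above, adding information to make an ill-posed problem solvable, can be made concrete with ridge (Tikhonov) regression, one common regularizer. In the sketch below, two perfectly collinear predictors make the ordinary least-squares normal equations singular. Adding λI yields a unique solution that splits the weight evenly between them. The data, the function names, and the choice of ridge rather than, say, lasso are illustrative assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge(X, y, lam):
    """Ridge estimate b = (X^T X + lam * I)^(-1) X^T y (no intercept)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

X = [[1, 1], [2, 2], [3, 3]]  # hypothetical design: column 2 duplicates column 1
y = [2, 4, 6]
try:
    ridge(X, y, 0.0)  # lam = 0 is ordinary least squares: ill-posed here
except ValueError as err:
    print("OLS failed:", err)
print(ridge(X, y, 1.0))  # both coefficients approximately 28/29
```

Without the penalty, infinitely many coefficient pairs fit the data equally well. The λI term acts as the "additional information" that picks the smallest-norm compromise.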
Affiliation(s)
- Veronica Tozzo: Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, Genoa, Italy
- Chloé-Agathe Azencott: Centre for Computational Biology (CBIO), MINES ParisTech, PSL Research University, Paris, France; Institut Curie, PSL Research University, Paris, France; INSERM, U900, Paris, France
- Emanuele Fava: Department of Electrical, Electronic, Telecommunications Engineering, and Naval Architecture (DITEN), University of Genoa, Genoa, Italy
- Andrea Trucco: Department of Electrical, Electronic, Telecommunications Engineering, and Naval Architecture (DITEN), University of Genoa, Genoa, Italy
- Annalisa Barla: Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, Genoa, Italy
12
Huo J, Ma Y, Lu C, Li C, Duan K, Li H. Mahalanobis distance based similarity regression learning of NIRS for quality assurance of tobacco product with different variable selection methods. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2021; 251:119364. [PMID: 33493932 DOI: 10.1016/j.saa.2020.119364] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/30/2020] [Revised: 12/13/2020] [Accepted: 12/17/2020] [Indexed: 06/12/2023]
Abstract
Quality assurance is one of the key issues in the tobacco industry, and much effort has been devoted to quality control. This paper introduces a new chemometrics technique to estimate the "quality similarity rate", which is used for quality control. The value of the quality similarity rate represents the degree of similarity between the products and the standard reference samples; it is a global parameter that can be generated either by human assessors or by machine learning. Supervised similarity regression models are built to automatically estimate the quality similarity rate from NIRS data of tobacco leaf and smoke. For the similarity regression learning, the metric matrix is generated by a novel method that calculates the Mahalanobis distance from segmented near-infrared spectroscopy (NIRS) data. The results show that similarity regression learning can predict the quality similarity score well and at high speed, and that it can be improved with lasso (least absolute shrinkage and selection operator)-related feature selection algorithms such as sRDA (sparse redundancy analysis) and glmnet.
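The abstract does not say how the segmented NIRS spectra are turned into a metric matrix, so the sketch below only illustrates the underlying distance: the Mahalanobis distance d(x) = sqrt((x - mu)^T S^-1 (x - mu)) of a product's feature vector from the mean of a set of reference samples, where S is their sample covariance. The generic feature vectors, the helper names, and the mapping 1 / (1 + d) from distance to a similarity in (0, 1] are all assumptions for illustration, not the paper's method.

```python
import math

# Mahalanobis distance to a reference set, in pure Python.
# Hypothetical helpers; the 1 / (1 + d) similarity mapping is an assumption.

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (denominator n - 1)."""
    mu = mean_vector(rows)
    n, p = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mahalanobis(x, reference):
    mu = mean_vector(reference)
    s_inv = invert(covariance(reference))
    d = [xi - mi for xi, mi in zip(x, mu)]
    q = sum(d[i] * s_inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(q)


def similarity(x, reference):
    """Hypothetical similarity score: 1 at the reference mean, falling toward 0."""
    return 1.0 / (1.0 + mahalanobis(x, reference))


reference = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(mahalanobis([1.0, 0.0], reference), similarity([0.0, 0.0], reference))
```

Unlike a Euclidean distance, this scales each direction by the spread of the reference samples, so a deviation along a naturally noisy direction counts for less.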
Affiliation(s)
- Juan Huo: Zhengzhou University, Henan Province, China
- Yuping Ma: China Tobacco Henan Industrial Co., Ltd, Zhengzhou 450000, China
- Changtong Lu: China Tobacco Henan Industrial Co., Ltd, Zhengzhou 450000, China
- Chenggang Li: China Tobacco Henan Industrial Co., Ltd, Zhengzhou 450000, China
- Kun Duan: China Tobacco Henan Industrial Co., Ltd, Zhengzhou 450000, China
- Huaiqi Li: China Tobacco Henan Industrial Co., Ltd, Zhengzhou 450000, China
13
Ajmal HB, Madden MG. Inferring dynamic gene regulatory networks with low-order conditional independencies – an evaluation of the method. Stat Appl Genet Mol Biol 2020. [DOI: 10.1515/sagmb-2020-0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Over a decade ago, Lèbre (2009) proposed an inference method, G1DBN, to learn the structure of gene regulatory networks (GRNs) from high-dimensional, sparse time-series gene expression data. The approach is based on the concept of low-order conditional independence graphs, extended to dynamic Bayesian networks (DBNs). Lèbre (2009) presented results showing that the method yields better structural accuracy than the related Lasso and Shrinkage methods, particularly where the data are sparse, that is, where the number of time measurements n is much smaller than the number of genes p. This paper challenges these claims through a careful experimental analysis, showing that the GRNs reverse-engineered from time-series data using the G1DBN approach are less accurate than claimed by Lèbre (2009). We also show that the Lasso method yields higher structural accuracy than G1DBN for graphs learned from simulated data, particularly when the data are sparse ($n \ll p$). The Lasso method is also better than G1DBN at identifying the transcription factors (TFs) involved in the cell cycle of Saccharomyces cerevisiae.
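A Lasso baseline for a first-order DBN can be sketched as follows: each gene's expression at time t is regressed on all genes at time t - 1 with an L1 penalty, and every nonzero coefficient is read as a candidate edge (regulator, target), including self-loops. The sketch minimizes (1/(2n)) ||y - X b||^2 + lam ||b||_1 by cyclic coordinate descent with soft-thresholding. The function names, the penalty value, and the toy series are assumptions; this is neither G1DBN nor the exact Lasso configuration evaluated in the paper.

```python
# Lasso-based DBN edge selection sketch (pure Python, hypothetical names).

def soft_threshold(z, t):
    """Proximal operator of t * |b|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(x, y, lam, n_iter=200):
    """Minimize (1/(2n))||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    No intercept is fitted, so in practice the data should be centred first.
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in x) / n for j in range(p)]
    resid = y[:]  # residual y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j added back).
            rho = sum(x[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= x[i][j] * delta
                beta[j] = new
    return beta


def dbn_edges(series, lam):
    """series[t][g] is gene g at time t; returns {(regulator, target)}."""
    x = series[:-1]  # predictors: all genes at time t - 1
    edges = set()
    for target in range(len(series[0])):
        y = [row[target] for row in series[1:]]  # response: target at time t
        for reg, b in enumerate(lasso_cd(x, y, lam)):
            if abs(b) > 1e-8:
                edges.add((reg, target))
    return edges


# Toy series: gene 1 copies gene 0 with a one-step lag.
series = [[1.0, 0.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, -1.0]]
print(dbn_edges(series, 0.1))
```

The L1 penalty is what keeps the per-gene regressions solvable and sparse when the number of time points is much smaller than the number of genes.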
Affiliation(s)
- Hamda B. Ajmal: School of Computer Science, National University of Ireland, Galway, Ireland
- Michael G. Madden: School of Computer Science, National University of Ireland, Galway, Ireland
14
Csala A, Zwinderman AH, Hof MH. Multiset sparse partial least squares path modeling for high dimensional omics data analysis. BMC Bioinformatics 2020; 21:9. [PMID: 31918677 PMCID: PMC6953292 DOI: 10.1186/s12859-019-3286-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/05/2019] [Accepted: 11/20/2019] [Indexed: 12/29/2022]
Abstract
BACKGROUND Recent technological developments have enabled the measurement of a plethora of biomolecular data from various omics domains, and research is ongoing on statistical methods that leverage these data to better model and understand biological pathways and the genetic architectures of complex phenotypes. Current reviews report that the simultaneous analysis of multiple (i.e., three or more) high-dimensional omics data sources is still challenging and that suitable statistical methods are unavailable. Frequently mentioned challenges are the failure to account for the hierarchical structure between omics domains and the difficulty of interpreting genome-wide results. This study aims to address these challenges. We propose multiset sparse Partial Least Squares path modeling (msPLS), a generalized penalized form of Partial Least Squares path modeling, for the simultaneous modeling of biological pathways across multiple omics domains. msPLS simultaneously models the effect of multiple molecular markers, from multiple omics domains, on the variation of multiple phenotypic variables, while accounting for the relationships between data sources, and provides sparse results. The sparsity of the model helps to provide interpretable results from analyses of hundreds of thousands of biomolecular variables. RESULTS With simulation studies, we quantified the ability of msPLS to discover associated variables among high-dimensional data sources. Furthermore, we analysed high-dimensional omics datasets to explore biological pathways associated with Marfan syndrome and with Chronic Lymphocytic Leukaemia. Additionally, we compared the results of msPLS with those of Multi-Omics Factor Analysis (MOFA), an alternative method for analysing this type of data. CONCLUSIONS msPLS is a multiset multivariate method for the integrative analysis of multiple high-dimensional omics data sources.
It accounts for the relationships between multiple high-dimensional data sources while providing interpretable results through its sparse solutions. The biomarkers found by msPLS in the omics datasets can be interpreted in terms of biological pathways associated with the pathophysiology of Marfan syndrome and of Chronic Lymphocytic Leukaemia. Additionally, msPLS outperforms MOFA in terms of variation explained in the Chronic Lymphocytic Leukaemia dataset, while it identifies the two most important clinical markers for Chronic Lymphocytic Leukaemia. AVAILABILITY: http://uva.csala.me/mspls and https://github.com/acsala/2018_msPLS.
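The sparsity that makes msPLS interpretable comes from penalizing the block weights. As a minimal illustration of that idea for just two blocks, the sketch below computes the first pair of sparse weight vectors by a soft-thresholded power iteration on the cross-product matrix C = X^T Y, a generic sparse PLS scheme. It is not msPLS itself, which generalizes penalized PLS to several blocks linked by a path model; the function names, penalties, and toy data are assumptions.

```python
import math

# Two-block sparse PLS sketch via soft-thresholded power iteration.
# Generic technique, not the msPLS algorithm; names are hypothetical.

def soft(z, t):
    """Soft-thresholding: shrink |z| by t, clipping at zero."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def normalise(v):
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        raise ValueError("all weights were thresholded to zero; lower the penalty")
    return [a / norm for a in v]


def sparse_pls_weights(x, y, lam_x, lam_y, n_iter=100):
    """First sparse weight pair (u for X, v for Y); assumes centred columns."""
    p, q = len(x[0]), len(y[0])
    # Cross-product matrix C = X^T Y (p x q).
    c = [[sum(xr[i] * yr[j] for xr, yr in zip(x, y)) for j in range(q)]
         for i in range(p)]
    v = normalise([1.0] * q)
    for _ in range(n_iter):
        u = normalise([soft(sum(c[i][j] * v[j] for j in range(q)), lam_x)
                       for i in range(p)])
        v = normalise([soft(sum(c[i][j] * u[i] for i in range(p)), lam_y)
                       for j in range(q)])
    return u, v


# X column 0 drives Y column 0; X column 1 is weakly related to it.
x = [[1.0, 0.1, 0.0], [-1.0, 0.0, 0.1], [1.0, 0.0, 0.0], [-1.0, -0.1, -0.1]]
y = [[2.0, 0.0], [-2.0, 0.1], [2.0, 0.0], [-2.0, -0.1]]
print(sparse_pls_weights(x, y, 0.5, 0.5))  # weak loadings are zeroed
print(sparse_pls_weights(x, y, 0.0, 0.0))  # without penalty they stay small but nonzero
```

The penalty trades a little explained covariance for weight vectors with exact zeros, which is what makes results over very many markers readable.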
Affiliation(s)
- Attila Csala: Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands
- Aeilko H. Zwinderman: Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands
- Michel H. Hof: Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands
15
Csala A, Hof MH, Zwinderman AH. Multiset sparse redundancy analysis for high-dimensional omics data. Biom J 2018; 61:406-423. [PMID: 30506971 PMCID: PMC6587877 DOI: 10.1002/bimj.201700248] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/14/2017] [Revised: 09/28/2018] [Accepted: 10/02/2018] [Indexed: 11/23/2022]
Abstract
Redundancy Analysis (RDA) is a well‐known method used to describe the directional relationship between related data sets. Recently, we proposed sparse Redundancy Analysis (sRDA) for high‐dimensional genomic data analysis to find explanatory variables that explain the most variance of the response variables. As more and more biomolecular data become available from different biological levels, such as genotypic and phenotypic data from different omics domains, a natural research direction is to apply an integrated analysis approach in order to explore the underlying biological mechanism of certain phenotypes of the given organism. We show that the multiset sparse Redundancy Analysis (multi‐sRDA) framework is a prominent candidate for high‐dimensional omics data analysis since it accounts for the directional information transfer between omics sets, and, through its sparse solutions, the interpretability of the result is improved. In this paper, we also describe a software implementation for multi‐sRDA, based on the Partial Least Squares Path Modeling algorithm. We test our method through simulation and real omics data analysis with data sets of 364,134 methylation markers, 18,424 gene expression markers, and 47 cytokine markers measured on 37 patients with Marfan syndrome.
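Redundancy analysis is directional: it asks how much of the variance of the response set is explained by a latent variable built from the explanatory set. A common summary of this is the redundancy index, the mean squared correlation between a latent variable xi = X alpha and each response variable. The sketch below computes that index for a given weight vector alpha. How the sparse weights themselves are estimated (the sRDA / PLS path modeling algorithm described in the paper) is not reproduced, and the function names and toy data are assumptions.

```python
import math

# Redundancy index of a latent variable xi = X @ alpha with respect to Y.
# Hypothetical helper names; alpha is supplied, not estimated here.

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (w - mb) for u, w in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((w - mb) ** 2 for w in b)
    if saa == 0.0 or sbb == 0.0:
        raise ValueError("correlation undefined for a constant variable")
    return sab / math.sqrt(saa * sbb)


def latent(x, alpha):
    """Explanatory latent variable: weighted sum of the X columns per sample."""
    return [sum(v * w for v, w in zip(row, alpha)) for row in x]


def redundancy_index(x, y, alpha):
    """Mean squared correlation between X @ alpha and each column of Y."""
    xi = latent(x, alpha)
    cols = list(zip(*y))
    return sum(pearson(xi, list(c)) ** 2 for c in cols) / len(cols)


x = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
y = [[2.0, 1.0], [4.0, -1.0], [6.0, -1.0], [8.0, 1.0]]
# The latent variable explains Y column 0 perfectly and Y column 1 not at all.
print(redundancy_index(x, y, [1.0, 0.0]))
```

A sparse alpha with only a few nonzero entries keeps the latent variable interpretable while this index measures how much response variance it still carries.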
Affiliation(s)
- Attila Csala: Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, Amsterdam, The Netherlands
- Michel H Hof: Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, Amsterdam, The Netherlands
- Aeilko H Zwinderman: Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, Amsterdam, The Netherlands
16
Using a Backpropagation Artificial Neural Network to Predict Nutrient Removal in Tidal Flow Constructed Wetlands. WATER 2018. [DOI: 10.3390/w10010083] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]