1
Karpov OA, Stotland A, Raedschelders K, Chazarin B, Ai L, Murray CI, Van Eyk JE. Proteomics of the heart. Physiol Rev 2024; 104:931-982. [PMID: 38300522 DOI: 10.1152/physrev.00026.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 12/25/2023] [Accepted: 01/14/2024] [Indexed: 02/02/2024]
Abstract
Mass spectrometry-based proteomics is a sophisticated identification tool specializing in portraying protein dynamics at a molecular level. Proteomics provides biologists with a snapshot of context-dependent protein and proteoform expression, structural conformations, dynamic turnover, and protein-protein interactions. Cardiac proteomics can offer a broader and deeper understanding of the molecular mechanisms that underscore cardiovascular disease, and it is foundational to the development of future therapeutic interventions. This review encapsulates the evolution, current technologies, and future perspectives of proteomic-based mass spectrometry as it applies to the study of the heart. Key technological advancements have allowed researchers to study proteomes at a single-cell level and employ robot-assisted automation systems for enhanced sample preparation techniques, and the increase in fidelity of the mass spectrometers has allowed for the unambiguous identification of numerous dynamic posttranslational modifications. Animal models of cardiovascular disease, ranging from early animal experiments to current sophisticated models of heart failure with preserved ejection fraction, have provided the tools to study a challenging organ in the laboratory. Further technological development will pave the way for the implementation of proteomics even closer within the clinical setting, allowing not only scientists but also patients to benefit from an understanding of protein interplay as it relates to cardiac disease physiology.
Affiliation(s)
- Oleg A Karpov
  - Smidt Heart Institute, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States
- Aleksandr Stotland
  - Smidt Heart Institute, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States
- Koen Raedschelders
  - Smidt Heart Institute, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States
- Blandine Chazarin
  - Smidt Heart Institute, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States
- Lizhuo Ai
  - Smidt Heart Institute, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States
- Christopher I Murray
  - Smidt Heart Institute, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States
- Jennifer E Van Eyk
  - Smidt Heart Institute, Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States
2
Goldstein Y, Cohen OT, Wald O, Bavli D, Kaplan T, Benny O. Particle uptake in cancer cells can predict malignancy and drug resistance using machine learning. Science Advances 2024; 10:eadj4370. [PMID: 38809990 DOI: 10.1126/sciadv.adj4370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 04/23/2024] [Indexed: 05/31/2024]
Abstract
Tumor heterogeneity is a primary factor that contributes to treatment failure. Predictive tools, capable of classifying cancer cells based on their functions, may substantially enhance therapy and extend patient life span. The connection between cell biomechanics and cancer cell functions is used here to classify cells through mechanical measurements, via particle uptake. Machine learning (ML) was used to classify cells based on single-cell patterns of uptake of particles with diverse sizes. Three pairs of human cancer cell subpopulations, varied in their level of drug resistance or malignancy, were studied. Cells were allowed to interact with fluorescently labeled polystyrene particles ranging in size from 0.04 to 3.36 μm and analyzed for their uptake patterns using flow cytometry. ML algorithms accurately classified cancer cell subtypes with accuracy rates exceeding 95%. The uptake data were especially advantageous for morphologically similar cell subpopulations. Moreover, the uptake data were found to serve as a form of "normalization" that could reduce variation in repeated experiments.
Affiliation(s)
- Yoel Goldstein
  - Institute for Drug Research, The School of Pharmacy, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ora T Cohen
  - Institute for Drug Research, The School of Pharmacy, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ori Wald
  - Department of Cardiothoracic Surgery, Hadassah Medical Center, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Danny Bavli
  - Department of Stem Cell and Regenerative Biology, Harvard Stem Cell Institute, Harvard University, Cambridge, MA, USA
- Tommy Kaplan
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
  - Department of Developmental Biology and Cancer Research, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ofra Benny
  - Institute for Drug Research, The School of Pharmacy, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
3
Nolin-Lapalme A, Corbin D, Tastet O, Avram R, Hussin JG. Advancing Fairness in Cardiac Care: Strategies for Mitigating Bias in Artificial Intelligence Models Within Cardiology. Can J Cardiol 2024:S0828-282X(24)00357-X. [PMID: 38735528 DOI: 10.1016/j.cjca.2024.04.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 04/03/2024] [Accepted: 04/22/2024] [Indexed: 05/14/2024]
Abstract
In the dynamic field of medical artificial intelligence (AI), cardiology stands out as a key area for its technological advancements and clinical application. In this review we explore the complex issue of data bias, specifically addressing those encountered during the development and implementation of AI tools in cardiology. We dissect the origins and effects of these biases, which challenge their reliability and widespread applicability in health care. Using a case study, we highlight the complexities involved in addressing these biases from a clinical viewpoint. The goal of this review is to equip researchers and clinicians with the practical knowledge needed to identify, understand, and mitigate these biases, advocating for the creation of AI solutions that are not just technologically sound, but also fair and effective for all patients.
Affiliation(s)
- Alexis Nolin-Lapalme
  - Department of Medicine, Montreal Heart Institute, Montreal, Quebec, Canada; Faculté de Médecine, Université de Montréal, Montreal, Quebec, Canada; Mila - Québec AI Institute, Montreal, Quebec, Canada; Heartwise (heartwise.ai), Montreal Heart Institute, Montreal, Quebec, Canada
- Denis Corbin
  - Department of Medicine, Montreal Heart Institute, Montreal, Quebec, Canada
- Olivier Tastet
  - Department of Medicine, Montreal Heart Institute, Montreal, Quebec, Canada
- Robert Avram
  - Department of Medicine, Montreal Heart Institute, Montreal, Quebec, Canada; Faculté de Médecine, Université de Montréal, Montreal, Quebec, Canada; Heartwise (heartwise.ai), Montreal Heart Institute, Montreal, Quebec, Canada
- Julie G Hussin
  - Department of Medicine, Montreal Heart Institute, Montreal, Quebec, Canada; Faculté de Médecine, Université de Montréal, Montreal, Quebec, Canada; Mila - Québec AI Institute, Montreal, Quebec, Canada
4
Mar D, Babenko IM, Zhang R, Noble WS, Denisenko O, Vaisar T, Bomsztyk K. A High-Throughput PIXUL-Matrix-Based Toolbox to Profile Frozen and Formalin-Fixed Paraffin-Embedded Tissues Multiomes. Lab Invest 2024; 104:100282. [PMID: 37924947 PMCID: PMC10872585 DOI: 10.1016/j.labinv.2023.100282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 10/23/2023] [Accepted: 10/27/2023] [Indexed: 11/06/2023]
Abstract
Large-scale high-dimensional multiomics studies are essential to unravel molecular complexity in health and disease. We developed an integrated system for tissue sampling (CryoGrid), analytes preparation (PIXUL), and downstream multiomic analysis in a 96-well plate format (Matrix), MultiomicsTracks96, which we used to interrogate matched frozen and formalin-fixed paraffin-embedded (FFPE) mouse organs. Using this system, we generated 8-dimensional omics data sets encompassing 4 molecular layers of intracellular organization: epigenome (H3K27Ac, H3K4m3, RNA polymerase II, and 5mC levels), transcriptome (messenger RNA levels), epitranscriptome (m6A levels), and proteome (protein levels) in brain, heart, kidney, and liver. There was a high correlation between data from matched frozen and FFPE organs. The Segway genome segmentation algorithm applied to epigenomic profiles confirmed known organ-specific superenhancers in both FFPE and frozen samples. Linear regression analysis showed that proteomic profiles, known to be poorly correlated with transcriptomic data, can be more accurately predicted by the full suite of multiomics data, compared with using epigenomic, transcriptomic, or epitranscriptomic measurements individually.
Affiliation(s)
- Daniel Mar
  - UW Medicine South Lake Union, University of Washington, Seattle, Washington; Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington
- Ilona M Babenko
  - Diabetes Institute, University of Washington, Seattle, Washington
- Ran Zhang
  - Department of Genome Sciences, University of Washington, Seattle, Washington
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington
- Oleg Denisenko
  - UW Medicine South Lake Union, University of Washington, Seattle, Washington
- Tomas Vaisar
  - Diabetes Institute, University of Washington, Seattle, Washington
- Karol Bomsztyk
  - UW Medicine South Lake Union, University of Washington, Seattle, Washington; Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington; Matchstick Technologies, Inc, Kirkland, Washington
5
Schactler SA, Scheuerman SJ, Lius A, Altemeier WA, An D, Matula TJ, Mikula M, Kulecka M, Denisenko O, Mar D, Bomsztyk K. CryoGrid-PIXUL-RNA: high throughput RNA isolation platform for tissue transcript analysis. BMC Genomics 2023; 24:446. [PMID: 37553584 PMCID: PMC10408117 DOI: 10.1186/s12864-023-09527-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 07/20/2023] [Indexed: 08/10/2023]
Abstract
BACKGROUND Disease molecular complexity requires high throughput workflows to map disease pathways through analysis of vast tissue repositories. Great progress has been made in tissue multiomics analytical technologies. To match the high throughput of these advanced analytical platforms, we have previously developed a multipurpose 96-well microplate sonicator, PIXUL, that can be used in multiple workflows to extract analytes from cultured cells and tissue fragments for various downstream molecular assays. And yet, the sample preparation devices, such as PIXUL, along with the downstream multiomics analytical capabilities have not been fully exploited to interrogate tissues because storing and sampling of such biospecimens remain, in comparison, inefficient.
RESULTS To mitigate this tissue interrogation bottleneck, we have developed a low-cost user-friendly system, CryoGrid, to catalog, cryostore and sample tissue fragments. TRIzol is widely used to isolate RNA but it is labor-intensive, hazardous, requires fume-hoods, and is an expensive reagent. Columns are also commonly used to extract RNA but they involve many steps, are prone to human errors, and are also expensive. Both TRIzol and column protocols use test tubes. We developed a microplate PIXUL-based TRIzol-free and column-free RNA isolation protocol that uses a buffer containing proteinase K (PK buffer). We have integrated the CryoGrid system with PIXUL-based PK buffer, TRIzol, and PureLink column methods to isolate RNA for gene-specific qPCR and genome-wide transcript analyses. CryoGrid-PIXUL, when integrated with either PK buffer, TRIzol or PureLink column RNA isolation protocols, yielded similar transcript profiles in frozen organs (brain, heart, kidney and liver) from a mouse model of sepsis.
CONCLUSIONS RNA isolation using the CryoGrid-PIXUL system combined with the 96-well microplate PK buffer method offers an inexpensive user-friendly high throughput workflow to study transcriptional responses in tissues in health and disease as well as in therapeutic interventions.
Affiliation(s)
- Scott A Schactler
  - UW Medicine South Lake Union, University of Washington, Seattle, WA, 98109, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, 98109, USA
- Stephen J Scheuerman
  - UW Medicine South Lake Union, University of Washington, Seattle, WA, 98109, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, 98109, USA
- Andrea Lius
  - UW Medicine South Lake Union, University of Washington, Seattle, WA, 98109, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, 98109, USA
- William A Altemeier
  - UW Medicine South Lake Union, University of Washington, Seattle, WA, 98109, USA
  - Center for Lung Biology, University of Washington, Seattle, WA, 98109, USA
- Dowon An
  - UW Medicine South Lake Union, University of Washington, Seattle, WA, 98109, USA
  - Center for Lung Biology, University of Washington, Seattle, WA, 98109, USA
- Thomas J Matula
  - Center for Industrial and Medical Ultrasound, Applied Physics Laboratory, University of Washington, Seattle, WA, 98195, USA
  - Matchstick Technologies, Inc, Kirkland, WA, 98033, USA
- Michal Mikula
  - Department of Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781, Warsaw, Poland
- Maria Kulecka
  - Department of Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781, Warsaw, Poland
  - Department of Gastroenterology, Hepatology and Clinical Oncology, Centre for Postgraduate Medical Education, 01-813, Warsaw, Poland
- Oleg Denisenko
  - UW Medicine South Lake Union, University of Washington, Seattle, WA, 98109, USA
- Daniel Mar
  - UW Medicine South Lake Union, University of Washington, Seattle, WA, 98109, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, 98109, USA
- Karol Bomsztyk
  - UW Medicine South Lake Union, University of Washington, Seattle, WA, 98109, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, 98109, USA
  - Matchstick Technologies, Inc, Kirkland, WA, 98033, USA
6
Messner CB, Demichev V, Wang Z, Hartl J, Kustatscher G, Mülleder M, Ralser M. Mass spectrometry-based high-throughput proteomics and its role in biomedical studies and systems biology. Proteomics 2023; 23:e2200013. [PMID: 36349817 DOI: 10.1002/pmic.202200013] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 07/07/2022] [Revised: 10/13/2022] [Accepted: 10/13/2022] [Indexed: 11/11/2022]
Abstract
There are multiple reasons why the next generation of biological and medical studies require increasing numbers of samples. Biological systems are dynamic, and the effect of a perturbation depends on the genetic background and environment. As a consequence, many conditions need to be considered to reach generalizable conclusions. Moreover, human population and clinical studies only reach sufficient statistical power if conducted at scale and with precise measurement methods. Finally, many proteins remain without sufficient functional annotations, because they have not been systematically studied under a broad range of conditions. In this review, we discuss the latest technical developments in mass spectrometry (MS)-based proteomics that facilitate large-scale studies by fast and efficient chromatography, fast scanning mass spectrometers, data-independent acquisition (DIA), and new software. We further highlight recent studies which demonstrate how high-throughput (HT) proteomics can be applied to capture biological diversity, to annotate gene functions or to generate predictive and prognostic models for human diseases.
Affiliation(s)
- Christoph B Messner
  - Precision Proteomics Center, Swiss Institute of Allergy and Asthma Research (SIAF), University of Zurich, Davos, Switzerland
- Vadim Demichev
  - Institute of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Ziyue Wang
  - Institute of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Johannes Hartl
  - Institute of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Georg Kustatscher
  - Wellcome Centre for Cell Biology, University of Edinburgh, Max Born Crescent, Edinburgh, Scotland, UK
- Michael Mülleder
  - Core Facility High Throughput Mass Spectrometry, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Markus Ralser
  - Institute of Biochemistry, Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK
7
Mar D, Babenko IM, Zhang R, Noble WS, Denisenko O, Vaisar T, Bomsztyk K. MultiomicsTracks96: A high throughput PIXUL-Matrix-based toolbox to profile frozen and FFPE tissues multiomes. bioRxiv 2023:2023.03.16.533031. [PMID: 36993219 PMCID: PMC10055122 DOI: 10.1101/2023.03.16.533031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Background The multiome is an integrated assembly of distinct classes of molecules and molecular properties, or "omes," measured in the same biospecimen. Freezing and formalin-fixed paraffin-embedding (FFPE) are two common ways to store tissues, and these practices have generated vast biospecimen repositories. However, these biospecimens have been underutilized for multi-omic analysis due to the low throughput of current analytical technologies that impede large-scale studies.
Methods Tissue sampling, preparation, and downstream analysis were integrated into a 96-well format multi-omics workflow, MultiomicsTracks96. Frozen mouse organs were sampled using the CryoGrid system, and matched FFPE samples were processed using a microtome. The 96-well format sonicator, PIXUL, was adapted to extract DNA, RNA, chromatin, and protein from tissues. The 96-well format analytical platform, Matrix, was used for chromatin immunoprecipitation (ChIP), methylated DNA immunoprecipitation (MeDIP), methylated RNA immunoprecipitation (MeRIP), and RNA reverse transcription (RT) assays followed by qPCR and sequencing. LC-MS/MS was used for protein analysis. The Segway genome segmentation algorithm was used to identify functional genomic regions, and linear regressors based on the multi-omics data were trained to predict protein expression.
Results MultiomicsTracks96 was used to generate 8-dimensional datasets including RNA-seq measurements of mRNA expression; MeRIP-seq measurements of m6A and m5C; ChIP-seq measurements of H3K27Ac, H3K4m3, and Pol II; MeDIP-seq measurements of 5mC; and LC-MS/MS measurements of proteins. We observed high correlation between data from matched frozen and FFPE organs. The Segway genome segmentation algorithm applied to epigenomic profiles (ChIP-seq: H3K27Ac, H3K4m3, Pol II; MeDIP-seq: 5mC) was able to recapitulate and predict organ-specific super-enhancers in both FFPE and frozen samples. Linear regression analysis showed that proteomic expression profiles can be more accurately predicted by the full suite of multi-omics data, compared to using epigenomic, transcriptomic, or epitranscriptomic measurements individually.
Conclusions The MultiomicsTracks96 workflow is well suited for high dimensional multi-omics studies, for instance multiorgan animal models of disease, drug toxicities, environmental exposure, and aging, as well as large-scale clinical investigations involving the use of biospecimens from existing tissue repositories.
8
Biełło KA, Lucena C, López-Tenllado FJ, Hidalgo-Carrillo J, Rodríguez-Caballero G, Cabello P, Sáez LP, Luque-Almagro V, Roldán MD, Moreno-Vivián C, Olaya-Abril A. Holistic view of biological nitrogen fixation and phosphorus mobilization in Azotobacter chroococcum NCIMB 8003. Front Microbiol 2023; 14:1129721. [PMID: 36846808 PMCID: PMC9945222 DOI: 10.3389/fmicb.2023.1129721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023]
Abstract
Nitrogen (N) and phosphorus (P) deficiencies are two of the main agronomic problems that cause a significant decrease in crop yield and quality. N and P chemical fertilizers are widely used in current agriculture, causing environmental problems and increasing production costs. Therefore, alternative strategies to reduce the use of chemical fertilizers while maintaining N and P inputs are being investigated. Although dinitrogen is an abundant gas in the atmosphere, it requires biological nitrogen fixation (BNF) to be transformed into ammonium, a nitrogen source assimilable by living organisms. This process is bioenergetically expensive and, therefore, highly regulated. Factors such as the availability of other essential elements, for example phosphorus, strongly influence BNF. However, the molecular mechanisms of these interactions are unclear. In this work, a physiological characterization of BNF and phosphorus mobilization (PM) from an insoluble form (Ca3(PO4)2) in Azotobacter chroococcum NCIMB 8003 was carried out. These processes were analyzed by quantitative proteomics in order to detect their molecular requirements and interactions. BNF led to a metabolic change beyond the proteins strictly necessary to carry out the process, including the metabolism related to other elements, like phosphorus. Also, changes in cell mobility, heme group synthesis and oxidative stress responses were observed. This study also revealed two phosphatases that seem to have the main role in PM, an exopolyphosphatase and a non-specific alkaline phosphatase PhoX. When both BNF and PM processes take place simultaneously, the synthesis of nitrogenous bases and L-methionine was also affected. Thus, although the interdependence is still unknown, possible biotechnological applications of these processes should take into account the indicated factors.
Affiliation(s)
- Karolina A. Biełło
  - Departamento de Bioquímica y Biología Molecular, Edificio Severo Ochoa, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
- Carlos Lucena
  - Departamento de Botánica, Ecología y Fisiología Vegetal, Edificio Celestino Mutis, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
- Francisco J. López-Tenllado
  - Departamento de Química Orgánica, Instituto Universitario de Investigación en Química Fina y Nanoquímica (IUNAN), Universidad de Córdoba, Córdoba, Spain
- Jesús Hidalgo-Carrillo
  - Departamento de Química Orgánica, Instituto Universitario de Investigación en Química Fina y Nanoquímica (IUNAN), Universidad de Córdoba, Córdoba, Spain
- Gema Rodríguez-Caballero
  - Departamento de Bioquímica y Biología Molecular, Edificio Severo Ochoa, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
- Purificación Cabello
  - Departamento de Botánica, Ecología y Fisiología Vegetal, Edificio Celestino Mutis, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
- Lara P. Sáez
  - Departamento de Bioquímica y Biología Molecular, Edificio Severo Ochoa, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
- Víctor Luque-Almagro
  - Departamento de Bioquímica y Biología Molecular, Edificio Severo Ochoa, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
- María Dolores Roldán
  - Departamento de Bioquímica y Biología Molecular, Edificio Severo Ochoa, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
- Conrado Moreno-Vivián
  - Departamento de Bioquímica y Biología Molecular, Edificio Severo Ochoa, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
- Alfonso Olaya-Abril (corresponding author)
  - Departamento de Bioquímica y Biología Molecular, Edificio Severo Ochoa, Campus de Rabanales, Universidad de Córdoba, Córdoba, Spain
9
Abstract
Pathway enrichment analysis (PEA) is a computational biology method that identifies biological functions that are overrepresented in a group of genes more than would be expected by chance and ranks these functions by relevance. The relative abundance of genes pertinent to specific pathways is measured through statistical methods, and associated functional pathways are retrieved from online bioinformatics databases. In the last decade, along with the spread of the internet, higher availability of computational resources made PEA software tools easy to access and to use for bioinformatics practitioners worldwide. Although it became easier to use these tools, it also became easier to make mistakes that could generate inflated or misleading results, especially for beginners and inexperienced computational biologists. With this article, we propose nine quick tips to avoid common mistakes and to carry out a complete, sound, thorough PEA, which can produce relevant and robust results. We describe our nine guidelines in a simple way, so that they can be understood and used by anyone, including students and beginners. Some tips explain what to do before starting a PEA, others are suggestions of how to correctly generate meaningful results, and some final guidelines indicate some useful steps to properly interpret PEA results. Our nine tips can help users perform better pathway enrichment analyses and eventually contribute to a better understanding of current biology.
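As an illustration of what "overrepresented more than would be expected by chance" can mean in practice, the sketch below computes a one-sided hypergeometric tail probability, one common statistic for over-representation analysis. The abstract does not name a specific test, so this choice, the function name `hypergeom_enrichment_p`, and all parameter names are assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, hits_size, overlap):
    """One-sided over-representation p-value, P(X >= overlap).

    X counts how many genes of a pathway (pathway_size genes) land in a
    gene list of hits_size genes drawn without replacement from a
    background of universe_size genes. Illustrative helper, not taken
    from any PEA tool.
    """
    max_overlap = min(pathway_size, hits_size)
    total = comb(universe_size, hits_size)
    # Sum the hypergeometric probabilities from the observed overlap upward.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, hits_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
    # a 200-gene hit list, and 8 genes shared between the two.
    print(hypergeom_enrichment_p(20000, 100, 200, 8))
```

The result depends strongly on the chosen background (`universe_size`), and when many pathways are tested the p-values would normally need a multiple-testing correction before interpretation.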
10
Luo H, Xiang Y, Fang X, Lin W, Wang F, Wu H, Wang H. BatchDTA: implicit batch alignment enhances deep learning-based drug-target affinity estimation. Brief Bioinform 2022; 23:6632927. [PMID: 35794723 DOI: 10.1093/bib/bbac260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/14/2022] [Revised: 05/23/2022] [Accepted: 06/03/2022] [Indexed: 11/14/2022]
Abstract
Candidate compounds with high binding affinities toward a target protein are likely to be developed as drugs. Deep neural networks (DNNs) have attracted increasing attention for drug-target affinity (DTA) estimation owing to their efficiency. However, the negative impact of batch effects caused by measure metrics, system technologies and other assay information is seldom discussed when training a DNN model for DTA. Suffering from the data deviation caused by batch effects, the DNN models can only be trained on a small amount of 'clean' data. Thus, it is challenging for them to provide precise and consistent estimations. We design a batch-sensitive training framework, namely BatchDTA, to train the DNN models. BatchDTA implicitly aligns multiple batches toward the same protein through learning the orders of candidate compounds with respect to the batches, alleviating the impact of the batch effects on the DNN models. Extensive experiments demonstrate that BatchDTA helps four mainstream DNN models improve their performance and robustness on multiple DTA datasets (BindingDB, Davis and KIBA). The average concordance index of the DNN models achieves a relative improvement of 4.0%. The case study reveals that BatchDTA can successfully learn the ranking orders of the compounds from multiple batches. In addition, BatchDTA can also be applied to the fused data collected from multiple sources to achieve further improvement.
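The concordance index mentioned above measures how often a model orders a pair of compounds the same way as the measured affinities. The sketch below is a plain pairwise implementation with an optional `batches` argument that restricts comparisons to pairs from the same batch. That restriction is an assumption added here to illustrate why ranking within batches can sidestep batch offsets; it is not the BatchDTA training procedure, and the function and parameter names are hypothetical.

```python
def concordance_index(y_true, y_pred, batches=None):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count 0.5.
    If batches is given, only pairs from the same batch are compared.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if batches is not None and batches[i] != batches[j]:
                continue  # different assay batches: not compared
            if y_true[i] == y_true[j]:
                continue  # no true ordering to recover
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical affinities: batch "b" is reported on a shifted scale.
    truth = [1.0, 2.0, 10.0, 20.0]
    preds = [5.0, 6.0, 1.0, 2.0]
    labels = ["a", "a", "b", "b"]
    print(concordance_index(truth, preds))          # pooled across batches
    print(concordance_index(truth, preds, labels))  # within-batch pairs only
```

In this toy input the predictions rank compounds correctly inside each batch, so the within-batch score is perfect while the pooled score is penalized by the scale shift between batches.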
Affiliation(s)
- Hongyu Luo
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Yingfei Xiang
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Xiaomin Fang
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Wei Lin
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Fan Wang
  - PaddleHelix team, Baidu Inc., 518000, Shenzhen, China
- Hua Wu
  - Baidu Inc., 100000, Beijing, China
11
Han W, Li L. Evaluating and minimizing batch effects in metabolomics. Mass Spectrometry Reviews 2022; 41:421-442. [PMID: 33238061 DOI: 10.1002/mas.21672] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/22/2020] [Revised: 10/27/2020] [Accepted: 10/29/2020] [Indexed: 06/11/2023]
Abstract
Determining metabolomic differences among samples of different phenotypes is a critical component of metabolomics research. With the rapid advances in analytical tools such as ultrahigh-resolution chromatography and mass spectrometry, an increasing number of metabolites can now be profiled with high quantification accuracy. The increased detectability and accuracy raise the level of stringency required to reduce or control any experimental artifacts that can interfere with the measurement of phenotype-related metabolome changes. One of the artifacts is the batch effect that can be caused by multiple sources. In this review, we discuss the origins of batch effects, approaches to detect interbatch variations, and methods to correct unwanted data variability due to batch effects. We recognize that minimizing batch effects is currently an active research area, yet a very challenging task from both experimental and data processing perspectives. Thus, we try to be critical in describing the performance of a reported method with the hope of stimulating further studies for improving existing methods or developing new methods.
Affiliation(s)
- Wei Han
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
- Liang Li
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
12
Zhang X, Ye Z, Chen J, Qiao F. AMDBNorm: an approach based on distribution adjustment to eliminate batch effects of gene expression data. Brief Bioinform 2021; 23:6485011. [PMID: 34958674 DOI: 10.1093/bib/bbab528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Revised: 10/16/2021] [Accepted: 11/14/2021] [Indexed: 11/14/2022]
Abstract
Batch effects explain a large part of the noise when merging gene expression data. Removing irrelevant variations introduced by batch effects plays an important role in gene expression studies. To obtain reliable differential analysis results, it is necessary to remove the variation caused by technical conditions between different batches while preserving biological variation. Usually, merging data directly with batch effects leads to a sharp rise in false positives. Although some methods of batch correction have been developed, they have some drawbacks. In this study, we develop a new algorithm, adjustment mean distribution-based normalization (AMDBNorm), which is based on a probability distribution to correct batch effects while preserving biological variation. AMDBNorm solves the defects of the existing batch correction methods. We compared several popular methods of batch correction with AMDBNorm using two real gene expression datasets with batch effects and analyzed the results of batch correction from the visual and quantitative perspectives. To ensure the biological variation was well protected, the effects of the batch correction methods were verified by hierarchical cluster analysis. The results showed that the AMDBNorm algorithm could remove batch effects of gene expression data effectively and retain more biological variation than other methods. Our approach provides the researchers with reliable data support in the study of differential gene expression analysis and prognostic biomarker selection.
Affiliation(s)
- Xu Zhang
  - School of Mathematics and Statistics, Southwest University, China
- Jing Chen
  - School of Science, Southwest University of Science and Technology, China
13
Lu C, Glisovic-Aplenc T, Bernt KM, Nestler K, Cesare J, Cao L, Lee H, Fazelinia H, Chinwalla A, Xu Y, Shestova O, Xing Y, Gill S, Li M, Garcia B, Aplenc R. Longitudinal Large-Scale Semiquantitative Proteomic Data Stability Across Multiple Instrument Platforms. J Proteome Res 2021; 20:5203-5211. [PMID: 34669412 DOI: 10.1021/acs.jproteome.1c00624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
With the rapid developments in mass spectrometry (MS)-based proteomics methods, label-free semiquantitative proteomics has become an increasingly popular tool for profiling global protein abundances in an unbiased manner. However, the reproducibility of these data across time and LC-MS platforms is not well characterized. Here, we evaluate the performance of three LC-MS platforms (Orbitrap Elite, Q Exactive HF, and Orbitrap Fusion) in label-free semiquantitative analysis of cell surface proteins over a six-year period. Sucrose gradient ultracentrifugation was used for surfaceome enrichment, following gel separation for in-depth protein identification. With our established workflow, we consistently detected and reproducibly quantified >2300 putative cell surface proteins in a human acute myeloid leukemia (AML) cell line on all three platforms. To our knowledge this is the first study reporting highly reproducible semiquantitative proteomic data collection of biological replicates across multiple years and LC-MS platforms. These data provide experimental justification for semiquantitative proteomic study designs that are executed over multiyear time intervals and on different platforms. Multiyear and multiplatform experimental designs will likely enable larger scale proteomic studies and facilitate longitudinal proteomic studies by investigators lacking access to high throughput MS facilities. Data are available via ProteomeXchange with identifier PXD022721.
Affiliation(s)
- Congcong Lu
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Tina Glisovic-Aplenc
  - Division of Oncology, Center for Childhood Cancer Research, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
  - Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Kathrin M Bernt
  - Division of Oncology, Center for Childhood Cancer Research, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
  - Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Kevin Nestler
  - Division of Oncology, Center for Childhood Cancer Research, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
  - Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Joseph Cesare
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Lusha Cao
  - Division of Oncology, Center for Childhood Cancer Research, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
- Hyoungjoo Lee
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Hossein Fazelinia
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
  - Proteomics Core Facility, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
- Asif Chinwalla
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
- Yang Xu
  - Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
  - Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Olga Shestova
  - Center for Cellular Immunotherapies, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Yi Xing
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
  - Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Saar Gill
  - Center for Cellular Immunotherapies, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Mingyao Li
  - Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Benjamin Garcia
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
- Richard Aplenc
  - Division of Oncology, Center for Childhood Cancer Research, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, United States
  - Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, United States
14
Jungwirth E, Panzitt K, Marschall HU, Thallinger GG, Wagner M. Meta-analysis and Consolidation of Farnesoid X Receptor Chromatin Immunoprecipitation Sequencing Data Across Different Species and Conditions. Hepatol Commun 2021; 5:1721-1736. [PMID: 34558825 PMCID: PMC8485886 DOI: 10.1002/hep4.1749] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/26/2021] [Accepted: 04/25/2021] [Indexed: 12/24/2022] Open
Abstract
Farnesoid X receptor (FXR) is a nuclear receptor that controls gene regulation of different metabolic pathways and represents an upcoming drug target for various liver diseases. Several data sets on genome-wide FXR binding in different species and conditions exist. We have previously reported that these data sets are heterogeneous and do not cover the full spectrum of potential FXR binding sites. Here, we report the first meta-analysis of all publicly available FXR chromatin immunoprecipitation sequencing (ChIP-seq) data sets from mouse, rat, and human across different conditions using a newly generated analysis pipeline. All publicly available single data sets were biocurated in a standardized manner and compared on every relevant level from raw reads to affected functional pathways. Individual murine data sets were then virtually merged into a single unique "FXR binding atlas" spanning all potential binding sites across various conditions. Comparison of the single biocurated data sets showed that the overlap of FXR binding sites between different species is modest and ranges from 48% (mouse-human) to 55% (mouse-rat). Moreover, in vivo data among different species are more similar than human in vivo data compared to human in vitro data. The consolidated murine global FXR binding atlas virtually increases sequencing depth and allows recovering more and novel potential binding sites and signaling pathways that were missed in the individual data sets. The FXR binding atlas is publicly searchable (https://fxratlas.tugraz.at). Conclusion: Published single FXR ChIP-seq data sets and large-scale integrated omics data sets do not cover the full spectrum of FXR binding. Combining different individual data sets and creating an "FXR super-binding atlas" enhances understanding of FXR signaling capacities across different conditions. This is important when considering the potential wide spectrum for drugs targeting FXR in liver diseases.
Affiliation(s)
- Emilian Jungwirth
  - Research Unit for Translational Nuclear Receptor Research, Division of Gastroenterology and Hepatology, Medical University Graz, Graz, Austria
  - Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria
  - OMICS Center Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
- Katrin Panzitt
  - Research Unit for Translational Nuclear Receptor Research, Division of Gastroenterology and Hepatology, Medical University Graz, Graz, Austria
- Hanns-Ulrich Marschall
  - Department of Molecular and Clinical Medicine/Wallenberg Laboratory, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Gerhard G Thallinger
  - Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria
  - OMICS Center Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
- Martin Wagner
  - Research Unit for Translational Nuclear Receptor Research, Division of Gastroenterology and Hepatology, Medical University Graz, Graz, Austria
  - OMICS Center Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
15
Torbati ME, Tudorascu DL, Minhas DS, Maillard P, DeCarli CS, Hwang SJ. Multi-scanner Harmonization of Paired Neuroimaging Data via Structure Preserving Embedding Learning. ... IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS. IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION 2021; 2021:3277-3286. [PMID: 34909551 PMCID: PMC8668020 DOI: 10.1109/iccvw54120.2021.00367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/13/2023]
Abstract
Combining datasets from multiple sites/scanners has become increasingly prevalent in modern neuroimaging studies. Despite numerous benefits from the growth in sample size, substantial technical variability associated with site/scanner-related effects exists, which may inadvertently bias downstream analyses. Such a challenge calls for a data harmonization procedure which reduces the scanner effects and allows the scans to be combined for pooled analyses. In this work, we present MISPEL (Multi-scanner Image harmonization via Structure Preserving Embedding Learning), a multi-scanner harmonization framework. Unlike existing techniques, MISPEL does not assume a perfect coregistration across the scans, and the framework is naturally extendable to more than two scanners. Importantly, we incorporate our multi-scanner dataset where each subject is scanned on four different scanners. This unique paired dataset allows us to define and aim for an ideal harmonization (e.g., each subject with identical brain tissue volumes on all scanners). We extensively examine scanner effects under varying metrics and demonstrate how MISPEL significantly improves them.
16
Čuklina J, Lee CH, Williams EG, Sajic T, Collins BC, Rodríguez Martínez M, Sharma VS, Wendt F, Goetze S, Keele GR, Wollscheid B, Aebersold R, Pedrioli PGA. Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial. Mol Syst Biol 2021; 17:e10240. [PMID: 34432947 PMCID: PMC8447595 DOI: 10.15252/msb.202110240] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 01/22/2021] [Revised: 07/16/2021] [Accepted: 07/26/2021] [Indexed: 12/11/2022] Open
Abstract
Advancements in mass spectrometry-based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much-needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.
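As a rough illustration of the batch-correction step this abstract refers to, the sketch below median-centers one protein's log-intensities within each batch. It is not the proBatch API and is not claimed to be the tutorial's method: the function name `median_center_by_batch` and the sample values are hypothetical, and the approach assumes the biological groups are balanced across batches (otherwise centering can remove real signal).

```python
from statistics import median


def median_center_by_batch(values, batches):
    """Shift each batch so its median matches the global median.

    values  -- log-intensities of one feature, one entry per sample
    batches -- batch label for each sample, same length as values
    """
    global_median = median(values)
    # Group the values by batch label.
    by_batch = {}
    for value, label in zip(values, batches):
        by_batch.setdefault(label, []).append(value)
    batch_median = {label: median(vals) for label, vals in by_batch.items()}
    # Remove the batch-specific offset, then restore the overall level.
    return [v - batch_median[b] + global_median for v, b in zip(values, batches)]


# Hypothetical log2 intensities: batch B sits about 2 units above batch A.
log_intensity = [10.0, 10.2, 9.8, 12.0, 12.2, 11.8]
batch = ["A", "A", "A", "B", "B", "B"]
corrected = median_center_by_batch(log_intensity, batch)

# After correction both batches share the same median (approximately 11.0).
median_a = median(corrected[:3])
median_b = median(corrected[3:])
print(median_a, median_b)
```

Within-batch differences between samples are preserved; only the batch-level offset is removed.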
Affiliation(s)
- Jelena Čuklina
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland
  - IBM Research Europe, Rüschlikon, Switzerland
- Chloe H Lee
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Evan G Williams
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg, Luxembourg
- Tatjana Sajic
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Ben C Collins
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - Queen’s University Belfast, Belfast, UK
- Varun S Sharma
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Fabian Wendt
  - Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
- Sandra Goetze
  - Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
  - ETH Zürich, PHRT-CPAC, Zürich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Bernd Wollscheid
  - Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
  - ETH Zürich, PHRT-CPAC, Zürich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - Faculty of Science, University of Zurich, Zurich, Switzerland
- Patrick G A Pedrioli
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - Department of Health Sciences and Technology, Institute of Translational Medicine, ETH Zurich, Zurich, Switzerland
  - ETH Zürich, PHRT-CPAC, Zürich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
17
Shayesteh S, Nazari M, Salahshour A, Sandoughdaran S, Hajianfar G, Khateri M, Yaghobi Joybari A, Jozian F, Fatehi Feyzabad SH, Arabi H, Shiri I, Zaidi H. Treatment response prediction using MRI-based pre-, post-, and delta-radiomic features and machine learning algorithms in colorectal cancer. Med Phys 2021; 48:3691-3701. [PMID: 33894058 DOI: 10.1002/mp.14896] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 08/25/2020] [Revised: 03/07/2021] [Accepted: 04/06/2021] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVES: We evaluate the feasibility of treatment response prediction using MRI-based pre-, post-, and delta-radiomic features for locally advanced rectal cancer (LARC) patients treated by neoadjuvant chemoradiation therapy (nCRT). MATERIALS AND METHODS: This retrospective study included 53 LARC patients divided into a training set (Center#1, n = 36) and external validation set (Center#2, n = 17). T2-weighted (T2W) MRI was acquired for all patients, 2 weeks before and 4 weeks after nCRT. Ninety-six radiomic features, including intensity, morphological, and second- and high-order texture features, were extracted from segmented 3D volumes from T2W MRI. All features were harmonized using the ComBat algorithm. The Max-Relevance-Min-Redundancy (MRMR) algorithm was used as feature selector, and k-nearest neighbors (KNN), Naïve Bayes (NB), Random forests (RF), and eXtreme Gradient Boosting (XGB) algorithms were used as classifiers. The evaluation was performed using the area under the receiver operator characteristic (ROC) curve (AUC), sensitivity, specificity, and accuracy. RESULTS: In univariate analysis, the highest AUCs for pre-, post-, and delta-radiomic features were 0.78, 0.70, and 0.71, for the GLCM_IMC1, shape (surface area and volume), and GLSZM_GLNU features, respectively. In multivariate analysis, RF and KNN achieved the highest AUC (0.85 ± 0.04 and 0.81 ± 0.14, respectively) among pre- and post-treatment features. The highest AUC was achieved for the delta-radiomic-based RF model (0.96 ± 0.01) followed by NB (0.96 ± 0.04). Overall, the delta-radiomics model outperformed both pre- and post-treatment features (P-value <0.05). CONCLUSION: Multivariate analysis of delta-radiomic T2W MRI features using machine learning algorithms could potentially be used for response prediction in LARC patients undergoing nCRT. We also observed that multivariate analysis of delta-radiomic features using RF classifiers can be used as powerful biomarkers for response prediction in LARC.
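The AUC reported in this abstract can be read as the probability that a randomly chosen positive case (e.g., a responder) receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name `roc_auc`, the labels, and the scores are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels -- 1 for positive cases, 0 for negative cases
    scores -- classifier scores, higher meaning "more likely positive"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier output for six patients.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
auc = roc_auc(labels, scores)  # 7 of the 9 positive/negative pairs are ordered correctly
print(round(auc, 3))
```

The quadratic loop is fine for small cohorts like the one described; rank-based formulas give the same value more efficiently on large data.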
Affiliation(s)
- Sajad Shayesteh
  - Department of Physiology, Pharmacology and Medical Physics, Alborz University of Medical Sciences, Karaj, Iran
- Mostafa Nazari
  - Department of Biomedical Engineering and Medical Physics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Salahshour
  - Department of Radiology, Alborz University of Medical Sciences, Karaj, Iran
- Saleh Sandoughdaran
  - Department of Radiation Oncology, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ghasem Hajianfar
  - Rajaie Cardiovascular, Medical & Research Centre, Iran University of Medical Science, Tehran, Iran
- Maziar Khateri
  - Department of Medical Radiation Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Ali Yaghobi Joybari
  - Department of Radiation Oncology, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Fariba Jozian
  - Department of Radiation Oncology, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hossein Arabi
  - Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
- Isaac Shiri
  - Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
- Habib Zaidi
  - Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland
  - Geneva University Neurocenter, Geneva University, Geneva, Switzerland
  - Department of Nuclear Medicine and Molecular Imaging, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
  - Department of Nuclear Medicine, University of Southern Denmark, Odense, Denmark
18
Brombacher E, Schad A, Kreutz C. Tail-Robust Quantile Normalization. Proteomics 2020; 20:e2000068. [PMID: 32865322 DOI: 10.1002/pmic.202000068] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/24/2020] [Revised: 08/25/2020] [Indexed: 11/07/2022]
Abstract
High-throughput biological data, such as mass spectrometry (MS)-based proteomics data, suffer from systematic non-biological variance due to systematic errors. This hinders the estimation of "real" biological signals and, in turn, decreases the power of statistical tests and biases the identification of differentially expressed proteins. To remove such unintended variation, while retaining the biological signal of interest, analysis workflows for quantitative MS data typically comprise normalization prior to their statistical analysis. Several normalization methods, such as quantile normalization (QN), were originally developed for microarray data. In contrast to microarray data, proteomics data may contain features, in the form of protein intensities, that are consistently high across experimental conditions and, hence, are encountered in the tails of the protein intensity distribution. If QN is applied in the presence of such proteins, statistical inferences of the features' intensity profiles are impeded due to the biased estimation of their variance. A freely available, novel approach is introduced which serves as an improvement of the classical QN by preserving the biological signals of features in the tails of the intensity distribution and by accounting for sample-dependent missing values (MVs): the "tail-robust quantile normalization" (TRQN).
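To make the tail problem described in this abstract concrete, the sketch below implements classical quantile normalization only; it is not TRQN, and it ignores missing values and ties. The function name `quantile_normalize` and the intensities are hypothetical. Feature 0 is the most intense feature in every sample, so classical QN assigns it the same reference value everywhere and its between-sample variation disappears.

```python
def quantile_normalize(samples):
    """Classical quantile normalization.

    samples -- list of samples, each a list of intensities for the same
               features in the same order (no missing values assumed)
    """
    n_features = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_features)
    ]
    normalized = []
    for sample in samples:
        # Feature indices ordered from lowest to highest intensity.
        order = sorted(range(n_features), key=lambda i: sample[i])
        out = [0.0] * n_features
        for rank, feature_index in enumerate(order):
            out[feature_index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities: three samples, four features each.
samples = [
    [9.0, 2.0, 3.0, 4.0],
    [8.0, 1.0, 4.5, 2.0],
    [10.0, 4.0, 6.0, 3.0],
]
normalized = quantile_normalize(samples)

# Feature 0 varied between 8.0 and 10.0 before normalization;
# afterwards it is 9.0 in every sample.
print([row[0] for row in normalized])
```

This collapse of variance for consistently top-ranked features is the kind of distortion that motivates a tail-aware variant.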
Affiliation(s)
- Eva Brombacher
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, 79104, Freiburg, Germany
  - Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, 79104, Freiburg, Germany
  - Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, 79104, Freiburg, Germany
  - German Cancer Consortium (DKTK), 79106, Freiburg, Germany
  - German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- Ariane Schad
  - Center for Biosystems Analysis (ZBSA), University of Freiburg, 79104, Freiburg, Germany
- Clemens Kreutz
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, 79104, Freiburg, Germany
  - Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, 79104, Freiburg, Germany
19
Gouveia D, Grenga L, Pible O, Armengaud J. Quick microbial molecular phenotyping by differential shotgun proteomics. Environ Microbiol 2020; 22:2996-3004. [PMID: 32133743 PMCID: PMC7496289 DOI: 10.1111/1462-2920.14975] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/01/2020] [Revised: 02/29/2020] [Accepted: 03/02/2020] [Indexed: 12/12/2022]
Abstract
Differential shotgun proteomics identifies proteins that discriminate between sets of samples based on differences in abundance. This methodology can be easily applied to study (i) specific microorganisms subjected to a variety of growth or stress conditions or (ii) different microorganisms sampled in the same condition. In microbiology, this comparison is particularly successful because differing microorganism phenotypes are explained by clearly altered abundances of key protein players. The extensive description and quantification of proteins from any given microorganism can be routinely obtained for several conditions within a few days by tandem mass spectrometry. Such protein-centred microbial molecular phenotyping is rich in information. However, well-designed experimental strategies, carefully parameterized analytical pipelines, and sound statistical approaches must be applied if the shotgun proteomic data are to be correctly interpreted. This minireview describes these key items for a quick molecular phenotyping based on label-free quantification shotgun proteomics.
Affiliation(s)
- Duarte Gouveia
  - Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRAE, F-30207 Bagnols-sur-Cèze, France
- Lucia Grenga
  - Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRAE, F-30207 Bagnols-sur-Cèze, France
- Olivier Pible
  - Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRAE, F-30207 Bagnols-sur-Cèze, France
- Jean Armengaud
  - Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRAE, F-30207 Bagnols-sur-Cèze, France
20
Yamada R, Okada D, Wang J, Basak T, Koyama S. Interpretation of omics data analyses. J Hum Genet 2020; 66:93-102. [PMID: 32385339 PMCID: PMC7728595 DOI: 10.1038/s10038-020-0763-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/09/2020] [Revised: 03/25/2020] [Accepted: 03/28/2020] [Indexed: 11/22/2022]
Abstract
Omics studies attempt to extract meaningful messages from large-scale and high-dimensional data sets by treating the data sets as a whole. The concept of treating data sets as a whole is important in every step of the data-handling procedures: the pre-processing step of data records, the step of statistical analyses and machine learning, translation of the outputs into human natural perceptions, and acceptance of the messages with uncertainty. For pre-processing, methods for controlling data quality and batch effects are discussed. For the main analyses, the approaches are divided into two types and their basic concepts are discussed. The first type is the evaluation of many items individually, followed by interpretation of individual items in the context of multiple testing and combination. The second type is the extraction of fewer important aspects from the whole data records. The outputs of the main analyses are translated into natural languages with techniques such as annotation and ontology. The other technique for making the outputs perceptible is visualization. At the end of this review, one of the most important issues in the interpretation of omics data analyses is discussed. Omics studies have a large amount of information in their data sets, and every approach reveals only a very restricted aspect of the whole data sets. The understandable messages from these studies have unavoidable uncertainty.
Affiliation(s)
- Ryo Yamada
  - Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan
- Daigo Okada
  - Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan
- Juan Wang
  - Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan
- Tapati Basak
  - Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan
- Satoshi Koyama
  - Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Nanbusogo-Kenkyu-To-1, 5F, 53 Syogoin-Kawaramachi, Sakyo-ku, Kyoto, 606-8507, Japan
21
Hunter P. The "industrial" revolution in biomedical research: Data explosion and reproducibility crisis drive changes in lab workflows. EMBO Rep 2020; 21:e50003. [PMID: 31984601 DOI: 10.15252/embr.202050003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022] Open
Abstract
The ever increasing amount of data, new technologies and the need for reproducible results drive profound changes in the workflows of diagnostic and basic research laboratories.