1
Kwak SY, Park JH, Won HY, Jang H, Lee SB, Jang WI, Park S, Kim MJ, Shim S. CXCL10 upregulation in radiation-exposed human peripheral blood mononuclear cells as a candidate biomarker for rapid triage after radiation exposure. Int J Radiat Biol 2024; 100:541-549. [PMID: 38227479 DOI: 10.1080/09553002.2023.2295300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 11/13/2023] [Indexed: 01/17/2024]
Abstract
PURPOSE In case of a nuclear accident, individuals with high-dose radiation exposure (>1-2 Gy) should be rapidly identified. While ferredoxin reductase (FDXR) was recently suggested as a radiation-responsive gene, the use of a single gene biomarker limits radiation dose assessment. To overcome this limitation, we sought to identify reliable radiation-responsive gene biomarkers. MATERIALS AND METHODS Peripheral blood mononuclear cells (PBMCs) were isolated from mice after total body irradiation, and gene expression was analyzed using a microarray approach to identify radiation-responsive genes. RESULTS In light of the essential role of the immune response following radiation exposure, we selected several immune-related candidate genes upregulated by radiation exposure in both mouse and human PBMCs. In particular, the expression of ACOD1 and CXCL10 increased in a radiation dose-dependent manner, while remaining unchanged following lipopolysaccharide (LPS) stimulation in human PBMCs. The expression of both genes was further evaluated in the blood of cancer patients before and after radiotherapy. CXCL10 expression exhibited a distinct increase after radiotherapy and was positively correlated with FDXR expression. CONCLUSIONS CXCL10 expression in irradiated PBMCs represents a potential biomarker for radiation exposure.
Affiliation(s)
- Seo Young Kwak
  - Korea Institute of Radiological & Medical Science, Laboratory of Radiation Exposure & Therapeutics, National Radiation Emergency Medical Center, Seoul, South Korea
- Ji-Hye Park
  - Korea Institute of Radiological & Medical Science, Laboratory of Radiation Exposure & Therapeutics, National Radiation Emergency Medical Center, Seoul, South Korea
  - OPTOLANE Technologies Inc., Seongnam, South Korea
- Hyosun Jang
  - Korea Institute of Radiological & Medical Science, Laboratory of Radiation Exposure & Therapeutics, National Radiation Emergency Medical Center, Seoul, South Korea
- Seung Bum Lee
  - Korea Institute of Radiological & Medical Science, Laboratory of Radiation Exposure & Therapeutics, National Radiation Emergency Medical Center, Seoul, South Korea
- Won Il Jang
  - Korea Institute of Radiological & Medical Science, Laboratory of Radiation Exposure & Therapeutics, National Radiation Emergency Medical Center, Seoul, South Korea
  - Department of Radiation Oncology, Korea Cancer Center Hospital, Korea Institute of Radiological and Medical Sciences, Seoul, South Korea
- Sunhoo Park
  - Korea Institute of Radiological & Medical Science, Laboratory of Radiation Exposure & Therapeutics, National Radiation Emergency Medical Center, Seoul, South Korea
  - Department of Pathology, Korea Cancer Center Hospital, Korea Institute of Radiological & Medical Science, Seoul, South Korea
- Min-Jung Kim
  - Korea Institute of Radiological & Medical Science, Laboratory of Radiation Exposure & Therapeutics, National Radiation Emergency Medical Center, Seoul, South Korea
- Sehwan Shim
  - Korea Institute of Radiological & Medical Science, Laboratory of Radiation Exposure & Therapeutics, National Radiation Emergency Medical Center, Seoul, South Korea
2
Tian L, Yu T. An integrated deep learning framework for the interpretation of untargeted metabolomics data. Brief Bioinform 2023; 24:bbad244. [PMID: 37369636 DOI: 10.1093/bib/bbad244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 06/02/2023] [Accepted: 06/12/2023] [Indexed: 06/29/2023] Open
Abstract
Untargeted metabolomics is gaining widespread applications. The key aspects of the data analysis include modeling complex activities of the metabolic network, selecting metabolites associated with clinical outcome and finding critical metabolic pathways to reveal biological mechanisms. One key roadblock in data analysis is not well addressed: the matching uncertainty between data features and known metabolites. Given the limitations of the experimental technology, the identities of data features cannot be directly revealed in the data. The predominant approach for mapping features to metabolites is to match the mass-to-charge ratio (m/z) of data features to those derived from theoretical values of known metabolites. The relationship between features and metabolites is not one-to-one since some metabolites share molecular composition, and various adduct ions can be derived from the same metabolite. This matching uncertainty causes unreliable metabolite selection and functional analysis results. Here we introduce an integrated deep learning framework for metabolomics data that takes matching uncertainty into consideration. The model is devised with a gradual sparsification neural network based on the known metabolic network and the annotation relationship between features and metabolites. This architecture characterizes metabolomics data and reflects the modular structure of biological systems. Three goals can be achieved simultaneously without requiring much complex inference and additional assumptions: (1) evaluate metabolite importance, (2) infer feature-metabolite matching likelihood and (3) select disease sub-networks. When applied to a COVID metabolomics dataset and an aging mouse brain dataset, our method found metabolic sub-networks that were easily interpretable.
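The m/z matching step described in this abstract, and the ambiguity it creates, can be illustrated with a minimal sketch. This is not the authors' code: the four-metabolite library, the three positive-mode adducts, the 10 ppm tolerance and the function name `match_feature` are all assumptions chosen for illustration. The masses are standard monoisotopic values.

```python
# Monoisotopic neutral masses (Da) for a tiny illustrative library.
# Glucose/fructose and leucine/isoleucine are isomer pairs, so they
# share a molecular composition and therefore a mass.
METABOLITES = {
    "glucose": 180.06339,
    "fructose": 180.06339,
    "leucine": 131.09463,
    "isoleucine": 131.09463,
}

# Mass shifts (Da) for common positive-mode adduct ions (assumed set).
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def match_feature(mz, tol_ppm=10.0):
    """Return every (metabolite, adduct) pair whose theoretical m/z
    lies within tol_ppm of the observed feature m/z."""
    hits = []
    for name, mass in METABOLITES.items():
        for adduct, shift in ADDUCT_SHIFTS.items():
            theoretical = mass + shift
            if abs(mz - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((name, adduct))
    return hits


# One observed feature maps to two candidate metabolites.
print(match_feature(203.0526))
```

A single feature here returns two equally good candidates, which is the one-to-many relationship the abstract calls matching uncertainty.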
Affiliation(s)
- Leqi Tian
  - School of Data Science, The Chinese University of Hong Kong - Shenzhen, Guangdong, China
  - Shenzhen Research Institute of Big Data, Guangdong, China
- Tianwei Yu
  - School of Data Science, The Chinese University of Hong Kong - Shenzhen, Guangdong, China
  - Shenzhen Research Institute of Big Data, Guangdong, China
  - Guangdong Provincial Key Laboratory of Big Data Computing, Guangdong, China
3
Mallik S, Sarkar A, Nath S, Maulik U, Das S, Pati SK, Ghosh S, Zhao Z. 3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection. Front Genet 2023; 14:1095330. [PMID: 36865387 PMCID: PMC9971618 DOI: 10.3389/fgene.2023.1095330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 01/30/2023] [Indexed: 02/16/2023] Open
Abstract
Handling biomedical big data is a challenging task. In particular, integrating multi-modal data and then mining significant features (gene signature detection) is daunting. With this in mind, we propose a novel framework, three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which employs empirical Bayes statistics, was first applied to each individual molecular profile to extract the statistically significant features; the three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy scores and the area under the curve (AUC). Gene modules were identified by average linkage clustering followed by dynamic tree cut. The module with the highest correlation was considered the potential gene signature. We used an acute myeloid leukemia dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC score (0.827). We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed the state-of-the-art methods in terms of AUC. Furthermore, we included comparative studies with other related methods to support the acceptability of our method. Finally, we note that our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery.
Affiliation(s)
- Saurav Mallik
  - Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, United States
  - *Correspondence: Saurav Mallik; Zhongming Zhao
- Anasua Sarkar
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
- Sagnik Nath
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
- Ujjwal Maulik
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
- Supantha Das
  - Department of Information Technology, Academy of Technology, Hooghly, West Bengal, India
- Soumen Kumar Pati
  - Department of Bioinformatics, Maulana Abul Kalam Azad University, Kolkata, West Bengal, India
- Soumadip Ghosh
  - Department of Computer Science & Engineering, Sister Nivedita University, New Town, West Bengal, India
- Zhongming Zhao
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
4
Cao H, Hong X, Tost H, Meyer-Lindenberg A, Schwarz E. Advancing translational research in neuroscience through multi-task learning. Front Psychiatry 2022; 13:993289. [PMID: 36465289 PMCID: PMC9714033 DOI: 10.3389/fpsyt.2022.993289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 10/24/2022] [Indexed: 11/18/2022] Open
Abstract
Translational research in neuroscience is increasingly focusing on the analysis of multi-modal data, in order to account for the biological complexity of suspected disease mechanisms. Recent advances in machine learning have the potential to substantially advance such translational research through the simultaneous analysis of different data modalities. This review focuses on one such approach, so-called "multi-task learning" (MTL), and describes its potential utility for multi-modal data analyses in neuroscience. We summarize the methodological development of MTL starting from conventional machine learning, and present several scenarios that appear particularly suitable for its application. For these scenarios, we highlight different types of MTL algorithms, discuss emerging technological adaptations, and provide a step-by-step guide for readers to apply the MTL approach in their own studies. With its ability to simultaneously analyze multiple data modalities, MTL may become an important element of the analytics repertoire used in future neuroscience research and beyond.
Affiliation(s)
- Han Cao
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Xudong Hong
  - Department of Computer Vision and Machine Learning, Max Planck Institute for Informatics, Saarbrücken, Germany
  - Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
- Heike Tost
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Andreas Meyer-Lindenberg
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Emanuel Schwarz
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
5
Murphy RG, Gilmore A, Senevirathne S, O'Reilly PG, LaBonte Wilson M, Jain S, McArt DG. Particle Swarm Optimization Artificial Intelligence technique for gene signature discovery in transcriptomic cohorts. Comput Struct Biotechnol J 2022; 20:5547-5563. [PMID: 36249564 PMCID: PMC9556859 DOI: 10.1016/j.csbj.2022.09.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 09/22/2022] [Accepted: 09/22/2022] [Indexed: 11/12/2022] Open
Abstract
- EBPSO identifies unique, accurate, and succinct gene signatures.
- Key genes within the signatures provide biological insights into their associated functions.
- A web-based micro-framework was developed for ease of use and real-time visualizations.
- EBPSO is a promising alternative to traditional single gene signature generation.
- Downstream analysis will help move these signatures towards clinical translation.
The development of gene signatures is key for delivering personalized medicine, despite only a few signatures being available for use in the clinic for cancer patients. Gene signature discovery tends to revolve around identifying a single signature. However, it has been shown that various highly predictive signatures can be produced from the same dataset. This study assumes that the presentation of top ranked signatures will allow greater efforts in the selection of gene signatures for validation on external datasets and for their clinical translation. Particle swarm optimization (PSO) is an evolutionary algorithm often used as a search strategy and largely represented as binary PSO (BPSO) in this domain. BPSO, however, fails to produce succinct feature sets for complex optimization problems, thus affecting its overall runtime and optimization performance. Enhanced BPSO (EBPSO) was developed to overcome these shortcomings. Thus, this study will validate unique candidate gene signatures for different underlying biology from EBPSO on transcriptomics cohorts. EBPSO was consistently seen to be as accurate as BPSO with substantially smaller feature signatures and significantly faster runtimes. 100% accuracy was achieved in all but two of the selected data sets. Using clinical transcriptomics cohorts, EBPSO has demonstrated the ability to identify accurate, succinct, and significantly prognostic signatures that are unique from one another. This has been proposed as a promising alternative to overcome the issues regarding traditional single gene signature generation. Interpretation of key genes within the signatures provided biological insights into the associated functions that were well correlated to their cancer type.
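For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of standard binary PSO for feature selection, using a sigmoid transfer function to turn velocities into bit probabilities. It does not implement EBPSO, whose enhancements the abstract does not describe. The toy fitness function, the "informative" feature indices and all parameter values are assumptions for illustration only.

```python
import math
import random

# Hypothetical ground truth for the toy problem: only these features help.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    """Reward selected informative features, penalize signature size."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


def binary_pso(fitness, n_features, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard binary PSO: returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep sigmoid stable
                vel[i][d] = v
                # Sigmoid transfer: velocity becomes P(bit == 1).
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


mask, fit = binary_pso(toy_fitness, n_features=10, seed=1)
print(mask, fit)
```

The size penalty in `toy_fitness` is one simple way to push the swarm towards succinct feature sets; a different penalty weight would change the trade-off between accuracy and signature length.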
6
Jin Z, Kang J, Yu T. Feature selection and classification over the network with missing node observations. Stat Med 2022; 41:1242-1262. [PMID: 34816464 PMCID: PMC9773124 DOI: 10.1002/sim.9267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Revised: 09/14/2021] [Accepted: 10/29/2021] [Indexed: 12/25/2022]
Abstract
Jointly analyzing transcriptomic data and the existing biological networks can yield more robust and informative feature selection results, as well as better understanding of the biological mechanisms. Selecting and classifying node features over genome-scale networks has become increasingly important in genomic biology and genomic medicine. Existing methods have some critical drawbacks. The first is they do not allow flexible modeling of different subtypes of selected nodes. The second is they ignore nodes with missing values, very likely to increase bias in estimation. To address these limitations, we propose a general modeling framework for Bayesian node classification (BNC) with missing values. A new prior model is developed for the class indicators incorporating the network structure. For posterior computation, we resort to the Swendsen-Wang algorithm for efficiently updating class indicators. BNC can naturally handle missing values in the Bayesian modeling framework, which improves the node classification accuracy and reduces the bias in estimating gene effects. We demonstrate the advantages of our methods via extensive simulation studies and the analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas.
Affiliation(s)
- Jian Kang
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Tianwei Yu
  - School of Data Science and Warshel Institute, The Chinese University of Hong Kong - Shenzhen, and Shenzhen Research Institute of Big Data, Shenzhen, China
7
Manjang K, Tripathi S, Yli-Harja O, Dehmer M, Emmert-Streib F. Graph-based exploitation of gene ontology using GOxploreR for scrutinizing biological significance. Sci Rep 2020; 10:16672. [PMID: 33028846 PMCID: PMC7542435 DOI: 10.1038/s41598-020-73326-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/20/2020] [Accepted: 08/17/2020] [Indexed: 12/12/2022] Open
Abstract
Gene ontology (GO) is an eminent knowledge base frequently used for providing biological interpretations for the analysis of genes or gene sets from biological, medical and clinical problems. Unfortunately, the interpretation of such results is challenging due to the large number of GO terms, their hierarchical and connected organization as directed acyclic graphs (DAGs) and the lack of tools that exploit this structural information explicitly. For this reason, we developed the R package GOxploreR. The main features of GOxploreR are (I) easy and direct access to structural features of GO, (II) structure-based ranking of GO-terms, (III) mapping to reduced GO-DAGs including visualization capabilities and (IV) prioritizing of GO-terms. The underlying idea of GOxploreR is to exploit a graph-theoretical perspective of GO as manifested by its DAG-structure and the hierarchy levels it contains for accumulating semantic information. Together, these features enhance the utilization of structural information of GO and complement existing analysis tools. Overall, GOxploreR provides exploratory as well as confirmatory tools for complementing any kind of analysis resulting in a list of GO-terms, e.g., from differentially expressed genes or gene sets, GWAS or biomarkers. Our R package GOxploreR is freely available from CRAN.
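The idea of ranking terms by their position in a DAG can be sketched in a few lines. This is an illustration of the general technique, not GOxploreR's API (GOxploreR is an R package). The toy ontology, the term names and the choice to define a term's level as its longest path from the root are assumptions made here.

```python
from functools import lru_cache

# Hypothetical toy ontology: term -> list of parent terms.
# "D" has two parents at different depths, as real GO terms often do.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["B", "C"],
}


def term_levels(parents):
    """Assign each term a hierarchy level, assumed here to be the
    length of the longest path from the root to that term."""
    @lru_cache(maxsize=None)
    def level(term):
        ps = parents[term]
        return 0 if not ps else 1 + max(level(p) for p in ps)

    return {term: level(term) for term in parents}


def rank_terms(terms, levels):
    """Order terms from deepest (most specific) to shallowest,
    breaking ties alphabetically."""
    return sorted(terms, key=lambda t: (-levels[t], t))


levels = term_levels(PARENTS)
print(levels)
print(rank_terms(["A", "D", "C"], levels))
```

Using the longest path means a term with several parents is placed below its deepest parent, so deeper levels correspond to more specific terms.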
Affiliation(s)
- Kalifa Manjang
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
- Shailesh Tripathi
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
- Olli Yli-Harja
  - Computational Systems Biology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute for Systems Biology, Seattle, WA, USA
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
- Matthias Dehmer
  - Department of Biomedical Computer Science and Mechatronics, UMIT-The Health and Life Science University, 6060, Hall in Tyrol, Austria
  - College of Artificial Intelligence, Nankai University, Tianjin, 300350, China
- Frank Emmert-Streib
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
8
Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, Maathuis MH, Moreau Y, Murphy SA, Przytycka TM, Rebhan M, Röst H, Schuppert A, Schwab M, Spang R, Stekhoven D, Sun J, Weber A, Ziemek D, Zupan B. From hype to reality: data science enabling personalized medicine. BMC Med 2018; 16:150. [PMID: 30145981 PMCID: PMC6109989 DOI: 10.1186/s12916-018-1122-7] [Citation(s) in RCA: 196] [Impact Index Per Article: 32.7] [Received: 03/28/2018] [Accepted: 07/09/2018] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Personalized, precision, P4, or stratified medicine is understood as a medical approach in which patients are stratified based on their disease subtype, risk, prognosis, or treatment response using specialized diagnostic tests. The key idea is to base medical decisions on individual patient characteristics, including molecular and behavioral biomarkers, rather than on population averages. Personalized medicine is deeply connected to and dependent on data science, specifically machine learning (often named Artificial Intelligence in the mainstream media). While during recent years there has been a lot of enthusiasm about the potential of 'big data' and machine learning-based solutions, there exist only few examples that impact current clinical practice. The lack of impact on clinical practice can largely be attributed to insufficient performance of predictive models, difficulties to interpret complex model predictions, and lack of validation via prospective clinical trials that demonstrate a clear benefit compared to the standard of care. In this paper, we review the potential of state-of-the-art data science approaches for personalized medicine, discuss open challenges, and highlight directions that may help to overcome them in the future. CONCLUSIONS There is a need for an interdisciplinary effort, including data scientists, physicians, patient advocates, regulatory agencies, and health insurance organizations. Partially unrealistic expectations and concerns about data science-based solutions need to be better managed. In parallel, computational methods must advance more to provide direct benefit to clinical practice.
Affiliation(s)
- Holger Fröhlich
  - UCB Biosciences GmbH, Alfred-Nobel-Str. 10, 40789 Monheim, Germany
  - University of Bonn, Bonn-Aachen International Center for IT, Endenicher Allee 19c, 53115 Bonn, Germany
- Rudi Balling
  - University of Luxembourg, 6 avenue du Swing, 4367 Belvaux, Luxembourg
- Niko Beerenwinkel
  - Department of Biosciences and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland
- Oliver Kohlbacher
  - University of Tübingen, WSI/ZBIT, Sand 14, 72076 Tübingen, Germany
  - Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
  - Quantitative Biology Center, University of Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
  - Institute for Translational Bioinformatics, University Medical Center Tübingen, Sand 14, 72076 Tübingen, Germany
- Santosh Kumar
  - Department of Computer Science, University of Memphis, 2222 Dunn Hall, Memphis, TN 38152 USA
- Thomas Lengauer
  - Max-Planck-Institute for Informatics, 66123 Saarbrücken, Germany
- Marloes H. Maathuis
  - ETH Zurich, Seminar für Statistik, Rämistrasse 101, 8092 Zurich, Switzerland
- Yves Moreau
  - University of Leuven, ESAT, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
- Susan A. Murphy
  - Harvard University, Science Center 400 Suite, Oxford Street, Cambridge, MA 02138-2901 USA
- Teresa M. Przytycka
  - National Center of Biotechnology Information, National Institute of Health, 8600 Rockville Pike, Bethesda, MD 20894-6075 USA
- Michael Rebhan
  - Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
- Hannes Röst
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, ON M5S 3E1 Canada
- Andreas Schuppert
  - RWTH Aachen, Joint Research Center for Computational Biomedicine, Pauwelsstrasse 19, 52074 Aachen, Germany
- Matthias Schwab
  - Dr. Margarete Fischer-Bosch Institute of Clinical Pharmacology, Aucherbachstrasse 112, 70376 Stuttgart, Germany
  - University of Tübingen, Departments of Clinical Pharmacology and of Pharmacy and Biochemistry, Tübingen, Germany
- Rainer Spang
  - University of Regensburg, Institute of Functional Genomics, Am BioPark 9, 93053 Regensburg, Germany
- Daniel Stekhoven
  - ETH Zurich, NEXUS Personalized Health Technol., Otto-Stern-Weg 7, 8093 Zurich, Switzerland
- Jimeng Sun
  - Georgia Tech University, 801 Atlantic Drive, Atlanta, GA 30332-0280 USA
- Andreas Weber
  - Institute for Computer Science, University of Bonn, Endenicher Allee 19a, 53115 Bonn, Germany
- Daniel Ziemek
  - Pfizer, Worldwide Research and Development, Linkstraße 10, 10785 Berlin, Germany
- Blaz Zupan
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, SI-1000 Ljubljana, Slovenia
9
Patients with early-stage oropharyngeal cancer can be identified with label-free serum proteomics. Br J Cancer 2018; 119:200-212. [PMID: 29961760 PMCID: PMC6048110 DOI: 10.1038/s41416-018-0162-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/14/2017] [Revised: 05/14/2018] [Accepted: 06/04/2018] [Indexed: 01/03/2023] Open
Abstract
Background The increasing incidence of oropharyngeal squamous cell carcinoma (OPSCC) is mainly related to human papillomavirus (HPV) infection. As OPSCCs are often diagnosed at an advanced stage, mortality and morbidity remain high. There are no diagnostic biomarkers for early detection of OPSCC. Methods Serum from 25 patients with stage I–II OPSCC, and 12 healthy controls, was studied with quantitative label-free proteomics using ultra-definition MSE. Statistical analyses were performed to identify the proteins most reliably distinguishing early-stage OPSCCs from controls. P16 was used as a surrogate marker for HPV. P16-positive and P16-negative tumours were analysed separately. Results With two or more unique proteins per identification, 176 proteins were quantified. A clear separation between patients with early-stage tumours and controls was seen in principal component analysis. Latent structures discriminant analysis identified 96 proteins, most reliably differentiating OPSCC patients from controls, with 13 upregulated and 83 downregulated proteins in study cases. The set of proteins was studied further with network, pathway and protein–protein interaction analyses, and found to participate in lipid metabolism, for example. Conclusions We found a set of serum proteins distinguishing early-stage OPSCC from healthy individuals, and suggest a protein set for further evaluation as a diagnostic biomarker panel for OPSCC.
10
Wu M, Zhu L, Feng X. Network-based feature screening with applications to genome data. Ann Appl Stat 2018. [DOI: 10.1214/17-aoas1097] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
11
Zhang C, Liu J, Shi Q, Zeng T, Chen L. Comparative network stratification analysis for identifying functional interpretable network biomarkers. BMC Bioinformatics 2017; 18:48. [PMID: 28361683 PMCID: PMC5374559 DOI: 10.1186/s12859-017-1462-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND A major challenge of bioinformatics in the era of precision medicine is to identify the molecular biomarkers for complex diseases. It is a general expectation that these biomarkers or signatures have not only strong discrimination ability, but also readable interpretations in a biological sense. Generally, the conventional expression-based or network-based methods mainly capture differential genes or differential networks as biomarkers, however, such biomarkers only focus on phenotypic discrimination and usually have less biological or functional interpretation. Meanwhile, the conventional function-based methods could consider the biomarkers corresponding to certain biological functions or pathways, but ignore the differential information of genes, i.e., disregard the active degree of particular genes involved in particular functions, thereby resulting in less discriminative ability on phenotypes. Hence, it is strongly demanded to develop elaborate computational methods to directly identify functional network biomarkers with both discriminative power on disease states and readable interpretation on biological functions. RESULTS In this paper, we present a new computational framework based on an integer programming model, named as Comparative Network Stratification (CNS), to extract functional or interpretable network biomarkers, which are of strongly discriminative power on disease states and also readable interpretation on biological functions. In addition, CNS can not only recognize the pathogen biological functions disregarded by traditional Expression-based/Network-based methods, but also uncover the active network-structures underlying such dysregulated functions underestimated by traditional Function-based methods. To validate the effectiveness, we have compared CNS with five state-of-the-art methods, i.e. GSVA, Pathifier, stSVM, frSVM and AEP on four datasets of different complex diseases. 
The results show that CNS can enhance the discriminative power of network biomarkers, and further provide biologically interpretable information or disease pathogenic mechanism of these biomarkers. A case study on type 1 diabetes (T1D) demonstrates that CNS can identify many dysfunctional genes and networks previously disregarded by conventional approaches. CONCLUSION Therefore, CNS is actually a powerful bioinformatics tool, which can identify functional or interpretable network biomarkers with both discriminative power on disease states and readable interpretation on biological functions. CNS was implemented as a Matlab package, which is available at http://www.sysbio.ac.cn/cb/chenlab/images/CNSpackage_0.1.rar .
Affiliation(s)
- Chuanchao Zhang
  - State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, 430072, China
  - Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Juan Liu
  - State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, 430072, China
- Qianqian Shi
  - Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Tao Zeng
  - Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Luonan Chen
  - Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
|
12
|
Zhang L, Liu H, Huang Y, Wang X, Chen Y, Meng J. Cancer Progression Prediction Using Gene Interaction Regularized Elastic Net. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:145-154. [PMID: 28055897 PMCID: PMC5374042 DOI: 10.1109/tcbb.2015.2511758] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 05/12/2023]
Abstract
Different types of genomic aberration may simultaneously contribute to tumorigenesis. To obtain a more accurate prognostic assessment to guide therapeutic regimen choice for cancer patients, the heterogeneous multi-omics data should be integrated harmoniously, which can often be difficult. For this purpose, we propose a Gene Interaction Regularized Elastic Net (GIREN) model that predicts clinical outcome by integrating multiple data types. GIREN conveniently embraces both gene measurements and gene-gene interaction information under an elastic net formulation, enforcing structure sparsity, and the "grouping effect" in solution to select the discriminate features with prognostic value. An iterative gradient descent algorithm is also developed to solve the model with regularized optimization. GIREN was applied to human ovarian cancer and breast cancer datasets obtained from The Cancer Genome Atlas, respectively. Result shows that, the proposed GIREN algorithm obtained more accurate and robust performance over competing algorithms (LASSO, Elastic Net, and Semi-supervised PCA, with or without average pathway expression features) in predicting cancer progression on both two datasets in terms of median area under curve (AUC) and interquartile range (IQR), suggesting a promising direction for more effective integration of gene measurement and gene interaction information.
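As a rough illustration of the idea in this abstract, the sketch below fits an elastic-net-style model in which the usual ridge term is replaced by a graph Laplacian penalty built from a gene-gene interaction network, solved by an iterative (proximal) gradient descent. The abstract does not give the exact penalty, loss, or update rule, so the squared-error loss, the Laplacian form, the function names, and the toy data are all assumptions made for illustration and should not be read as the GIREN implementation.

```python
import numpy as np

def soft_threshold(v, k):
    """Proximal operator of the L1 penalty (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def network_elastic_net(X, y, L, lam1=0.1, lam2=0.1, n_iter=500):
    """Minimise (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*b'Lb.

    L is the graph Laplacian of an assumed gene-gene interaction network;
    the b'Lb term pulls coefficients of interacting genes toward each other.
    """
    n, p = X.shape
    H = X.T @ X / n + lam2 * L                 # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(H).max()   # 1 / Lipschitz constant
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = H @ b - X.T @ y / n             # gradient of the smooth part
        b = soft_threshold(b - step * grad, step * lam1)
    return b

# Toy data (hypothetical): 4 genes; genes 0 and 1 interact and both drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=50)
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 1.0
L = np.diag(A.sum(axis=1)) - A                 # unnormalised graph Laplacian
b = network_elastic_net(X, y, L)
```

The L1 term gives sparsity, while the Laplacian term plays the role of the "grouping effect" restricted to genes that are linked in the network. A classification outcome, as evaluated by AUC in the abstract, would need a logistic or similar loss in place of the squared error used here.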
Affiliation(s)
- Lin Zhang
- School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Hui Liu
- School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xuesong Wang
- School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Yidong Chen
- Department of Epidemiology and Biostatistics, University of Texas Health Science Center, San Antonio, TX 78229
- Jia Meng
- To whom correspondence should be addressed
|
13
|
Abstract
Developing improved approaches for diagnosis, treatment, and prevention of diseases is a major goal of biomedical research. Therefore, the discovery of biomarker signatures from high-throughput "omics" data is an active research topic in the field of bioinformatics and systems medicine. A major issue is the low reproducibility and the limited biological interpretability of candidate biomarker signatures identified from high-throughput data. This impedes the use of discovered biomarker signatures into clinical applications. Currently, much focus is placed on developing strategies to improve reproducibility and interpretability. Researchers have fruitfully started to incorporate prior knowledge derived from pathways and molecular networks into the process of biomarker identification. In this chapter, after giving a general introduction to the problem of disease classification and biomarker discovery, we will review two types of network-assisted approaches: (1) approaches inferring activity scores for specific pathways which are subsequently used for classification and (2) approaches identifying subnetworks or modules of molecular networks by differential network analysis which can serve as biomarker signatures.
|
14
|
Identifying dense subgraphs in protein–protein interaction network for gene selection from microarray data. ACTA ACUST UNITED AC 2015. [DOI: 10.1007/s13721-015-0104-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/02/2023]
|
15
|
Zhang X, Gao L, Liu ZP, Chen L. Identifying module biomarker in type 2 diabetes mellitus by discriminative area of functional activity. BMC Bioinformatics 2015; 16:92. [PMID: 25888350 PMCID: PMC4374500 DOI: 10.1186/s12859-015-0519-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 07/06/2014] [Accepted: 02/24/2015] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Identifying diagnosis and prognosis biomarkers from expression profiling data is of great significance for achieving personalized medicine and designing therapeutic strategy in complex diseases. However, the reproducibility of identified biomarkers across tissues and experiments is still a challenge for this issue. RESULTS We propose a strategy based on discriminative area of module activities to identify gene biomarkers which interconnect as a subnetwork or module by integrating gene expression data and protein-protein interactions. Then, we implement the procedure in T2DM as a case study and identify a module biomarker with 32 genes from mRNA expression data in skeletal muscle for T2DM. This module biomarker is enriched with known causal genes and related functions of T2DM. Further analysis shows that the module biomarker is of superior performance in classification, and has consistently high accuracies across tissues and experiments. CONCLUSION The proposed approach can efficiently identify robust and functionally meaningful module biomarkers in T2DM, and could be employed in biomarker discovery of other complex diseases characterized by expression profiles.
Affiliation(s)
- Xindong Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, 710000, China; Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, 710000, China; Institute of Industrial Science, University of Tokyo, Tokyo, 153-8505, Japan
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Shandong, 250061, China
- Luonan Chen
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China; Institute of Industrial Science, University of Tokyo, Tokyo, 153-8505, Japan; School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
|
16
|
Schramm SJ, Jayaswal V, Goel A, Li SS, Yang YH, Mann GJ, Wilkins MR. Molecular interaction networks for the analysis of human disease: utility, limitations, and considerations. Proteomics 2014; 13:3393-405. [PMID: 24166987 DOI: 10.1002/pmic.201200570] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/18/2012] [Revised: 09/11/2013] [Accepted: 10/07/2013] [Indexed: 01/01/2023]
Abstract
High-throughput '-omics' data can be combined with large-scale molecular interaction networks, for example, protein-protein interaction networks, to provide a unique framework for the investigation of human molecular biology. Interest in these integrative '-omics' methods is growing rapidly because of their potential to understand complexity and association with disease; such approaches have a focus on associations between phenotype and "network-type." The potential of this research is enticing, yet there remain a series of important considerations. Here, we discuss interaction data selection, data quality, the relative merits of using data from large high-throughput studies versus a meta-database of smaller literature-curated studies, and possible issues of sociological or inspection bias in interaction data. Other work underway, especially international consortia to establish data formats, quality standards and address data redundancy, and the improvements these efforts are making to the field, is also evaluated. We present options for researchers intending to use large-scale molecular interaction networks as a functional context for protein or gene expression data, including microRNAs, especially in the context of human disease.
Affiliation(s)
- Sarah-Jane Schramm
- Sydney Medical School, Westmead Millennium Institute for Medical Research, The University of Sydney, Sydney, NSW, Australia; Melanoma Institute Australia, Sydney, NSW, Australia
|
17
|
Cun Y, Fröhlich H. netClass: an R-package for network based, integrative biomarker signature discovery. Bioinformatics 2014; 30:1325-6. [PMID: 24443376 DOI: 10.1093/bioinformatics/btu025] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 12/19/2022] Open
Abstract
In the past years, there has been a growing interest in methods that incorporate network information into classification algorithms for biomarker signature discovery in personalized medicine. The general hope is that this way the typical low reproducibility of signatures, together with the difficulty to link them to biological knowledge, can be addressed. Complementary to these efforts, there is an increasing interest in integrating different data entities (e.g. gene and miRNA expressions) into comprehensive models. To our knowledge, R-package netClass is the first software that addresses both, network and data integration. Besides several published approaches for network integration, it specifically contains our recently published STSVM method, which allows for additional integration of gene and miRNA expression data into one predictive classifier.
Affiliation(s)
- Yupeng Cun
- Bonn-Aachen International Center for IT (B-IT), University of Bonn, Dahlmannstr. 2, 53113 Bonn, Germany
|
18
|
Fröhlich H. Including network knowledge into Cox regression models for biomarker signature discovery. Biom J 2014; 56:287-306. [DOI: 10.1002/bimj.201300035] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 02/20/2013] [Revised: 10/10/2013] [Accepted: 10/25/2013] [Indexed: 12/14/2022]
Affiliation(s)
- Holger Fröhlich
- Rheinische Friedrich-Wilhelms-Universität Bonn; Bonn-Aachen International Center for IT, Algorithmic Bioinformatics; Dahlmannstr. 2 53113 Bonn Germany
|
19
|
Cun Y, Fröhlich H. Network and data integration for biomarker signature discovery via network smoothed T-statistics. PLoS One 2013; 8:e73074. [PMID: 24019896 PMCID: PMC3760887 DOI: 10.1371/journal.pone.0073074] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 06/04/2013] [Accepted: 07/16/2013] [Indexed: 01/01/2023] Open
Abstract
Predictive, stable and interpretable gene signatures are generally seen as an important step towards a better personalized medicine. During the last decade various methods have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinics is the typical low reproducibility of signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. We here propose a technique that integrates network information as well as different kinds of experimental data (here exemplified by mRNA and miRNA expression) into one classifier. This is done by smoothing t-statistics of individual genes or miRNAs over the structure of a combined protein-protein interaction (PPI) and miRNA-target gene network. A permutation test is conducted to select features in a highly consistent manner, and subsequently a Support Vector Machine (SVM) classifier is trained. Compared to several other competing methods our algorithm reveals an overall better prediction performance for early versus late disease relapse and a higher signature stability. Moreover, obtained gene lists can be clearly associated to biological knowledge, such as known disease genes and KEGG pathways. We demonstrate that our data integration strategy can improve classification performance compared to using a single data source only. Our method, called stSVM, is available in R-package netClass on CRAN (http://cran.r-project.org).
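The smoothing step described in this abstract can be pictured with a small sketch: compute a t-statistic per gene, then spread each score over the gene's network neighbours. The abstract does not specify the smoothing operator, so the iterative random-walk-with-restart propagation below, the parameter `alpha`, the function names, and the toy network are assumptions for illustration only, not the stSVM code in netClass. The permutation test and SVM training steps are omitted.

```python
import numpy as np

def welch_t(X, y):
    """Per-gene Welch t-statistic; X is samples x genes, y is a 0/1 label array."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def smooth_over_network(t, A, alpha=0.5, n_iter=50):
    """Iterate s <- alpha * W s + (1 - alpha) * t, with W the column-normalised adjacency."""
    deg = A.sum(axis=0)
    deg[deg == 0] = 1.0            # isolated nodes keep a zero column, avoid division by zero
    W = A / deg                    # each column of W sums to 1 for connected nodes
    s = t.copy()
    for _ in range(n_iter):
        s = alpha * (W @ s) + (1 - alpha) * t
    return s

# Toy example (hypothetical): 4 genes on a path network 0-1-2-3; gene 0 is differential.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = np.array([0] * 10 + [1] * 10)
X[y == 1, 0] += 2.0
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
t = welch_t(X, y)
s = smooth_over_network(t, A)
```

With `alpha=0` the scores are left unchanged; larger values let a strongly differential gene raise the scores of its neighbours, which is the intuition behind selecting connected, more stable signatures.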
Affiliation(s)
- Yupeng Cun
- Algorithmic Bioinformatics, Bonn-Aachen International Center for IT, Bonn, Germany
- Holger Fröhlich
- Algorithmic Bioinformatics, Bonn-Aachen International Center for IT, Bonn, Germany
|