1. Hossain SMM, Khatun L, Ray S, Mukhopadhyay A. Pan-cancer classification by regularized multi-task learning. Sci Rep 2021; 11:24252. [PMID: 34930937 PMCID: PMC8688544 DOI: 10.1038/s41598-021-03554-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2021] [Accepted: 12/06/2021] [Indexed: 01/16/2023]
Abstract
Classifying pan-cancer samples using gene expression patterns is a crucial challenge for the accurate diagnosis and treatment of cancer patients. Machine learning algorithms are established tools for downstream analysis and for capturing deviations in gene expression patterns across diverse diseases. In the present work, we have developed PC-RMTL, a pan-cancer classification model that uses regularized multi-task learning (RMTL) to classify 21 cancer types and adjacent normal samples from RNASeq data obtained from TCGA. PC-RMTL outperforms five state-of-the-art classification algorithms, viz. SVM with the linear kernel (SVM-Lin), SVM with radial basis function kernel (SVM-RBF), random forest (RF), k-nearest neighbours (kNN), and decision trees (DT). PC-RMTL achieves 96.07% accuracy and a 95.80% MCC score on a completely unknown independent test set. The only real competitor is SVM-Lin, which nearly matches the prediction accuracy of PC-RMTL, but only when complete feature sets are provided for training; otherwise, PC-RMTL outperforms all other classification models. To the best of our knowledge, this is a significant improvement over existing work in pan-cancer classification, which has failed to distinguish many cancer types from one another reliably. We have also compared the gene expression patterns of the top discriminating genes across the cancers and performed a functional enrichment analysis of these genes, which uncovers several interesting facts about distinguishing pan-cancer samples.
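The two metrics reported in this abstract, accuracy and the Matthews correlation coefficient (MCC), can be illustrated with a minimal sketch. It assumes the standard multiclass generalization of MCC computed from label counts; the function names and the toy labels below are hypothetical and are not taken from the paper, whose exact evaluation code is not shown here.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass MCC computed from per-class label counts.

    s   = number of samples
    c   = number of correctly predicted samples
    t_k = how often class k truly occurs
    p_k = how often class k is predicted
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k = Counter(y_true)
    p_k = Counter(y_pred)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    spread_true = s * s - sum(v * v for v in t_k.values())
    spread_pred = s * s - sum(v * v for v in p_k.values())
    denominator = sqrt(spread_true * spread_pred)
    # By convention, return 0 when either label set has a single class.
    return numerator / denominator if denominator else 0.0


# Hypothetical toy labels, for illustration only.
y_true = ["BRCA", "BRCA", "LUAD", "LUAD", "normal", "normal"]
y_pred = ["BRCA", "LUAD", "LUAD", "LUAD", "normal", "normal"]
print(accuracy(y_true, y_pred), multiclass_mcc(y_true, y_pred))
```

Unlike accuracy, MCC uses the full distribution of true and predicted labels, which is one reason it is often reported alongside accuracy when class sizes are unbalanced.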
Affiliation(s)
- Lutfunnesa Khatun
  - Computer Science and Engineering, University of Kalyani, Kalyani, 741235, India.
- Sumanta Ray
  - Computer Science and Engineering, Aliah University, Kolkata, 700160, India.
- Anirban Mukhopadhyay
  - Computer Science and Engineering, University of Kalyani, Kalyani, 741235, India.
2. Chang Y, Rager JE, Tilton SC. Linking Coregulated Gene Modules with Polycyclic Aromatic Hydrocarbon-Related Cancer Risk in the 3D Human Bronchial Epithelium. Chem Res Toxicol 2021; 34:1445-1455. [PMID: 34048650 PMCID: PMC8560124 DOI: 10.1021/acs.chemrestox.0c00333] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022]
Abstract
Exposure to polycyclic aromatic hydrocarbons (PAHs) often occurs as complex chemical mixtures, which are linked to numerous adverse health outcomes in humans, with cancer as the greatest concern. The cancer risk associated with PAH exposures is commonly evaluated using the relative potency factor (RPF) approach, which estimates PAH mixture carcinogenic potential based on the sum of relative potency estimates of individual PAHs, compared to benzo[a]pyrene (BAP), a reference carcinogen. The present study evaluates molecular mechanisms related to PAH cancer risk through integration of transcriptomic and bioinformatic approaches in a 3D human bronchial epithelial cell model. Genes with significant differential expression from human bronchial epithelium exposed to PAHs were analyzed using a weighted gene coexpression network analysis (WGCNA) two-tiered approach: first to identify gene sets comodulated to RPF and second to link genes to a more comprehensive list of regulatory values, including inhalation-specific risk values. Over 3000 genes associated with processes of cell cycle regulation, inflammation, DNA damage, and cell adhesion processes were found to be comodulated with increasing RPF with pathways for cell cycle S phase and cytoskeleton actin identified as the most significantly enriched biological networks correlated to RPF. In addition, comodulated genes were linked to additional cancer-relevant risk values, including inhalation unit risks, oral cancer slope factors, and cancer hazard classifications from the World Health Organization's International Agency for Research on Cancer (IARC). These gene sets represent potential biomarkers that could be used to evaluate cancer risk associated with PAH mixtures. Among the values tested, RPF values and IARC categorizations shared the most similar responses in positively and negatively correlated gene modules. 
Together, we demonstrated a novel manner of integrating gene sets with chemical toxicity equivalence estimates through WGCNA to understand potential mechanisms.
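The relative potency factor (RPF) idea described in this abstract can be sketched as a weighted sum. This is a minimal illustration that assumes the common formulation in which each PAH concentration is multiplied by its RPF and the products are summed into a benzo[a]pyrene-equivalent value; the function name, compound names other than benzo[a]pyrene, and all numeric values are hypothetical placeholders rather than data from the study.

```python
def bap_equivalent(concentrations, rpfs):
    """Sum of concentration x RPF over all PAHs in a mixture.

    concentrations: dict of PAH name -> concentration (any consistent unit)
    rpfs:           dict of PAH name -> potency relative to benzo[a]pyrene
    """
    missing = set(concentrations) - set(rpfs)
    if missing:
        raise KeyError(f"no RPF available for: {sorted(missing)}")
    return sum(conc * rpfs[name] for name, conc in concentrations.items())


# Hypothetical RPFs; benzo[a]pyrene is the reference compound, so its RPF is 1.
rpfs = {"benzo[a]pyrene": 1.0, "PAH_A": 0.1, "PAH_B": 0.01}
# Hypothetical mixture concentrations.
mixture = {"benzo[a]pyrene": 2.0, "PAH_A": 10.0, "PAH_B": 50.0}

print(bap_equivalent(mixture, rpfs))
```

Raising an error for a compound without an RPF, rather than silently treating it as zero, keeps a missing potency estimate from being mistaken for no risk contribution.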
Affiliation(s)
- Yvonne Chang
  - Environmental and Molecular Toxicology Department, Oregon State University, Corvallis, OR, United States.
- Julia E. Rager
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina, Chapel Hill, NC, United States.
  - Institute for Environmental Health Solutions, and Curriculum in Toxicology, The University of North Carolina, Chapel Hill, NC, United States.
- Susan C. Tilton
  - Environmental and Molecular Toxicology Department, Oregon State University, Corvallis, OR, United States.
  - Superfund Research Program, Oregon State University, Corvallis, OR, United States.
3. Hossain SMM, Khatun L, Ray S, Mukhopadhyay A. Identification of key immune regulatory genes in HIV-1 progression. Gene 2021; 792:145735. [PMID: 34048875 DOI: 10.1016/j.gene.2021.145735] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/17/2021] [Accepted: 05/20/2021] [Indexed: 11/16/2022]
Abstract
Human immunodeficiency virus (HIV) infection causes acquired immunodeficiency syndrome (AIDS), one of the most devastating diseases affecting humankind. Here, we propose a framework that uses module preservation statistics to examine the differences among microarray gene expression data from uninfected samples and three different HIV-1 infection stages. We use gene co-expression networks (GCNs) constructed for each infection stage to detect topological and structural changes in a group of differentially expressed genes. We examine the relationships among a set of co-expression modules by constructing a module eigengene network that considers the overall similarity/dissimilarity among the genes within the modules. We have used different module preservation statistics, together with two composite statistics, "Zsummary" and "MedianRank", to examine the changes in co-expression patterns between modules. We have found several interesting results on the preservation characteristics of gene modules across different stages. Some genes are preserved in one pair of stages while altering their characteristics across other stages. We further validated the results using a permutation test and classification techniques. The biological significance of the obtained modules has also been examined using gene ontology and pathway-based analysis. Additionally, we have identified a set of key immune regulatory hub genes in the protein-protein interaction networks (PPINs) associated with the differentially expressed (DE) genes; these hub genes interact with HIV-1 proteins and are likely to act as potential biomarkers in HIV-1 progression.
Affiliation(s)
- Sk Md Mosaddek Hossain
  - Department of Computer Science and Engineering, Aliah University, Kolkata 700160, India.
  - Department of Computer Science and Engineering, University of Kalyani, Kalyani 741235, India.
- Lutfunnesa Khatun
  - Department of Computer Science and Engineering, University of Kalyani, Kalyani 741235, India.
- Sumanta Ray
  - Department of Computer Science and Engineering, Aliah University, Kolkata 700160, India.
- Anirban Mukhopadhyay
  - Department of Computer Science and Engineering, University of Kalyani, Kalyani 741235, India.
4. Pei S, Guan J. Classifying Cognitive Normal and Early Mild Cognitive Impairment of Alzheimer’s Disease by Applying Restricted Boltzmann Machine to fMRI Data. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200618152109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
Background: Neuroimaging is an important tool for the early detection of Alzheimer’s disease (AD), a serious neurodegenerative brain disease among elderly subjects. Independent component analysis (ICA) is arguably one of the most widely used algorithms for the analysis of brain imaging data and can be used to extract intrinsic brain networks from functional magnetic resonance imaging (fMRI).
Method: As recent studies have shown, a more flexible model known as the restricted Boltzmann machine (RBM) can also be used to extract spatial maps and time courses of intrinsic networks from resting-state fMRI; moreover, RBM yields temporal features superior to those of ICA. Here, we employ RBM to improve the performance of classifying individuals. Experiments are performed on healthy controls and subjects at the early stage of AD, i.e., cognitive normal (CN) and early mild cognitive impairment (EMCI) participants, and on two types of data, i.e., structural magnetic resonance imaging (sMRI) and fMRI data.
Results: (1) When ICA is applied separately to sMRI and fMRI, the features extracted from fMRI improve classification accuracy by 7.5% for CN and EMCI; (2) using RBM instead of applying ICA to fMRI further improves classification accuracy by 7.75% for CN and EMCI; (3) lesions at the early stage of AD are more likely to occur in the regions around slices 4, 6, 10, 14, 19, 51 and 59 of the whole brain in the longitudinal direction.
Conclusion: By using fMRI instead of sMRI and RBM instead of ICA, we can classify CN and EMCI more efficiently.
Affiliation(s)
- Shengbing Pei
  - Department of Computer Science and Technology, Tongji University, Shanghai, China.
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, Shanghai, China.
5. Hossain SMM, Halsana AA, Khatun L, Ray S, Mukhopadhyay A. Discovering key transcriptomic regulators in pancreatic ductal adenocarcinoma using Dirichlet process Gaussian mixture model. Sci Rep 2021; 11:7853. [PMID: 33846515 PMCID: PMC8041769 DOI: 10.1038/s41598-021-87234-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/22/2020] [Accepted: 03/23/2021] [Indexed: 12/18/2022]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is the most lethal type of pancreatic cancer, and its late detection leads to therapeutic failure. This study aims to determine the key regulatory genes and their impact on the disease’s progression, helping to clarify the disease’s etiology, which is still mostly unknown. We use time-series gene expression data for this disease to identify the key regulators that capture the characteristics of gene activity patterns during cancer progression. We have identified the key gene modules and predicted the functions of top genes from a reconstructed gene association network (GAN). A variation of the partial correlation method is used to analyze the GAN, followed by a gene function prediction task. Moreover, we have identified regulators for each target gene by gene regulatory network inference using the dynamical GENIE3 (dynGENIE3) algorithm. The Dirichlet process Gaussian process mixture model and the cubic spline regression model (splineTimeR) are employed to identify the key gene modules and differentially expressed genes, respectively. Our analysis identifies a panel of key regulators and gene modules that are crucial for PDAC progression.
Affiliation(s)
- Sk Md Mosaddek Hossain
  - Computer Science and Engineering, Aliah University, Kolkata, 700160, India.
  - Computer Science and Engineering, University of Kalyani, Kalyani, 741235, India.
- Lutfunnesa Khatun
  - Computer Science and Engineering, University of Kalyani, Kalyani, 741235, India.
- Sumanta Ray
  - Computer Science and Engineering, Aliah University, Kolkata, 700160, India.
- Anirban Mukhopadhyay
  - Computer Science and Engineering, University of Kalyani, Kalyani, 741235, India.
6. Bi S, Liu R, He L, Li J, Gu J. Bioinformatics analysis of common key genes and pathways of intracranial, abdominal, and thoracic aneurysms. BMC Cardiovasc Disord 2021; 21:14. [PMID: 33407182 PMCID: PMC7788746 DOI: 10.1186/s12872-020-01838-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/10/2020] [Accepted: 12/18/2020] [Indexed: 02/08/2023]
Abstract
Background: Aneurysm is a severe and fatal disease. This study aims to comprehensively identify the highly conserved co-expression modules and hub genes in abdominal aortic aneurysm (AAA), thoracic aortic aneurysm (TAA) and intracranial aneurysm (ICA), and to facilitate discovery of the pathogenesis of aneurysm.
Methods: The GSE57691, GSE122897, and GSE5180 microarray datasets were downloaded from the Gene Expression Omnibus database. We selected highly conserved modules using weighted gene co-expression network analysis before performing Gene Ontology, Kyoto Encyclopedia of Genes and Genomes pathway, and Reactome enrichment analyses. The protein–protein interaction (PPI) network and the miRNA-hub genes network were constructed. Furthermore, we validated the preservation of hub genes in three other datasets.
Results: Two modules, with 193 genes and 159 genes, were identified as well preserved in AAA, TAA, and ICA. The enrichment analysis showed that these genes were involved in several biological processes, such as positive regulation of cytosolic calcium ion concentration, hemostasis, and regulation of secretion by cells. Ten highly connected PPI networks were constructed, and 55 hub genes were identified. In the miRNA-hub genes network, CCR7 was the most connected gene, followed by TNF and CXCR4. The most connected miRNAs were hsa-mir-26b-5p and hsa-mir-335-5p. The hub gene module was shown to be preserved in all three datasets.
Conclusions: Our study highlighted and validated two highly conserved co-expression modules and a miRNA-hub genes network in three kinds of aneurysms, which may promote understanding of aneurysm and provide potential therapeutic targets and biomarkers.
Affiliation(s)
- Siwei Bi
  - West China School of Medicine, Sichuan University, Chengdu, Sichuan, People's Republic of China.
- Ruiqi Liu
  - Department of Burn and Plastic Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China.
- Linfeng He
  - West China School of Medicine, Sichuan University, Chengdu, Sichuan, People's Republic of China.
- Jingyi Li
  - West China School of Medicine, Sichuan University, Chengdu, Sichuan, People's Republic of China.
- Jun Gu
  - Department of Cardiovascular Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China.
7. Delgado-Chaves FM, Gómez-Vela F, Divina F, García-Torres M, Rodriguez-Baena DS. Computational Analysis of the Global Effects of Ly6E in the Immune Response to Coronavirus Infection Using Gene Networks. Genes (Basel) 2020; 11:E831. [PMID: 32708319 PMCID: PMC7397019 DOI: 10.3390/genes11070831] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/23/2020] [Revised: 06/26/2020] [Accepted: 07/13/2020] [Indexed: 12/21/2022]
Abstract
Gene networks have arisen as a promising tool in the comprehensive modeling and analysis of complex diseases. Particularly in viral infections, understanding the host-pathogen mechanisms, and the immune response to them, is considered a major goal for the rational design of appropriate therapies. For this reason, the use of gene networks may well encourage therapy-associated research in the context of the coronavirus pandemic, guiding experimental scrutiny and reducing costs. In this work, gene co-expression networks were reconstructed from RNA-Seq expression data with the aim of analyzing the time-resolved effects of the gene Ly6E in the immune response against the coronavirus responsible for murine hepatitis (MHV). Through the integration of differential expression analyses and exploration of the reconstructed networks, significant differences in the immune response to the virus were observed in Ly6EΔHSC compared to wild type animals. Results show that Ly6E ablation in hematopoietic stem cells (HSCs) leads to a progressively impaired immune response in both liver and spleen. Specifically, depletion of the normal leukocyte-mediated immunity and chemokine signaling is observed in the liver of Ly6EΔHSC mice. On the other hand, the immune response in the spleen, which seemed to be mediated by intense chromatin activity in the normal situation, is replaced by ECM remodeling in Ly6EΔHSC mice. These findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate efforts towards novel antiviral approaches.
8. Brain-wide functional architecture remodeling by alcohol dependence and abstinence. Proc Natl Acad Sci U S A 2020; 117:2149-2159. [PMID: 31937658 DOI: 10.1073/pnas.1909915117] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Indexed: 01/26/2023]
Abstract
Alcohol abuse and alcohol dependence are key factors in the development of alcohol use disorder, which is a pervasive societal problem with substantial economic, medical, and psychiatric consequences. Although our understanding of the neurocircuitry that underlies alcohol use has improved, novel brain regions that are involved in alcohol use and novel biomarkers of alcohol use need to be identified. The present study used a single-cell whole-brain imaging approach to 1) assess whether abstinence from alcohol in an animal model of alcohol dependence alters the functional architecture of brain activity and modularity, 2) validate our current knowledge of the neurocircuitry of alcohol abstinence, and 3) discover brain regions that may be involved in alcohol use. Alcohol abstinence resulted in the whole-brain reorganization of functional architecture in mice and a pronounced decrease in modularity that was not observed in nondependent moderate drinkers. Structuring of the alcohol abstinence network revealed three major brain modules: 1) extended amygdala module, 2) midbrain striatal module, and 3) cortico-hippocampo-thalamic module, reminiscent of the three-stage theory. Many hub brain regions that control this network were identified, including several that have been previously overlooked in alcohol research. These results identify brain targets for future research and demonstrate that alcohol use and dependence remodel brain-wide functional architecture to decrease modularity. Further studies are needed to determine whether the changes in coactivation and modularity that are associated with alcohol abstinence are causal features of alcohol dependence or a consequence of excessive drinking and alcohol exposure.
9. Jahanshahi M, Saeidi M, Nikmahzar E, Babakordi F, Bahlakeh G. Effects of hCG on reduced numbers of hCG receptors in the prefrontal cortex and cerebellum of rat models of Alzheimer's disease. Biotech Histochem 2019; 94:360-365. [PMID: 30760053 DOI: 10.1080/10520295.2019.1571228] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023]
Abstract
Age-associated changes in the levels of luteinizing hormone and human chorionic gonadotropin (hCG) are potential risk factors for Alzheimer's disease (AD); hCG concentration is related to the incidence of AD. The highest density of hCG receptors is in zones of the brain that are vulnerable to AD and streptozotocin (STZ) can decrease the density of this receptor. We investigated the effects of different doses of hCG on hCG receptor density in the prefrontal cortex and cerebellum in a rat model of STZ-induced AD. AD was induced by intracerebroventricular injection of 3 mg/kg STZ. The resulting AD rats were treated for 3 days with 50, 100 or 200 IU/200 μl hCG, or with saline as a control. Sections of prefrontal cortex and cerebellum were stained immunohistochemically and hCG receptor-immunoreactive (ir) neurons were counted. STZ injected into the lateral ventricles of rat brains reduced the density of hCG receptor-ir neurons in the prefrontal cortex and cerebellum. hCG administration resulted in a significant dose-dependent increase in the number of hCG receptor-ir neurons in the prefrontal cortex and cerebellum. The maximum increase in the number of receptors occurred following the 200 IU dose of hCG. Administration of hCG ameliorated the lowered density of hCG receptor-ir neurons in the cerebellum and prefrontal cortex in STZ-induced AD rats.
Affiliation(s)
- M Jahanshahi
  - Neuroscience Research Center, Golestan University of Medical Sciences, Gorgan, Iran.
- M Saeidi
  - Stem Cell Research Center, Golestan University of Medical Sciences, Gorgan, Iran.
- E Nikmahzar
  - Neuroscience Research Center, Golestan University of Medical Sciences, Gorgan, Iran.
- F Babakordi
  - Neuroscience Research Center, Golestan University of Medical Sciences, Gorgan, Iran.
- G Bahlakeh
  - Neuroscience Research Center, Golestan University of Medical Sciences, Gorgan, Iran.