1
Saikia S, Si T, Deb D, Bora K, Mallik S, Maulik U, Zhao Z. Lesion detection in women breast's dynamic contrast-enhanced magnetic resonance imaging using deep learning. Sci Rep 2023; 13:22555. [PMID: 38110462 PMCID: PMC10728155 DOI: 10.1038/s41598-023-48553-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 11/28/2023] [Indexed: 12/20/2023] Open
Abstract
Breast cancer is one of the most common cancers in women and the second leading cause of cancer death in women after lung cancer. Recent technological advances in breast cancer treatment offer hope to millions of women in the world. Segmentation of the breast's Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) is one of the necessary tasks in the diagnosis and detection of breast cancer. Currently, U-Net, a popular deep learning model, is extensively used in biomedical image segmentation. This article aims to advance the state of the art and conduct a more in-depth analysis with a focus on the use of various U-Net models in lesion detection in women's breast DCE-MRI. In this article, we perform an empirical study of the effectiveness and efficiency of U-Net and its derived deep learning models, including ResUNet, Dense UNet, DUNet, Attention U-Net, UNet++, MultiResUNet, RAUNet, Inception U-Net and U-Net GAN, for lesion detection in breast DCE-MRI. All the models are applied to the benchmarked 100 sagittal T2-weighted fat-suppressed DCE-MRI slices of 20 patients and their performance is compared. Also, a comparative study has been conducted with V-Net, W-Net, and DeepLabV3+. The non-parametric Wilcoxon Signed Rank Test is used to analyze the significance of the quantitative results. Furthermore, Multi-Criteria Decision Analysis (MCDA) is used to evaluate overall performance focused on accuracy, precision, sensitivity, F1-score, specificity, Geometric-Mean, DSC, and false-positive rate. The RAUNet segmentation model achieved a high accuracy of 99.76%, sensitivity of 85.04%, precision of 90.21%, and Dice Similarity Coefficient (DSC) of 85.04% whereas ResNet achieved 99.62% accuracy, 62.26% sensitivity, 99.56% precision, and 72.86% DSC. ResUNet is found to be the most effective model based on MCDA. On the other hand, U-Net GAN takes the least computational time to perform the segmentation task. Both quantitative and qualitative results demonstrate that the ResNet model performs better than other models in segmenting the images and lesion detection, though computational time in achieving the objectives varies.
Affiliation(s)
- Sudarshan Saikia: Information Technology Department, Oil India Limited, Duliajan, Assam, 786602, India
- Tapas Si: AI Innovation Lab, Department of Computer Science & Engineering, University of Engineering & Management, Jaipur, GURUKUL, Jaipur, Rajasthan, 303807, India
- Darpan Deb: Department of Computer Application, Christ University, Bengaluru, 560029, India
- Kangkana Bora: Department of Computer Science and Information Technology, Cotton University, Guwahati, Assam, 781001, India
- Saurav Mallik: Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, MA, 02115, USA
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Zhongming Zhao: Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
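The evaluation criteria listed in the abstract of entry 1 (accuracy, precision, sensitivity, F1-score, specificity, geometric mean, DSC and false-positive rate) can all be derived from the pixel-wise confusion matrix of a predicted mask against a ground-truth mask. The following is a minimal sketch under that assumption, not the authors' evaluation code. The function name `segmentation_metrics` and the toy masks are hypothetical, and the geometric mean is taken here as the square root of sensitivity times specificity, a common definition that the abstract does not spell out.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for two flat binary masks (1 = lesion, 0 = background)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Confusion-matrix counts.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    sensitivity = _ratio(tp, tp + fn)   # recall on lesion pixels
    specificity = _ratio(tn, tn + fp)   # recall on background pixels
    precision = _ratio(tp, tp + fp)

    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),  # assumed definition
        "dsc": _ratio(2 * tp, 2 * tp + fp + fn),         # Dice Similarity Coefficient
        "fpr": _ratio(fp, fp + tn),                      # false-positive rate
    }


# Toy 8-pixel masks, made up for illustration only.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(predicted, reference)
```

For a single pair of binary masks the DSC and the F1-score coincide, since both reduce to 2TP / (2TP + FP + FN). Accuracy tends to be very high in lesion segmentation because background pixels dominate, which is one reason overlap measures such as DSC are reported alongside it.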
2
Das R, Bose S, Chowdhury RS, Maulik U. Dense Dilated Multi-Scale Supervised Attention-Guided Network for histopathology image segmentation. Comput Biol Med 2023; 163:107182. [PMID: 37379615 DOI: 10.1016/j.compbiomed.2023.107182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 05/24/2023] [Accepted: 06/13/2023] [Indexed: 06/30/2023]
Abstract
Over the last couple of decades, the introduction and proliferation of whole-slide scanners led to increasing interest in the research of digital pathology. Although manual analysis of histopathological images is still the gold standard, the process is often tedious and time consuming. Furthermore, manual analysis also suffers from intra- and interobserver variability. Separating structures or grading morphological changes can be difficult due to architectural variability of these images. Deep learning techniques have shown great potential in histopathology image segmentation that drastically reduces the time needed for downstream tasks of analysis and providing accurate diagnosis. However, few algorithms have clinical implementations. In this paper, we propose a new deep learning model Dense Dilated Multiscale Supervised Attention-Guided (D2MSA) Network for histopathology image segmentation that makes use of deep supervision coupled with a hierarchical system of novel attention mechanisms. The proposed model surpasses state-of-the-art performance while using similar computational resources. The performance of the model has been evaluated for the tasks of gland segmentation and nuclei instance segmentation, both of which are clinically relevant tasks to assess the state and progress of malignancy. Here, we have used histopathology image datasets for three different types of cancer. We have also performed extensive ablation tests and hyperparameter tuning to ensure the validity and reproducibility of the model performance. The proposed model is available at www.github.com/shirshabose/D2MSA-Net.
Affiliation(s)
- Rangan Das: Department of Computer Science Engineering, Jadavpur University, Kolkata, 700032, West Bengal, India
- Shirsha Bose: Department of Informatics, Technical University of Munich, Munich, Bavaria 85748, Germany
- Ritesh Sur Chowdhury: Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata, 700032, West Bengal, India
- Ujjwal Maulik: Department of Computer Science Engineering, Jadavpur University, Kolkata, 700032, West Bengal, India
3
Mallik S, Sarkar A, Nath S, Maulik U, Das S, Pati SK, Ghosh S, Zhao Z. 3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection. Front Genet 2023; 14:1095330. [PMID: 36865387 PMCID: PMC9971618 DOI: 10.3389/fgene.2023.1095330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 01/30/2023] [Indexed: 02/16/2023] Open
Abstract
In the current era, handling biomedical big data is a challenging task. In particular, the integration of multi-modal data, followed by significant feature mining (gene signature detection), is a daunting task. With this in mind, we propose a novel framework, namely, three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, employing empirical Bayes statistics, was first applied to each individual molecular profile, and the statistically significant features were extracted; the three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy scores and the area under the curve (AUC). Gene modules were identified by consecutive analysis with average linkage clustering and dynamic tree cut. The best module, containing the highest correlation, was considered the potential gene signature. We utilized an acute myeloid leukemia cancer dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC score (viz., 0.827). We explored the functions of signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed the state-of-the-art methods in terms of AUC. Furthermore, we included some comparative studies with other related methods to enhance the acceptability of our method. Finally, it can be noted that our algorithm can be applied to any multi-modal dataset for data integration, followed by gene module discovery.
Affiliation(s)
- Saurav Mallik: Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, United States
- Anasua Sarkar: Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
- Sagnik Nath: Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
- Ujjwal Maulik: Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
- Supantha Das: Department of Information Technology, Academy of Technology, Hooghly, West Bengal, India
- Soumen Kumar Pati: Department of Bioinformatics, Maulana Abul Kalam Azad University, Kolkata, West Bengal, India
- Soumadip Ghosh: Department of Computer Science & Engineering, Sister Nivedita University, New Town, West Bengal, India
- Zhongming Zhao: Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States; Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
Correspondence: Saurav Mallik; Zhongming Zhao
4
Basu A, Sarkar A, Bandyopadhyay S, Maulik U. In silico strategies to identify protein-protein interaction modulator in cell-to-cell transmission of SARS CoV2. Transbound Emerg Dis 2022; 69:3896-3905. [PMID: 36379049 DOI: 10.1111/tbed.14760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 07/08/2022] [Accepted: 09/15/2022] [Indexed: 11/16/2022]
Abstract
RNA sequence data from SARS CoV2 patients helps to construct a gene network related to this disease. A detailed analysis of the human host response to SARS CoV2 with expression profiling by high-throughput sequencing has been accomplished with primary human lung epithelial cell lines. Using this data, the clustered gene annotation and gene network construction are performed with the help of the String database. Among the four clusters identified, only one, with 44 genes, could be annotated. Interestingly, this corresponded to basal cells with p = 1.37e-05, which is relevant for respiratory tract infection. Functional enrichment analysis of genes present in the gene network has been completed using the String database and the Network Analyst tool. Among three types of cell-cell communication, only the anchoring junction between the basal cell membrane and the basal lamina in the host cell is involved in the virus transmission. At this junction point, a hemidesmosome structure plays a vital role in virus spread from one cell to the basal lamina in the respiratory tract. In this protein complex structure, different integrin protein molecules of the host cell are used to promote the spread of virus infection into the extracellular matrix. So, small molecular blockers of different anchoring junction proteins, such as integrin alpha 3 and integrin beta 1, can provide efficient protection against this deadly viral disease. ORF8 from the SARS CoV2 virus can interact with both integrin proteins of the human host. Using a molecular docking technique, a ternary complex of these three proteins is modelled. Several oligopeptides are predicted as modulators for this ternary complex. In silico analysis of these modulators is very important to develop novel therapeutics for the treatment of SARS CoV2.
Affiliation(s)
- Anamika Basu: Department of Biochemistry, Gurudas College, Kolkata, India
- Anasua Sarkar: Computer Science and Engineering Department, Jadavpur University, Kolkata, India
- Ujjwal Maulik: Computer Science and Engineering Department, Jadavpur University, Kolkata, India
5
Barman RK, Mukhopadhyay A, Maulik U, Das S. A network biology approach to identify crucial host targets for COVID-19. Methods 2022; 203:108-115. [PMID: 35364279 PMCID: PMC8960288 DOI: 10.1016/j.ymeth.2022.03.016] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/19/2021] [Revised: 03/09/2022] [Accepted: 03/27/2022] [Indexed: 12/23/2022] Open
Abstract
The ongoing global pandemic of COVID-19, caused by SARS-CoV-2, has killed more than 5.9 million individuals out of ∼43 million confirmed infections. At present, several parts of the world are encountering the 3rd wave. Mass vaccination has started in several countries, but vaccines are less likely to be broadly available for the current pandemic, so repurposing of existing drugs has drawn the highest attention as an immediate solution. A recent publication has mapped the physical interactions of SARS-CoV-2 and human proteins by affinity-purification mass spectrometry (AP-MS) and identified 332 high-confidence SARS-CoV-2-human protein-protein interactions (PPIs). Here, we have taken a network biology approach and constructed a human protein-protein interaction network (PPIN) with the above SARS-CoV-2 targeted proteins. We utilized a combination of essential network centrality measures and functional properties of the human proteins to identify the critical human targets of SARS-CoV-2. Four human proteins, namely PRKACA, RHOA, CDK5RAP2, and CEP250, have emerged as the best therapeutic targets, of which PRKACA and CEP250 were also found by another group as potential candidates for drug targets in COVID-19. We further found candidate drugs/compounds, such as guanosine triphosphate, remdesivir, adenosine monophosphate, MgATP, and H-89 dihydrochloride, that bind the target human proteins. The urgency to prevent the spread of infection and the death of diseased individuals has prompted the search for agents from the pool of approved drugs to repurpose them for COVID-19. Our results indicate that host targeting therapy with the repurposed drugs may be a useful strategy for the treatment of SARS-CoV-2 infection.
Affiliation(s)
- Ranjan Kumar Barman: Division of Virology, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata 700010, India; Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
- Anirban Mukhopadhyay: Department of Computer Science and Engineering, University of Kalyani, Kalyani 741235, West Bengal, India
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
- Santasabuj Das: Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata 700010, India; ICMR-National Institute of Occupational Health, Ahmedabad 380016, India
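The abstract of entry 5 ranks human proteins in a protein-protein interaction network by "essential network centrality measures" without naming them. As a rough illustration of what such measures compute, the sketch below evaluates two standard ones, degree centrality and closeness centrality, on an undirected graph. Which measures the authors actually combined is not stated, so this choice is an assumption, and the node names and edges of the toy graph are hypothetical placeholders rather than real interactions.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the summed shortest-path distance."""
    result = {}
    for source in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical star-shaped network: one hub protein with three partners.
toy_network = {
    "hub": {"p1", "p2", "p3"},
    "p1": {"hub"},
    "p2": {"hub"},
    "p3": {"hub"},
}
degree = degree_centrality(toy_network)
closeness = closeness_centrality(toy_network)
```

In this toy graph the hub scores highest on both measures, which is the kind of signal a centrality-based ranking uses to flag candidate targets before functional properties are taken into account.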
6
Bej A, Maulik U, Sarkar A. Time-Series Prediction for the Epidemic Trends of COVID-19 Using Conditional Generative Adversarial Networks Regression on Country-Wise Case Studies. SN COMPUT SCI 2022; 3:352. [PMID: 35789572 PMCID: PMC9244013 DOI: 10.1007/s42979-022-01225-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2021] [Accepted: 05/20/2022] [Indexed: 11/27/2022]
Abstract
Probabilistic regression is a statistical technique and a crucial problem in the machine learning domain, which employs a set of machine learning methods to forecast a continuous target variable based on the value of one or multiple predictor variables. COVID-19 is a virulent virus that has brought the whole world to a standstill. The potential of the virus for human-to-human transmission makes the world a dangerous place. This article predicts the upcoming circumstances of the coronavirus to help subdue it. We have performed conditional GAN regression to anticipate the subsequent COVID-19 cases of five countries. The GAN variant CGAN is used to design the model and predict the COVID-19 cases 3 months ahead with the least error for the dataset provided. Each country is examined individually because of variation in population size, tradition, medical management and preventive measures. The analysis is based on confirmed data, as provided by the World Health Organization. This paper investigates how conditional Generative Adversarial Networks (GANs) can be used to accurately represent intricate conditional distributions. GANs have achieved spectacular success in producing complex high-dimensional data, but work on their use for regression problems is minimal. This paper shows how conditional GANs can be employed in probabilistic regression. It is shown that conditional GANs can be used to estimate a wide range of distributions and be competitive with existing probabilistic regression models.
Affiliation(s)
- Arnabi Bej: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Anasua Sarkar: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
7
Chowdhury SR, Basu S, Maulik U. A survey on event and subevent detection from microblog data towards crisis management. Int J Data Sci Anal 2022. [DOI: 10.1007/s41060-022-00335-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2023]
8
Ghosh A, Pahari P, Basak P, Maulik U, Sarkar A. Epileptic-seizure onset detection using PARAFAC model with cross-wavelet transformation on multi-channel EEG. Phys Eng Sci Med 2022; 45:601-612. [DOI: 10.1007/s13246-022-01127-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2021] [Accepted: 04/11/2022] [Indexed: 11/28/2022]
9
Bose S, Sur Chowdhury R, Das R, Maulik U. Dense Dilated Deep Multiscale Supervised U-Network for biomedical image segmentation. Comput Biol Med 2022; 143:105274. [PMID: 35123135 DOI: 10.1016/j.compbiomed.2022.105274] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/22/2021] [Revised: 01/26/2022] [Accepted: 01/26/2022] [Indexed: 12/24/2022]
Abstract
Biomedical image segmentation is essential for computerized medical image analysis. Deep learning algorithms allow us to design state-of-the-art models for solving segmentation problems. The U-Net and its variants have provided positive results across various datasets. However, the existing networks have the same receptive field at each level and the models are supervised only at the shallow level. Considering these two ideas, we have proposed the D3MSU-Net where the field of view in each level is varied depending upon the depth of the resolution layer and the model is supervised at each resolution level. We have evaluated our network in eight benchmark datasets such as Electron Microscopy, Lung segmentation, Montgomery Chest X-ray, Covid-Radiopaedia, Wound, Medetec, Brain MRI, and Covid-19 lung CT dataset. Additionally, we have provided the performance for various ablations. The experimental results show the superiority of the proposed network. The proposed D3MSU-Net and ablation models are available at www.github.com/shirshabose/D3MSUNET.
Affiliation(s)
- Shirsha Bose: Department of Electronics and Telecommunication Engineering, Jadavpur University, 188, Raja S.C. Mallick Rd, Kolkata, 700032, West Bengal, India
- Ritesh Sur Chowdhury: Department of Electronics and Telecommunication Engineering, Jadavpur University, 188, Raja S.C. Mallick Rd, Kolkata, 700032, West Bengal, India
- Rangan Das: Department of Computer Science Engineering, Jadavpur University, 188, Raja S.C. Mallick Rd, Kolkata, 700032, West Bengal, India
- Ujjwal Maulik: Department of Computer Science Engineering, Jadavpur University, 188, Raja S.C. Mallick Rd, Kolkata, 700032, West Bengal, India
10
Bhadra T, Maulik U. Unsupervised Feature Selection Using Iterative Shrinking and Expansion Algorithm. IEEE Trans Emerg Top Comput Intell 2022. [DOI: 10.1109/tetci.2022.3199704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Affiliation(s)
- Tapas Bhadra: Department of Computer Science and Engineering, Aliah University, Kolkata, West Bengal, India
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
11
Kundu S, Maulik U. Cloud deployment of game theoretic categorical clustering using apache spark: An application to car recommendation. Machine Learning with Applications 2021. [DOI: 10.1016/j.mlwa.2021.100100] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022] Open
12
Dey A, Sen S, Maulik U. Study of transcription factor druggabilty for prostate cancer using structure information, gene regulatory networks and protein moonlighting. Brief Bioinform 2021; 23:6444316. [PMID: 34849560 DOI: 10.1093/bib/bbab465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Revised: 09/22/2021] [Accepted: 10/07/2021] [Indexed: 11/12/2022] Open
Abstract
Prostate cancer is the second leading cause of cancer-related death in men. Metastasis shows poor survival even though the recovery rate is high. In spite of numerous studies regarding prostate carcinoma, multiple questions are still unanswered. In this regard, gene regulatory networks can uncover the mechanisms behind cancer progression and metastasis. Under a feed-forward loop, transcription factors (TFs) can be good druggable candidates. We have proposed a computational model to study the uncertainty of TFs and suggest the appropriate cellular conditions for drug targeting. We have selected feed-forward loops depending on the shared list of the functional annotations among TFs, genes and miRNAs. From the potential feed-forward loop cores, six TFs were identified as druggable targets, which include AR, CEBPB, CREB1, ETS1, NFKB1 and RELA. However, TFs are known for their protein moonlighting properties, which provide unrelated multi-functionalities within the same or different subcellular localizations. Following that, we have identified such functions that are suitable for drug targeting. On the other hand, we have tried to identify membraneless organelles for providing more specificity to the proposed time and space theory. The study has provided certain possibilities on TF-based therapeutics. The controlled dynamic nature of the TF may have enhanced the chances where TFs can be considered as one of the prime drug targets. Finally, the combination of membraneless phase separation and protein moonlighting has provided a possible druggable period within the biological clock.
Affiliation(s)
- Ashmita Dey: Computer Science and Engineering, Jadavpur University, Kolkata, India
- Sagnik Sen: Computer Science and Engineering, Jadavpur University, Kolkata, India
- Ujjwal Maulik: Computer Science and Engineering, Jadavpur University, Kolkata, India
13
Ghosh N, Saha I, Sarkar JP, Maulik U. Strategies for COVID-19 Epidemiological Surveillance in India: Overall Policies Till June 2021. Front Public Health 2021; 9:708224. [PMID: 34368070 PMCID: PMC8339284 DOI: 10.3389/fpubh.2021.708224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Accepted: 06/21/2021] [Indexed: 11/13/2022] Open
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has gripped the entire world, almost paralysing the human race in its entirety. The virus transmits rapidly from human to human, resulting in a massive increase of patients with COVID-19. In order to curb the spread of the disease, an immediate action of complete lockdown was implemented across the globe. India, with a population of over 1.3 billion, was not an exception and took up the challenge of executing phase-wise lockdown, unlock and partial lockdown activities. In this study, we intend to summarise the different phases that the Government of India (GoI) imposed to fight against SARS-CoV-2 so that the summary can act as a reference guideline to help control future waves of COVID-19 and similar pandemic situations in India.
Affiliation(s)
- Nimisha Ghosh: Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, India
- Indrajit Saha: Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
14
Ghosh M, Sen S, Sarkar R, Maulik U. Quantum squirrel inspired algorithm for gene selection in methylation and expression data of prostate cancer. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107221] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022]
15
Sen S, Dey A, Bandhyopadhyay S, Uversky VN, Maulik U. Understanding structural malleability of the SARS-CoV-2 proteins and relation to the comorbidities. Brief Bioinform 2021; 22:6304388. [PMID: 34143202 DOI: 10.1093/bib/bbab232] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/04/2021] [Revised: 05/13/2021] [Accepted: 05/27/2021] [Indexed: 12/11/2022] Open
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a causative agent of the coronavirus disease (COVID-19), is a part of the β-Coronaviridae family. The virus contains five major protein classes, viz., four structural proteins [nucleocapsid (N), membrane (M), envelope (E) and spike glycoprotein (S)] and replicase polyproteins (R), synthesized as two polyproteins (ORF1a and ORF1ab). Due to the severity of the pandemic, most SARS-CoV-2-related research is focused on finding therapeutic solutions. However, studies on the sequence and structure space throughout the evolutionary time frame of viral proteins are limited. Besides, the structural malleability of viral proteins can be directly or indirectly associated with the dysfunctionality of the host cell proteins. This dysfunctionality may lead to comorbidities during the infection and may continue at the post-infection stage. In this regard, we conduct the evolutionary sequence-structure analysis of the viral proteins to evaluate their malleability. Subsequently, intrinsic disorder propensities of these viral proteins have been studied to confirm that the short intrinsically disordered regions play an important role in enhancing the likelihood of the host proteins interacting with the viral proteins. These interactions may result in molecular dysfunctionality, finally leading to different diseases. Based on the host cell proteins, the diseases are divided into two distinct classes: (i) proteins directly associated with the set of diseases while showing similar activities, and (ii) cytokine storm-mediated pro-inflammation (e.g. acute respiratory distress syndrome, malignancies) and neuroinflammation (e.g. neurodegenerative and neuropsychiatric diseases). Finally, the study unveils that males and postmenopausal females can be more vulnerable to SARS-CoV-2 infection due to the androgen-mediated protein transmembrane serine protease 2.
Affiliation(s)
- Sagnik Sen: Department of Computer Science and Engineering, Jadavpur University, Kolkata-32, West Bengal, India
- Ashmita Dey: Department of Computer Science and Engineering, Jadavpur University, Kolkata-32, West Bengal, India
- Vladimir N Uversky: Department of Molecular Medicine and Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida, United States of America; Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Moscow region, 142290 Russia
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata-32, West Bengal, India
16
Dey A, Sen S, Maulik U. Unveiling COVID-19-associated organ-specific cell types and cell-specific pathway cascade. Brief Bioinform 2021; 22:914-923. [PMID: 32968798 PMCID: PMC7543283 DOI: 10.1093/bib/bbaa214] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/23/2020] [Revised: 07/13/2020] [Accepted: 08/13/2020] [Indexed: 12/13/2022] Open
Abstract
The novel coronavirus or COVID-19 has first been found in Wuhan, China, and became pandemic. Angiotensin-converting enzyme 2 (ACE2) plays a key role in the host cells as a receptor of Spike-I Glycoprotein of COVID-19 which causes final infection. ACE2 is highly expressed in the bladder, ileum, kidney and liver, comparing with ACE2 expression in the lung-specific pulmonary alveolar type II cells. In this study, the single-cell RNAseq data of the five tissues from different humans are curated and cell types with high expressions of ACE2 are identified. Subsequently, the protein-protein interaction networks have been established. From the network, potential biomarkers which can form functional hubs, are selected based on k-means network clustering. It is observed that angiotensin PPAR family proteins show important roles in the functional hubs. To understand the functions of the potential markers, corresponding pathways have been researched thoroughly through the pathway semantic networks. Subsequently, the pathways have been ranked according to their influence and dependency in the network using PageRank algorithm. The outcomes show some important facts in terms of infection. Firstly, renin-angiotensin system and PPAR signaling pathway can play a vital role for enhancing the infection after its intrusion through ACE2. Next, pathway networks consist of few basic metabolic and influential pathways, e.g. insulin resistance. This information corroborate the fact that diabetic patients are more vulnerable to COVID-19 infection. Interestingly, the key regulators of the aforementioned pathways are angiontensin and PPAR family proteins. Hence, angiotensin and PPAR family proteins can be considered as possible therapeutic targets. Contact: sagnik.sen2008@gmail.com, umaulik@cse.jdvu.ac.in Supplementary information: Supplementary data are available online.
Affiliation(s)
- Ashmita Dey
  - Department of Computer Science and Engineering, Jadavpur University, India
- Sagnik Sen
  - Department of Computer Science and Engineering, Jadavpur University, India
- Ujjwal Maulik
  - Department of Computer Science and Engineering, Jadavpur University, India
|
17
|
Gupta K, Lalit M, Biswas A, Sanada CD, Greene C, Hukari K, Maulik U, Bandyopadhyay S, Ramalingam N, Ahuja G, Ghosh A, Sengupta D. Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-seq data. Genome Res 2021; 31:689-697. [PMID: 33674351 PMCID: PMC8015842 DOI: 10.1101/gr.267070.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/09/2020] [Accepted: 02/22/2021] [Indexed: 12/13/2022]
Abstract
Systematic delineation of complex biological systems is an ever-challenging and resource-intensive process. Single-cell transcriptomics allows us to study cell-to-cell variability in complex tissues at an unprecedented resolution. Accurate modeling of gene expression plays a critical role in the statistical determination of tissue-specific gene expression patterns. In the past few years, considerable efforts have been made to identify appropriate parametric models for single-cell expression data. The zero-inflated version of Poisson/negative binomial and log-normal distributions have emerged as the most popular alternatives owing to their ability to accommodate high dropout rates, as commonly observed in single-cell data. Although the majority of the parametric approaches directly model expression estimates, we explore the potential of modeling expression ranks, as robust surrogates for transcript abundance. Here we examined the performance of the discrete generalized beta distribution (DGBD) on real data and devised a Wald-type test for comparing gene expression across two phenotypically divergent groups of single cells. We performed a comprehensive assessment of the proposed method to understand its advantages compared with some of the existing best-practice approaches. We concluded that besides striking a reasonable balance between Type I and Type II errors, ROSeq, the proposed differential expression test, is exceptionally robust to expression noise and scales rapidly with increasing sample size. For wider dissemination and adoption of the method, we created an R package called ROSeq and made it available on the Bioconductor platform.
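The abstract above models expression ranks with the discrete generalized beta distribution (DGBD) but does not state its functional form. A commonly cited form is f(r) = A (N + 1 - r)^b / r^a for ranks r = 1..N, and the sketch below assumes that form. The function name `dgbd_pmf` and the parameter values are illustrative and are not taken from the ROSeq package.

```python
def dgbd_pmf(n_ranks, a, b):
    """Normalized DGBD probabilities for ranks 1..n_ranks (assumed functional form)."""
    weights = [((n_ranks + 1 - r) ** b) / (r ** a) for r in range(1, n_ranks + 1)]
    total = sum(weights)  # plays the role of 1 / A, the normalizing constant
    return [w / total for w in weights]


# Illustrative parameters only: a controls the head of the curve, b the tail.
pmf = dgbd_pmf(n_ranks=10, a=1.0, b=0.5)
zipf_like = dgbd_pmf(n_ranks=10, a=1.0, b=0.0)  # b = 0 reduces to a pure power law
```

With b = 0 the assumed form reduces to a Zipf-like power law, which is one way to sanity-check an implementation.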
Affiliation(s)
- Krishan Gupta
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Delhi 110020, India
- Manan Lalit
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Aditya Biswas
  - Microsoft India Private Limited, Hyderabad, Telangana 500032, India
- Chad D Sanada
  - Fluidigm Corporation, South San Francisco, California 94080, USA
- Cassandra Greene
  - Fluidigm Corporation, South San Francisco, California 94080, USA
- Kyle Hukari
  - Fluidigm Corporation, South San Francisco, California 94080, USA
- Ujjwal Maulik
  - Department of Computer Science, Jadavpur University, Kolkata, West Bengal 700032, India
- Gaurav Ahuja
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi 110020, India
- Abhik Ghosh
  - Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata 700108, India
- Debarka Sengupta
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Delhi 110020, India
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi 110020, India
  - Centre for Artificial Intelligence, Indraprastha Institute of Information Technology, Delhi 110020, India
  - Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, QLD 4000, Australia
|
18
|
Sarkar JP, Saha I, Sarkar A, Maulik U. Machine learning integrated ensemble of feature selection methods followed by survival analysis for predicting breast cancer subtype specific miRNA biomarkers. Comput Biol Med 2021; 131:104244. [PMID: 33550016 DOI: 10.1016/j.compbiomed.2021.104244] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 10/24/2020] [Revised: 01/24/2021] [Accepted: 01/24/2021] [Indexed: 12/25/2022]
Abstract
Breast cancer is the second leading cancer type among females. In this regard, it is found that microRNAs play an important role by regulating the gene expressions at the post-transcriptional phase. However, identification of the most influencing miRNAs in breast cancer subtypes is a challenging task, while the recent advancement in Next Generation Sequencing techniques allows analyzing high throughput expression data of miRNAs. Thus, we have conducted this research with the help of NGS data of breast cancer in order to identify the most significant miRNA biomarkers. The selected miRNA biomarkers are highly associated with the multiple breast cancer subtypes. For this purpose, a two-phase technique, called Machine Learning Integrated Ensemble of Feature Selection Methods, followed by survival analysis, is proposed. In the first phase, we have selected the best among seven machine learning techniques based on classification accuracy using the entire set of features (in this case miRNAs). Subsequently, eight different feature selection methods are used separately in order to rank the features and validate each set of top features using the selected machine learning technique by considering a multi-class classification task of the breast cancer subtypes. In the second phase, based on the classification accuracy values, the top features from each feature selection method are considered to make an ensemble to provide further categorization of the miRNAs as 8*, 7* up to 1*. The 8* miRNAs provide the highest average classification accuracy of 86% after 10-fold cross-validation. Thereafter, 27 miRNAs are identified from the list that is confined within 8* to 4* miRNAs based on their importance in survival for breast cancer subtypes using Cox regression based survival analysis. 
Moreover, expression analysis, regulatory network analysis, protein-protein interaction analysis, and KEGG pathway and gene ontology enrichment analysis are performed in order to validate the biological significance of the proposed solution. Additionally, we have prepared a miRNA-protein-drug interaction network to identify possible drugs for the selected miRNAs. Thus, our findings may be considered during a clinical trial for the treatment of breast cancer patients.
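The star categorization described in the abstract above (8*, 7* down to 1*) can be read as a vote count across feature selection methods. The sketch below assumes that an n* feature is one that appears in the top list of exactly n methods; that reading, the method names and the miRNA identifiers are all hypothetical and used only for illustration.

```python
from collections import Counter


def star_ratings(selections):
    """Map each feature to the number of selection methods that picked it."""
    votes = Counter()
    for selected in selections.values():
        votes.update(set(selected))  # each method votes at most once per feature
    return dict(votes)


# Hypothetical top-feature lists from three (not eight) selection methods.
toy_selections = {
    "method_a": ["miR-1", "miR-2", "miR-3"],
    "method_b": ["miR-1", "miR-3"],
    "method_c": ["miR-1", "miR-4"],
}
ratings = star_ratings(toy_selections)
top_rated = [name for name, stars in ratings.items() if stars == len(toy_selections)]
```

Here `top_rated` holds the features chosen by every method, the analogue of the 8* group when eight methods are used.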
Affiliation(s)
- Jnanendra Prasad Sarkar
  - Larsen & Toubro Infotech Ltd., Pune, India
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Indrajit Saha
  - Department of Computer Science and Engineering, National Institute of Technical Teachers' Training & Research, Kolkata, 700106, India
- Anasua Sarkar
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Ujjwal Maulik
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
|
19
|
Sarkar JP, Saha I, Seal A, Maity D, Maulik U. Topological Analysis for Sequence Variability: Case Study on more than 2K SARS-CoV-2 sequences of COVID-19 infected 54 countries in comparison with SARS-CoV-1 and MERS-CoV. Infect Genet Evol 2021; 88:104708. [PMID: 33421654 PMCID: PMC7787073 DOI: 10.1016/j.meegid.2021.104708] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2020] [Revised: 10/27/2020] [Accepted: 12/31/2020] [Indexed: 12/11/2022]
Abstract
The pandemic due to the novel coronavirus SARS-CoV-2 is now a serious global concern. More than a thousand new COVID-19 infections caused by this virus are reported daily across the globe. Thus, the medical research communities are trying to find a remedy to restrict the spread of this virus, while vaccine development is still under research in parallel. In such a critical situation, not only the medical research community but also scientists in different fields like microbiology, pharmacy, bioinformatics and data science are sharing the effort to accelerate the process of vaccine development, virus prediction, and forecasting the transmissible probability and reproduction cases of the virus for social awareness. In a similar context, in this article, we have studied the sequence variability of the virus, primarily focusing on three aspects: (a) sequence variability among SARS-CoV-1, MERS-CoV and SARS-CoV-2 in human host, which are in the same coronavirus family, (b) sequence variability of SARS-CoV-2 in human host for 54 different countries and (c) sequence variability between the coronavirus family and country-specific SARS-CoV-2 sequences in human host. For this purpose, as a case study, we have performed topological analysis of 2391 global genomic sequences of SARS-CoV-2 in association with SARS-CoV-1 and MERS-CoV using an integrated semi-alignment based computational technique. The results of the semi-alignment based technique are experimentally and statistically found to be similar to those of the alignment based technique, and the semi-alignment based technique is computationally faster. Moreover, the outcome of this analysis can help to identify the nations with homogeneous SARS-CoV-2 sequences, so that the same vaccine can be applied to their heterogeneous human population.
Affiliation(s)
- Jnanendra Prasad Sarkar
  - Larsen & Toubro Infotech Ltd., Pune, Maharashtra, India
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Indrajit Saha
  - Department of Computer Science and Engineering, National Institute of Technical Teachers' Training & Research, Kolkata, West Bengal, India
- Arijit Seal
  - Cognizant Technology Solutions, Kolkata, West Bengal, India
- Debasree Maity
  - Department of Electronics and Communication Engineering, MCKV Institute of Engineering, Howrah, West Bengal, India
- Ujjwal Maulik
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
|
20
|
Begum S, Sarkar R, Chakraborty D, Maulik U. Identification of Biomarker on Biological and Gene Expression data using Fuzzy Preference Based Rough Set. Journal of Intelligent Systems 2020. [DOI: 10.1515/jisys-2019-0034] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/15/2022] Open
Abstract
Cancer is fast becoming an alarming cause of human death. However, it has been reported that if the disease is detected at an early stage, diagnosed, and treated appropriately, the patient has a better chance of a long life. Machine learning techniques with feature selection contribute greatly to the detection of cancer, because an efficient feature-selection method can remove redundant features. In this paper, a Fuzzy Preference-Based Rough Set (FPRS) blended with Support Vector Machine (SVM) has been applied in order to predict cancer biomarkers for biological and gene expression datasets. Biomarkers are determined by deploying three models of FPRS, namely, Fuzzy Upward Consistency (FUC), Fuzzy Downward Consistency (FLC), and Fuzzy Global Consistency (FGC). The efficiency of the three models with SVM on five datasets is exhibited, and the biomarkers that have been identified from FUC models have been reported.
Affiliation(s)
- Shemim Begum
  - Govt College of Engg. & Textile Technology, Dept. of CSE, Berhampore, Murshidabad, West Bengal, India
- Ram Sarkar
  - Jadavpur University, Computer Science and Engineering, Jadavpur, India
- Ujjwal Maulik
  - Jadavpur University, CSE, Kolkata, West Bengal, India
|
21
|
Ghosh KK, Ghosh S, Sen S, Sarkar R, Maulik U. A two-stage approach towards protein secondary structure classification. Med Biol Eng Comput 2020; 58:1723-1737. [PMID: 32472446 DOI: 10.1007/s11517-020-02194-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/14/2019] [Accepted: 05/20/2020] [Indexed: 12/11/2022]
Abstract
Protein secondary structure (PSS) describes the local folded structures that form inside a polypeptide due to interactions among atoms of the backbone. Generally, globular proteins are divided into four classes, namely all-α, all-β, α + β, and α/β. As nearly 90% of proteins fall into these four classes, they are the ones mostly considered for the computational classification of proteins. Classification of PSS is important for different biological functions, including protein fold recognition, tertiary structure prediction, prediction of DNA-binding sites, and reduction of the conformation search space, among others. In this paper, we have proposed a machine learning-based model for secondary structure classification of proteins into four classes: all-α, all-β, α + β, and α/β. In doing so, we have considered both sequence-based and structure-based features. At first, mutual information (MI), a filter-based feature selection method, is used to remove the redundant features, and then the selected features are used to train three different classifiers: random forest, K-nearest neighbor (KNN), and multi-layer perceptron (MLP). After that, some standard classifier combination approaches are applied to integrate the decisions made by these classifiers, and it has been found that the weighted product rule performs the best among all. The overall accuracies obtained using the proposed model on the four standard datasets, namely 640, 1189, 25pdb, and fc699, are 86.89%, 92.93%, 91.38%, and 94.87% respectively. The proposed model outperforms some state-of-the-art methods considered here for comparison. The significantly high classification accuracy produced by our proposed model on four datasets is attributed to the development of a comprehensive feature set (by eliminating redundant features through a feature selection technique), which is then passed through an ensemble consisting of three different classifiers. 
Assigning different weights to the outcomes of different classifiers thus proved to be useful in designing the model for predicting the secondary structure of proteins based on their sequence-based and structure-based features.
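The weighted product rule mentioned in the abstract above combines per-class probabilities from several classifiers by multiplying them after raising each to a classifier-specific weight. The sketch below shows one standard way to compute it in log space. The probability vectors, the weights and the function name `weighted_product_rule` are illustrative assumptions; the abstract does not say how the weights were chosen.

```python
import math


def weighted_product_rule(probabilities, weights, floor=1e-12):
    """Combine per-classifier class probabilities as prod_k p_k(c) ** w_k, then renormalize."""
    n_classes = len(probabilities[0])
    log_scores = [
        # Work in log space; floor avoids log(0) for classes a classifier rules out.
        sum(w * math.log(max(p[c], floor)) for p, w in zip(probabilities, weights))
        for c in range(n_classes)
    ]
    peak = max(log_scores)
    scores = [math.exp(s - peak) for s in log_scores]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical outputs of three classifiers over the four structural classes.
toy_probabilities = [
    [0.6, 0.2, 0.1, 0.1],  # e.g. random forest
    [0.3, 0.4, 0.2, 0.1],  # e.g. KNN
    [0.5, 0.3, 0.1, 0.1],  # e.g. MLP
]
toy_weights = [0.5, 0.2, 0.3]
combined = weighted_product_rule(toy_probabilities, toy_weights)
predicted_class = combined.index(max(combined))
```

Unlike a weighted sum, the product rule lets a single classifier that assigns a class near-zero probability pull the combined score for that class down sharply.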
Affiliation(s)
- Kushal Kanti Ghosh
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Soulib Ghosh
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Sagnik Sen
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Ram Sarkar
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Ujjwal Maulik
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
|
22
|
Abstract
POU domain class 2 homeobox 1, or POU2F1, is broadly known as an important transcription factor. Due to its association with different types of malignancies, POU2F1 has become one of the key factors in pancancer analysis. However, although this protein is considered a potential drug target, no drug targeting POU2F1 has been designed as yet due to the extreme structural flexibility of this protein. In this article, we have proposed a three-level comprehensive framework for understanding the structural conservation and co-variation of POU2F1. First, a gene regulatory network based on the normal and pathological functions of POU2F1 has been created to better understand the strong association between POU2F1 deregulation and cancers. After that, based on the evolutionary sequence space analysis, the comparative sequence dynamics of the protein members of the POU domain family have been studied, mostly between non-human and human species. Subsequently, the reciprocity effect of the residual co-variation has been identified through direct coupling analysis. Along with that, the structure of POU2F1 has been analyzed based on quality assessment and a normal mode-based structure network. Comparing the sequence and structure space information, the most significant set of residues, viz. 3, 9, 13, 17, 20, 21, 28, 35, and 36, has been identified as the structural facet for function. This study demonstrates that the structural malleability of POU2F1 serves as one of the prime reasons behind its functional multiplicity in terms of protein moonlighting. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ashmita Dey
  - Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Sagnik Sen
  - Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Vladimir N Uversky
  - Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow Region, Russia
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Ujjwal Maulik
  - Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
|
23
|
Barman RK, Mukhopadhyay A, Maulik U, Das S. Identification of infectious disease-associated host genes using machine learning techniques. BMC Bioinformatics 2019; 20:736. [PMID: 31881961 PMCID: PMC6935192 DOI: 10.1186/s12859-019-3317-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/31/2019] [Accepted: 12/16/2019] [Indexed: 02/06/2023] Open
Abstract
Background With the global spread of multidrug resistance in pathogenic microbes, infectious diseases emerge as a key public health concern of the recent time. Identification of host genes associated with infectious diseases will improve our understanding about the mechanisms behind their development and help to identify novel therapeutic targets. Results We developed a machine learning techniques-based classification approach to identify infectious disease-associated host genes by integrating sequence and protein interaction network features. Among different methods, Deep Neural Networks (DNN) model with 16 selected features for pseudo-amino acid composition (PAAC) and network properties achieved the highest accuracy of 86.33% with sensitivity of 85.61% and specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. Furthermore, to predict unknown infectious disease-associated host genes, we applied the proposed DNN model to all reviewed proteins from the database. Seventy-six out of 100 highly-predicted infectious disease-associated genes from our study were also found in experimentally-verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highly-predicted infectious disease-associated genes by disease and gene ontology enrichment analysis and found that many of them are shared by one or more of the other diseases, such as cancer, metabolic and immune related diseases. Conclusions To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will help large-scale prediction of host genes associated with infectious-diseases. 
However, our results indicated that for small datasets, advanced DNN-based method does not offer significant advantage over the simpler supervised machine learning techniques, such as Support Vector Machine (SVM) or Random Forest (RF) for the prediction of infectious disease-associated host genes. Significant overlap of infectious disease with cancer and metabolic disease on disease and gene ontology enrichment analysis suggests that these diseases perturb the functions of the same cellular signaling pathways and may be treated by drugs that tend to reverse these perturbations. Moreover, identification of novel candidate genes associated with infectious diseases would help us to explain disease pathogenesis further and develop novel therapeutics.
Affiliation(s)
- Ranjan Kumar Barman
  - Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Anirban Mukhopadhyay
  - Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
- Ujjwal Maulik
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Santasabuj Das
  - Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
  - Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, P-33, C.I.T.Road Scheme XM, Beliaghata-700010, Kolkata, West Bengal, India
|
24
|
Chowdhury S, Sanyal D, Sen S, Uversky VN, Maulik U, Chattopadhyay K. Evolutionary Analyses of Sequence and Structure Space Unravel the Structural Facets of SOD1. Biomolecules 2019; 9:E826. [PMID: 31817166 PMCID: PMC6995586 DOI: 10.3390/biom9120826] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/10/2019] [Revised: 11/09/2019] [Accepted: 11/16/2019] [Indexed: 01/08/2023] Open
Abstract
Superoxide dismutase (SOD) is the primary enzyme of the cellular antioxidant defense cascade. Misfolding, concomitant oligomerization, and higher order aggregation of human cytosolic SOD are linked to amyotrophic lateral sclerosis (ALS). Although SOD1 with its two metal ion cofactors is extremely robust, the de-metallated apo form is intrinsically disordered. Since the rise of oxygen-based metabolism and antioxidant defense systems are evolutionarily coupled, SOD is an interesting protein with a deep evolutionary history. We deployed statistical analysis of sequence space to decode evolutionarily co-varying residues in this protein. These were validated by applying graph theoretical modelling to understand the impact of the presence of metal ion co-factors in dictating the disordered (apo) to hidden disordered (wild-type SOD1) transition. Contact maps were generated for different variants, and the selected significant residues were mapped on separate structure networks. Sequence space analysis coupled with structure networks helped us to map the evolutionarily coupled co-varying patches in SOD1 and its metal-depleted variants. In addition, using structure network analysis, the residues with a major impact on the internal dynamics of the protein structure were investigated. Our results reveal that the bulk of these evolutionarily co-varying residues are localized in the loop regions and positioned differentially depending upon the metal residence and concomitant steric restrictions of the loops.
Affiliation(s)
- Sourav Chowdhury
  - Protein Folding and Dynamics Group, Structural Biology and Bio-informatics Division, CSIR-Indian Institute of Chemical Biology, 4 Raja S.C.Mullick Road, Kolkata 700032, India
  - Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, USA
- Dwipanjan Sanyal
  - Protein Folding and Dynamics Group, Structural Biology and Bio-informatics Division, CSIR-Indian Institute of Chemical Biology, 4 Raja S.C.Mullick Road, Kolkata 700032, India
- Sagnik Sen
  - Department of Computer Science, Jadavpur University, Kolkata 700032, India
- Vladimir N. Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, FL 33612, USA
  - Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino 142290, Moscow Region, Russia
- Ujjwal Maulik
  - Department of Computer Science, Jadavpur University, Kolkata 700032, India
- Krishnananda Chattopadhyay
  - Protein Folding and Dynamics Group, Structural Biology and Bio-informatics Division, CSIR-Indian Institute of Chemical Biology, 4 Raja S.C.Mullick Road, Kolkata 700032, India
|
25
|
Chowdhury S, Sen S, Banerjee A, Uversky VN, Maulik U, Chattopadhyay K. Network mapping of the conformational heterogeneity of SOD1 by deploying statistical cluster analysis of FTIR spectra. Cell Mol Life Sci 2019; 76:4145-4154. [PMID: 31011770 PMCID: PMC11105373 DOI: 10.1007/s00018-019-03108-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/23/2018] [Revised: 04/12/2019] [Accepted: 04/15/2019] [Indexed: 02/02/2023]
Abstract
A crucial contribution to the heterogeneity of the conformational landscape of a protein comes from the way an intermediate relates to another intermediate state in its journey from the unfolded to folded or misfolded form. Unfortunately, it is extremely hard to decode this relatedness in a quantifiable manner. Here, we developed an application of statistical cluster analyses to explore the conformational heterogeneity of a metalloenzyme, human cytosolic copper-zinc superoxide dismutase (SOD1), using the inputs from infrared spectroscopy. This study provides a quantifiable picture of how conformational information at one particular site (for example, the copper-binding pocket) is related to the information at the second site (for example, the zinc-binding pocket), and how this relatedness is transferred to the global conformational information of the protein. The distance outputs were used to quantitatively generate a network capturing the folding sub-stages of SOD1.
Affiliation(s)
- Sourav Chowdhury
  - Protein Folding and Dynamics Laboratory, Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, 700032, India
  - Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA, 02138, USA
- Sagnik Sen
  - Department of Computer Science, Jadavpur University, Kolkata, 700 032, India
- Amrita Banerjee
  - Protein Folding and Dynamics Laboratory, Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, 700032, India
  - Department of Chemistry, Hiralal Mazumdar Memorial College for Women, Dakshineswar, Kolkata, 700035, India
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, FL, USA
  - Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Moscow Region, Russia
- Ujjwal Maulik
  - Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA, 02138, USA
- Krishnananda Chattopadhyay
  - Protein Folding and Dynamics Laboratory, Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, 700032, India
|
26
|
Ray S, Alberuni S, Maulik U. Computational Prediction of HCV-Human Protein-Protein Interaction via Topological Analysis of HCV Infected PPI Modules. IEEE Trans Nanobioscience 2019; 17:55-61. [PMID: 29570075 DOI: 10.1109/tnb.2018.2797696] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
In this paper, we have developed a framework for the detection of protein-protein interactions (PPI) between Hepatitis-C virus (HCV) and human proteins based on PPI and gene ontology based information of the HCV infected proteins. First, a bipartite interaction network is formed between HCV proteins and human host proteins. Next, we have analyzed different topological properties of the interaction network and observed that the degree of HCV-interacting proteins is significantly higher than that of non-interacting host proteins. We have also observed that the HCV interacting protein pairs are functionally more similar to each other than the non-interacting pairs. Following these observations, we have applied an inference mechanism to predict novel interactions between HCV and human proteins. The inference mechanism is based on partitioning the network formed by HCV interacting human proteins and their first neighbors into dense and functionally similar groups using a PPI network clustering algorithm. The groups are then analyzed to predict PPIs. The predicted interaction pairs are validated using a literature search in PUBMED. Experimental evidence for over 50% of the predicted pairs is found in the existing literature by searching PUBMED. A Gene Ontology and pathway based analysis is also carried out to validate the identified modules biologically.
|
27
|
Sen S, Dey A, Chowdhury S, Maulik U, Chattopadhyay K. Understanding the evolutionary trend of intrinsically structural disorders in cancer relevant proteins as probed by Shannon entropy scoring and structure network analysis. BMC Bioinformatics 2019; 19:549. [PMID: 30717651 PMCID: PMC7394331 DOI: 10.1186/s12859-018-2552-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/29/2018] [Accepted: 11/30/2018] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Malignant diseases have become a threat to the health care system. A panoply of biological processes is involved as the cause of these diseases. In order to unveil the mechanistic details of these diseased states, we analyzed protein families relevant to these diseases. RESULTS Our present study pivots around four apparently unrelated cancer types, of which two are commonly occurring, viz. Prostate Cancer and Breast Cancer, and two are relatively less frequent, viz. Acute Lymphoblastic Leukemia and Lymphoma. Eight protein families were found to have implications for these cancer types. Our results strikingly reveal that some of the proteins with implications in the cancerous cellular states show a structural organization disparate from the signature of the family to which they belong. The sequences were further mapped onto the respective structures and compared with the entropic profile. The structures reveal that the entropic scores were able to capture the inherent structural bias of these proteins with quantitative precision, otherwise unseen in other analyses. Subsequently, the betweenness centrality scoring of each residue from the structure network models was used to explore the changes in dependencies on residues owing to structural disorder. CONCLUSION These observations help to capture the mechanistic changes resulting from the structural orchestration of protein structures. Finally, the hydropathy indexes were obtained to validate the sequence space observations made using Shannon entropy, in turn establishing their compatibility.
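The abstract above scores sequence space with Shannon entropy but does not spell out the computation. A common choice, assumed here, is the per-column entropy of a multiple sequence alignment, H = -Σ p(x) log2 p(x) over the residues x observed in a column. The toy alignment and the function names below are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(aligned_sequences):
    """Per-position entropy for equal-length aligned sequences."""
    return [column_entropy(column) for column in zip(*aligned_sequences)]


# Hypothetical four-sequence alignment, for illustration only.
toy_alignment = ["ACDE", "ACDF", "AGDH", "ATDK"]
profile = entropy_profile(toy_alignment)
```

Fully conserved positions score 0 bits, while a position where every sequence carries a different residue scores log2 of the number of sequences, so higher values flag more variable sites.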
Affiliation(s)
- Sagnik Sen: Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Ashmita Dey: Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Sourav Chowdhury: CSIR-Indian Institute of Chemical Biology, Raja S.C. Mullick Road, Kolkata, 700032, India
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Krishnananda Chattopadhyay: Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts, 02138, USA
|
28
|
|
29
|
Maulik U, Uversky VN, Sen S. A Statistical Approach to Detect Intrinsically Disordered Proteins Associated with Uterine Leiomyoma. Protein Pept Lett 2018; 25:483-491. [PMID: 29577850 DOI: 10.2174/0929866525666180326114325] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/09/2017] [Revised: 03/05/2018] [Accepted: 03/08/2018] [Indexed: 11/22/2022]
Abstract
BACKGROUND Uterine leiomyoma is a widespread non-malignant tumor. More than 80% of women have this tumor, but only about 30% of cases are detected. Integrin-β1 is one of the biomarkers up-regulated during tumorigenesis and is also associated with structural disorder. Intrinsically disordered proteins (IDPs) are proteins that lack a fixed structure, especially in their tertiary structural orchestration. Around 30% of human proteins contain intrinsically disordered regions. Under the structure-function paradigm, IDPs are expected to show a significant change in functional activity. IDPs are mostly associated with malignancies, neurodegenerative diseases and heart diseases. DNA methylation is one of the Post Transcriptional Modification (PTM) techniques, in which methyl groups are added to nucleotide bases. It is responsible for controlling the functionality of Transcription Factors (TFs). In addition, the structural orchestration is also affected by PTM. Very few disease-related studies focus on structural disorder together with methylation. OBJECTIVE In this article, our motivation is to establish a relation between uterine leiomyoma at differential methylation rates and tissue-specific disordered proteins. METHOD We propose a framework for achieving this objective. We start with two sets of data, i.e., the set of genes specifically related to uterine leiomyoma (GUL) and the set of tissue-specific proteins from UniProt (Puterine). Subsequently, a two-sample t-test is applied to GUL to find the genes differentially methylated in uterine leiomyoma (DGUL). By comparing the gene transcripts of DGUL with Puterine, the common biomarkers are selected (DPuterine). Thereafter, the selected list of proteins is analyzed with D2P2 to find the percentage disorder rate, the SCOP count, the number of protein families and the PTM rate. Proteins with a structural disorder rate of more than 10% are considered structurally disordered (PUL disordered). Finally, to validate the listed proteins, we perform KEGG pathway and Gene Ontology analysis. RESULTS Following the proposed framework, we start with 2246 proteins from UniProt, which are kept in Puterine. Under DGUL there are 6555 genes that are differentially methylated (p-value <0.05). Only 434 proteins are selected from the intersection of DGUL and Puterine. Among them, only 210 proteins fall into PUL disordered, with more than 10% structural disorder. The top ten proteins, with disorder ranging from 100% to 74.2%, are shown in the article. After performing KEGG pathway and Gene Ontology analysis, it is found that Q969W3 has no connection with KEGG or GO terms. CONCLUSION After applying the framework, we obtain verified groups of proteins at different stages of the proposed method. The group of 210 disordered proteins is verified by the KEGG and GO analysis. As the result is verified at a satisfactory level, it can be said that the framework successfully analyzes intrinsically disordered proteins that have a connection with differential methylation levels for a specific disease.
Affiliation(s)
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata - 700032, India
- Vladimir N Uversky: Department of Molecular Medicine and Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, United States; Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow region - 142290, Russian Federation
- Sagnik Sen: Department of Computer Science and Engineering, Jadavpur University, Kolkata - 700032, India
|
30
|
Sen S, Maulik U. Recent advancement toward significant association between disordered transcripts and virus-infected diseases: a survey. Brief Funct Genomics 2018; 17:458-470. [DOI: 10.1093/bfgp/ely021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Affiliation(s)
- Sagnik Sen: Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, India
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, India
|
31
|
Ray S, Maulik U. Discovering Perturbation of Modular Structure in HIV Progression by Integrating Multiple Data Sources Through Non-Negative Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:869-877. [PMID: 28029629 DOI: 10.1109/tcbb.2016.2642184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Detecting perturbation in modular structure during HIV-1 disease progression is an important step toward understanding the stage-specific infection pattern of the HIV-1 virus in human cells. In this article, we propose a novel methodology that integrates multiple kinds of biological information to identify such disruption in human gene modules during different stages of HIV-1 infection. We integrate three types of biological information (gene expression, protein-protein interaction, and gene ontology) into single gene meta-modules through non-negative matrix factorization (NMF). Because the identified meta-modules inherit this information, detecting their perturbation reflects changes in expression pattern, in PPI structure and in the functional similarity of genes during infection progression. To integrate modules from different data sources into strong meta-modules, NMF-based clustering is utilized. Perturbation in meta-modular structure is identified by investigating topological and intramodular properties and ranking the meta-modules with a rank aggregation algorithm. We have also analyzed the preservation structure of significant GO terms in which the human proteins of the meta-modules participate. Moreover, we have performed an analysis to show the change in the coregulation pattern of identified transcription factors (TFs) over the HIV progression stages.
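To make the NMF-based clustering step concrete, here is a minimal sketch using the standard multiplicative update rules for the squared-error objective. It is not the authors' implementation: the toy matrix, the rank `k`, and the rule of assigning each gene to its largest factor are all assumptions made for illustration.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def squared_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

# Hypothetical gene-by-feature matrix; rows are genes, columns are features
# that could be stacked from several data sources.
V = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
W0, H0 = nmf(V, 2, iters=0)   # random starting point (same seed)
W, H = nmf(V, 2)
# Assumed clustering rule: each gene joins the factor with its largest weight.
labels = [max(range(2), key=lambda c: row[c]) for row in W]
```

Because both factors stay non-negative, each row of `W` can be read as soft membership of a gene in `k` meta-modules, which is what makes NMF convenient for clustering.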
|
32
|
Maulik U, Sen S, Mallik S, Bandyopadhyay S. Detecting TF-miRNA-gene network based modules for 5hmC and 5mC brain samples: a intra- and inter-species case-study between human and rhesus. BMC Genet 2018; 19:9. [PMID: 29357837 PMCID: PMC5776763 DOI: 10.1186/s12863-017-0574-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/23/2017] [Accepted: 11/29/2017] [Indexed: 01/09/2023] Open
Abstract
Background The study of epigenetics is currently a high-impact research topic. Multi-stage methylation is also an area of high-dimensional prospect. In this article, we provide a new intra- and inter-species study on brain tissue between human and rhesus, based on data profiles of two methylated cytosine variants (viz., 5-hydroxymethylcytosine (5hmC) and 5-methylcytosine (5mC) samples), through TF-miRNA-gene network based module detection. Results First, we determine differentially 5hmC methylated genes for human as well as rhesus for the intra-species analysis, and differentially multi-stage methylated genes for the inter-species analysis. Thereafter, we apply the weighted topological overlap matrix (TOM) measure and average linkage clustering consecutively to these gene sets for the intra- and inter-species study. We identify co-methylated and multi-stage co-methylated gene modules by using dynamic tree cut for the intra- and inter-species cases, respectively. Each module is represented by an individual color in the dendrogram. Gene Ontology and KEGG pathway based analyses are then performed to identify the biological functionalities of the identified modules. Finally, the top ten regulator TFs and targeter miRNAs that are associated with the maximum number of gene modules are determined for both the intra- and inter-species analysis. Conclusions The novel TFs and miRNAs obtained from the analysis are: MYST3 and ZNF771 as TFs (for human intra-species analysis), BAZ2B, RCOR3 and ATF1 as TFs (for rhesus intra-species analysis), and mml-miR-768-3p and mml-miR-561 as miRs (for rhesus intra-species analysis); and MYST3 and ZNF771 as miRs (for inter-species study). Furthermore, the genes/TFs/miRNAs that are already known to be liable for several severe brain-related diseases as well as rare neglected diseases (e.g., Wolf-Hirschhorn syndrome, Joubert syndrome, Huntington's disease, Simian Immunodeficiency Virus (SIV) mediated encephalitis, Parkinson's disease, bipolar disorder and schizophrenia) are mentioned. Electronic supplementary material The online version of this article (doi:10.1186/s12863-017-0574-7) contains supplementary material, which is available to authorized users.
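The topological overlap measure mentioned above can be sketched as follows. This uses the widely cited unsigned weighted form, TOM(i, j) = (sum over u of a_iu * a_uj, plus a_ij) / (min(k_i, k_j) + 1 - a_ij); whether the study used exactly this variant is an assumption, and the adjacency matrix below is a toy example.

```python
def topological_overlap(adj):
    """Topological overlap matrix for a symmetric adjacency matrix with
    entries in [0, 1] and a zero diagonal."""
    n = len(adj)
    # Connectivity k_i: total connection strength of node i.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Strength of the neighbours that i and j have in common.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom

# Toy network: a path 0 - 1 - 2 (hypothetical, unweighted for readability).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
tom = topological_overlap(path)
```

A dissimilarity such as `1 - tom[i][j]` is what would typically be passed on to average linkage clustering.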
Affiliation(s)
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Sagnik Sen: Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Saurav Mallik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
|
33
|
Ray S, Maulik U, Mukhopadhyay A. A review of computational approaches for analysis of hepatitis C virus-mediated liver diseases. Brief Funct Genomics 2017; 17:428-440. [DOI: 10.1093/bfgp/elx040] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022] Open
Affiliation(s)
- Sumanta Ray: Department of Computer Science and Engineering, Aliah University, Kolkata, India
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Anirban Mukhopadhyay: Department of Computer Science and Engineering, University of Kalyani, Kalyani, India
|
34
|
Mitra R, Chen X, Greenawalt EJ, Maulik U, Jiang W, Zhao Z, Eischen CM. Decoding critical long non-coding RNA in ovarian cancer epithelial-to-mesenchymal transition. Nat Commun 2017; 8:1604. [PMID: 29150601 PMCID: PMC5693921 DOI: 10.1038/s41467-017-01781-0] [Citation(s) in RCA: 133] [Impact Index Per Article: 19.0] [Received: 04/26/2017] [Accepted: 10/16/2017] [Indexed: 12/17/2022] Open
Abstract
Long non-coding RNA (lncRNA) are emerging as contributors to malignancies. Little is understood about the contribution of lncRNA to epithelial-to-mesenchymal transition (EMT), which correlates with metastasis. Ovarian cancer is usually diagnosed after metastasis. Here we report an integrated analysis of >700 ovarian cancer molecular profiles, including genomic data sets, from four patient cohorts identifying lncRNA DNM3OS, MEG3, and MIAT overexpression and their reproducible gene regulation in ovarian cancer EMT. Genome-wide mapping shows 73% of MEG3-regulated EMT-linked pathway genes contain MEG3 binding sites. DNM3OS overexpression, but not MEG3 or MIAT, significantly correlates to worse overall patient survival. DNM3OS knockdown results in altered EMT-linked genes/pathways, mesenchymal-to-epithelial transition, and reduced cell migration and invasion. Proteotranscriptomic characterization further supports the DNM3OS and ovarian cancer EMT connection. TWIST1 overexpression and DNM3OS amplification provides an explanation for increased DNM3OS levels. Therefore, our results elucidate lncRNA that regulate EMT and demonstrate DNM3OS specifically contributes to EMT in ovarian cancer.
Affiliation(s)
- Ramkrishna Mitra: Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, 19107, USA
- Xi Chen: Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, 19107, USA
- Evan J Greenawalt: Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, 19107, USA
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Jadavpur, 700032, India
- Wei Jiang: Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
- Zhongming Zhao: Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Christine M Eischen: Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, 19107, USA
|
35
|
|
36
|
Abstract
Microarray analysis based on gene coexpression is widely used to investigate the coregulation pattern of a group (or cluster) of genes in a specific phenotype condition. Recent approaches go one step further and look for differential coexpression patterns, wherein there exists a significant difference in coexpression pattern between two phenotype conditions. These changes in coexpression pattern generally arise from significant changes in regulatory mechanisms across different conditions, governed by the natural progression of diseases. Here we develop a novel multiobjective framework, DiffCoMO, to identify differentially coexpressed modules that capture altered coexpression in gene modules across different stages of HIV-1 progression. The objectives are built to emphasize the distance between the coexpression patterns of two phenotype stages. The proposed method is assessed by comparison with some state-of-the-art techniques. We show that DiffCoMO outperforms the state of the art in detecting differentially coexpressed modules. Moreover, we have compared the performance of all the methods using simulated data. The biological significance of the discovered modules is also investigated using GO and pathway enrichment analysis. Additionally, miRNA enrichment analysis is carried out to identify TF to miRNA and miRNA to TF connections. The gene modules discovered by DiffCoMO manifest regulation by the miRNA-28, miRNA-29 and miRNA-125 families.
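The notion of differential coexpression can be illustrated with a simple score: the mean absolute change in pairwise Pearson correlation between two conditions. This is only an illustrative stand-in, not one of the DiffCoMO objectives (which the abstract does not define), and the gene names and expression values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def differential_coexpression(module, expr_a, expr_b):
    """Mean absolute change in pairwise correlation between two stages."""
    pairs = list(combinations(module, 2))
    total = sum(abs(pearson(expr_a[g], expr_a[h]) - pearson(expr_b[g], expr_b[h]))
                for g, h in pairs)
    return total / len(pairs)

# Hypothetical expression profiles for two genes in two disease stages.
stage_a = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8]}   # positively correlated
stage_b = {"g1": [1, 2, 3, 4], "g2": [8, 6, 4, 2]}   # negatively correlated
score = differential_coexpression(["g1", "g2"], stage_a, stage_b)
```

A module whose genes are tightly coexpressed in one stage and decoupled or inverted in the other receives a high score under this measure.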
Affiliation(s)
- Sumanta Ray: Department of Computer Science and Engineering, Aliah University, Kolkata, 700156, India
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700108, India
|
37
|
Mallik S, Bhadra T, Maulik U. Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data. IEEE Trans Nanobioscience 2017; 16:3-10. [PMID: 28092570 DOI: 10.1109/tnb.2017.2650217] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 11/08/2022]
Abstract
Epigenetic biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework for identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics datasets. First, we determine the genes that have both expression and methylation values and follow a normal distribution. Similarly, we identify the genes that have both expression and methylation values but do not follow a normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant, genes as top-ranked genes. For statistical validation, we apply a t-test to both the expression and methylation data consisting of only the normally distributed top-ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we utilize the Limma package to perform a non-parametric Empirical Bayes test on both the expression and methylation data comprising only the non-normally distributed top-ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene markers with biological validation. Moreover, our framework improves the positive predictive rate and reduces the false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method and other methods based on classification performances obtained using several well-known classifiers.
|
38
|
Bhattacharyya S, Dutta P, Maulik U. Preface. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2016.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
39
|
|
40
|
|
41
|
Mallik S, Sen S, Maulik U. IDPT: Insights into potential intrinsically disordered proteins through transcriptomic analysis of genes for prostate carcinoma epigenetic data. Gene 2016; 586:87-96. [DOI: 10.1016/j.gene.2016.03.056] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/02/2015] [Revised: 02/22/2016] [Accepted: 03/30/2016] [Indexed: 12/13/2022]
|
42
|
Sriwastava BK, Basu S, Maulik U. Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:1394-1404. [PMID: 26684462 DOI: 10.1109/tcbb.2015.2401018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
Predicting residues that participate in protein-protein interactions (PPI) helps to identify which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can be further improved with the use of a custom-designed fuzzy membership function for the partner-specific PPI interface prediction problem. We evaluated the performance of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes, Homo sapiens, Escherichia coli and Saccharomyces cerevisiae, and calculated the statistical significance of the developed F-SVM over the classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, the local composition of amino acids together with their physico-chemical characteristics is used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under the ROC curve) on the test samples in a 10-fold cross-validation experiment is measured as 77.07, 78.39, and 74.91 percent for the aforementioned organisms, respectively. Performances on independent test sets are 72.09, 73.24 and 82.74 percent, respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.
|
43
|
|
44
|
Abstract
Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed time-efficient scalable approach for point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data using symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also satisfies linear speedup in timing without sacrificing the quality of clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and existing parallel K-Means over eight artificial and benchmark microarray data sets, to demonstrate its superiority, in both timing and validity. The statistical analysis is also performed to establish the significance of this message-passing-interface based point-symmetry K-Means implementation. We also analysed the biological relevance of clustering solutions.
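The abstract does not state which point-symmetry distance the algorithm uses. A minimal sketch, assuming one common formulation in which a point is reflected through the cluster centre and the average distance from that reflection to its nearest data points is multiplied by the Euclidean distance to the centre (the function name, the `knear` parameter and the toy data are hypothetical):

```python
import math

def point_symmetry_distance(x, center, points, knear=2):
    """Point-symmetry-based distance of x with respect to a cluster centre."""
    # Reflect x through the centre: x* = 2c - x.
    reflected = tuple(2 * c - xi for c, xi in zip(center, x))
    # Average distance from the reflected point to its knear nearest data points.
    nearest = sorted(math.dist(reflected, p) for p in points)[:knear]
    d_sym = sum(nearest) / len(nearest)
    # Weight by the ordinary Euclidean distance to the centre.
    return d_sym * math.dist(x, center)

# Toy data: four points placed symmetrically around the origin (hypothetical).
data = [(1, 0), (-1, 0), (0, 1), (0, -1)]
d_symmetric = point_symmetry_distance((1, 0), (0, 0), data)
d_off_axis = point_symmetry_distance((1, 1), (0, 0), data)
```

A point whose mirror image lands on or near existing data gets a small distance, so K-Means assignments based on it favour clusters that are symmetric about their centres. The nearest-neighbour search is also the expensive step that a parallel implementation would need to distribute.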
|
45
|
Maulik U. Meet Our Editorial Board Member:. Protein Pept Lett 2015. [DOI: 10.2174/092986652210150821170543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
46
|
Mallik S, Maulik U. MiRNA-TF-gene network analysis through ranking of biomolecules for multi-informative uterine leiomyoma dataset. J Biomed Inform 2015; 57:308-19. [PMID: 26297985 DOI: 10.1016/j.jbi.2015.08.014] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 12/11/2014] [Revised: 06/26/2015] [Accepted: 08/11/2015] [Indexed: 12/12/2022]
Abstract
Gene ranking is an important problem in bioinformatics. Here, we propose a new framework for ranking biomolecules (viz., miRNAs, transcription factors/TFs and genes) in a multi-informative uterine leiomyoma dataset having both gene expression and methylation data, using a (statistical) eigenvector centrality based approach. First, genes that are both differentially expressed and methylated are identified using the Limma statistical test. A network comprising these genes, the corresponding TFs from the TRANSFAC and ITFP databases, and targeter miRNAs from the miRWalk database is then built. The biomolecules are then ranked based on eigenvector centrality. Our proposed method provides better average accuracy in hub gene and non-hub gene classification than other methods. Furthermore, pre-ranked gene set enrichment analysis is applied to the pathway database as well as the GO-term databases of the Molecular Signatures Database, supplying a pre-ranked gene list based on different centrality values in order to compare the ranking methods. Finally, top novel potential gene markers for uterine leiomyoma are provided.
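Ranking nodes by eigenvector centrality can be sketched with power iteration. The version below iterates on A + I rather than A, which keeps the same leading eigenvector but avoids oscillation on bipartite graphs. The node names and edges are hypothetical, and the network is treated as undirected and unweighted for simplicity, which may differ from the network built in the study.

```python
import math

def eigenvector_centrality(neighbors, iters=200, tol=1e-10):
    """Power iteration on (A + I) for an undirected graph given as
    a dict mapping each node to the set of its neighbours."""
    nodes = list(neighbors)
    score = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Each node collects its own score plus the scores of its neighbours.
        new = {v: score[v] + sum(score[u] for u in neighbors[v]) for v in nodes}
        norm = math.sqrt(sum(s * s for s in new.values()))
        new = {v: s / norm for v, s in new.items()}
        if max(abs(new[v] - score[v]) for v in nodes) < tol:
            return new
        score = new
    return score

# Hypothetical toy network: a TF linked to two genes and one miRNA,
# with the two genes also linked to each other.
network = {
    "TF1": {"geneA", "geneB", "miR1"},
    "geneA": {"TF1", "geneB"},
    "geneB": {"TF1", "geneA"},
    "miR1": {"TF1"},
}
scores = eigenvector_centrality(network)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Unlike plain degree, this score rewards a node for being connected to other well-connected nodes, which is why it is a natural choice for picking out hub biomolecules.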
Affiliation(s)
- Saurav Mallik: Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
|
47
|
Bandyopadhyay S, Ray S, Mukhopadhyay A, Maulik U. A multiobjective approach for identifying protein complexes and studying their association in multiple disorders. Algorithms Mol Biol 2015; 10:24. [PMID: 26257820 PMCID: PMC4529733 DOI: 10.1186/s13015-015-0056-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/03/2014] [Accepted: 07/28/2015] [Indexed: 11/17/2022] Open
Abstract
Background Detecting protein complexes within protein–protein interaction (PPI) networks is a major step toward the analysis of biological processes and pathways. Identification and characterization of protein complexes in PPI networks is an ongoing challenge. Several high-throughput experimental techniques provide a substantial number of PPIs, which are widely utilized for compiling the PPI network of a species. Results Here we focus on detecting human protein complexes by developing a multiobjective framework. For this, a large human PPI network is partitioned into modules that serve as protein complexes. To build the objective functions, we have utilized topological properties of the PPI network and biological properties based on Gene Ontology semantic similarity. The proposed method is compared with some state-of-the-art algorithms in the context of different performance metrics. For biological validation of our predicted complexes, we have also employed a Gene Ontology and pathway based analysis. Additionally, we have performed an analysis to associate the resulting protein complexes with 22 key disease classes. Two bipartite networks are created to clearly visualize the association of identified protein complexes with the disorder classes. Conclusions Here, we present the task of identifying protein complexes as a multiobjective optimization problem. Identified protein complexes are found to be associated with several disorder classes such as ‘Cancer’, ‘Endocrine’ and ‘Multiple’. This analysis uncovers some new relationships between disorders and predicted complexes that may play a potential role in the prediction of multi-target drugs. Electronic supplementary material The online version of this article (doi:10.1186/s13015-015-0056-2) contains supplementary material, which is available to authorized users.
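Treating complex detection as a multiobjective problem means candidate partitions are compared by Pareto dominance rather than by a single score. A minimal sketch, assuming two objectives that are both maximised (for instance a topological score and a GO-similarity score; the objective values below are made up):

```python
def dominates(a, b):
    """True if solution a is at least as good as b on every objective
    and strictly better on at least one (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]

# Hypothetical (topological score, semantic-similarity score) pairs.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
front = pareto_front(candidates)
```

The result is a set of trade-off solutions rather than a single winner; picking one final partition from that set requires an additional selection step, which the abstract does not describe.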
|
48
|
Saha I, Rak B, Bhowmick SS, Maulik U, Bhattacharjee D, Koch U, Lazniewski M, Plewczynski D. Binding Activity Prediction of Cyclin-Dependent Inhibitors. J Chem Inf Model 2015; 55:1469-82. [PMID: 26079845 DOI: 10.1021/ci500633c] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
Abstract
The Cyclin-Dependent Kinases (CDKs) are the core components coordinating the eukaryotic cell division cycle. Generally, the crystal structure of CDKs provides information on possible molecular mechanisms of ligand binding. However, reliable and robust estimation of ligand binding activity has been a challenging task in drug design. In this regard, various machine learning techniques, such as Support Vector Machine, Naive Bayesian classifier, Decision Tree, and K-Nearest Neighbor classifier, have been used. The performance of these heterogeneous classification techniques depends on proper selection of features from the data set. This fact motivated us to propose an integrated classification technique using a Genetic Algorithm (GA), a Rotational Feature Selection (RFS) scheme, and an Ensemble of Machine Learning methods, named the Genetic Algorithm integrated Rotational Ensemble based classification technique, for the prediction of ligand binding activity of CDKs. This technique can automatically find the important features and the ensemble size. For this purpose, GA encodes the features and ensemble size in a chromosome as a binary string. Such encoded features are then used to create diverse sets of training points using RFS in order to train the machine learning method multiple times. The RFS scheme works on Principal Component Analysis (PCA) to preserve the variability information of the rotational nonoverlapping subsets of the original data. Thereafter, the testing points are fed to the different instances of the trained machine learning method in order to produce the ensemble result. Here accuracy is computed as the final result after 10-fold cross validation, which is also used as the objective function for GA to maximize. The effectiveness of the proposed classification technique has been demonstrated quantitatively and visually in comparison with different machine learning methods for 16 ligand binding CDK docking and rescoring data sets. In addition, the best possible features have been reported for CDK docking and rescoring data sets separately. Finally, the Friedman test has been conducted to judge the statistical significance of the results produced by the proposed technique. The results indicate that the integrated classification technique has high relevance in predicting protein-ligand binding activity.
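The abstract says only that the GA encodes the features and the ensemble size in one binary string. A minimal sketch of how such a chromosome could be decoded, assuming a layout in which the first bits form a feature mask and the remaining bits give the ensemble size as a binary integer (the layout, the `+ 1` offset and the function name are all assumptions):

```python
def decode_chromosome(bits, n_features):
    """Split a binary chromosome into selected feature indices and an ensemble size.

    Assumed layout: the first n_features bits are a feature mask; the remaining
    bits encode the ensemble size as an unsigned binary integer.
    """
    mask, size_bits = bits[:n_features], bits[n_features:]
    selected = [i for i, bit in enumerate(mask) if bit == "1"]
    # Offset by one so that an all-zero tail still yields one ensemble member.
    ensemble_size = int(size_bits, 2) + 1
    return selected, ensemble_size

# Hypothetical chromosome: six feature bits followed by three size bits.
features, size = decode_chromosome("101100" + "011", n_features=6)
```

In a full GA loop, the decoded feature subset and ensemble size would be used to train the ensemble, and the cross-validated accuracy would be returned as the fitness of the chromosome.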
Affiliation(s)
- Indrajit Saha: Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland; Institute of Informatics and Telematics, National Research Council, 56124 Pisa, Italy; Institute of Computer Science, University of Wroclaw, 50-383 Wroclaw, Poland
- Benedykt Rak: Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
- Shib Sankar Bhowmick: Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, West Bengal, India; Department of Informatics, University of Evora, Evora 7004-516, Portugal
- Ujjwal Maulik: Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, West Bengal, India
- Debotosh Bhattacharjee: Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, West Bengal, India
- Uwe Koch: Lead Discovery Center, Emil-Figge-Strasse 76a, 44227 Dortmund, Germany
- Michal Lazniewski: Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
- Dariusz Plewczynski: Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland; The Jackson Laboratory for Genomic Medicine, c/o University of Connecticut Health Center, Administrative Services Building-Call Box 901, 263 Farmington Avenue, Farmington, Connecticut 06030, United States; Yale University, New Haven, Connecticut 06520, United States
|
49
|
|
50
|
|