Reference Citation Analysis

For: Mallik S, Zhao Z. Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data. Brief Bioinform 2020;21:368-394. [PMID: 30649169 PMCID: PMC7373185 DOI: 10.1093/bib/bby120] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/27/2018] [Revised: 10/26/2018] [Accepted: 11/21/2018] [Indexed: 12/20/2022]

Cited by Other Article(s)

Ding Y, Jiang X, Wu J, Wang Y, Zhao L, Pan Y, Xi Y, Zhao G, Li Z, Zhang L. Synergistic horizontal transfer of antibiotic resistance genes and transposons in the infant gut microbial genome. mSphere 2024;9:e0060823. [PMID: 38112433 PMCID: PMC10826358 DOI: 10.1128/msphere.00608-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Accepted: 11/07/2023] [Indexed: 12/21/2023]

Abstract

Transposons, plasmids, bacteriophages, and other mobile genetic elements facilitate horizontal gene transfer in the gut microbiota, allowing some pathogenic bacteria to acquire antibiotic resistance genes (ARGs). The relationship between specific ARGs and specific transposons across the infant gut microbiome has not yet been elucidated. In this study, ARGs and transposons were annotated from the Unified Human Gastrointestinal Genome (UHGG) and the Early-Life Gut Genomes (ELGG). Association rule mining was used to explore associations between specific ARGs and specific transposons in UHGG, and the robustness of the resulting rules was validated externally in ELGG. Our results suggest that ARGs and transposons are more likely to be associated in the infant gut microbiota than in the adult gut microbiota. Nine robust association rules were identified, in which Klebsiella pneumoniae, Enterobacter hormaechei_A, and Escherichia coli_D played important roles. This study focuses on the synergistic transfer of specific ARGs and specific transposons in the infant gut microbiota, which can contribute to research on microbial pathogenesis and ARG dissemination dynamics.

IMPORTANCE

The transfer of transposons carrying antibiotic resistance genes (ARGs) among microorganisms accelerates the dissemination of antibiotic resistance in the infant gut microbiota. Nonetheless, the relationship between specific ARGs and specific transposons within the infant gut microbiota is unclear. We identified K. pneumoniae, E. hormaechei_A, and E. coli_D as key players in the nine robust association rules we discovered. We also found that infant gut microorganisms were more susceptible than adult gut microorganisms to horizontal gene transfer events involving specific ARGs and specific transposons. These discoveries could enhance the understanding of microbial pathogenesis and ARG dissemination dynamics within the infant gut microbiota.
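
Association rule mining scores rules of the form ARG → transposon by their support, confidence, and lift across a collection of genomes. The Python sketch below is illustrative only. It assumes each genome is reduced to a set of annotated gene labels, and it enumerates single-item rules rather than running a full Apriori search. The labels, toy genomes, and thresholds (`min_support`, `min_confidence`, `min_lift`) are hypothetical and are not values from the study.

```python
from itertools import product


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each transaction is a set of item labels (here: genes annotated in one genome).
    """
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent in t)
    count_c = sum(1 for t in transactions if consequent in t)
    count_ac = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    # Lift > 1 means the two items co-occur more often than expected by chance.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


def mine_pairwise_rules(transactions, args, transposons,
                        min_support=0.1, min_confidence=0.6, min_lift=1.0):
    """Enumerate ARG -> transposon rules that pass all three (hypothetical) thresholds."""
    rules = []
    for arg, tn in product(args, transposons):
        s, c, l = rule_metrics(transactions, arg, tn)
        if s >= min_support and c >= min_confidence and l > min_lift:
            rules.append((arg, tn, s, c, l))
    return rules


# Hypothetical toy genomes: each set lists the ARGs and transposons found in one genome.
genomes = [
    {"ARG_A", "Tn_X"},
    {"ARG_A", "Tn_X", "ARG_B"},
    {"ARG_B", "Tn_Y"},
    {"ARG_A", "Tn_X"},
    {"Tn_Y"},
]
for rule in mine_pairwise_rules(genomes, ["ARG_A", "ARG_B"], ["Tn_X", "Tn_Y"]):
    print(rule)
```

Lift above 1 is what separates a genuine co-occurrence from two genes that are merely common. One simple way to mimic an external validation is to score the same candidate rules on a second, independent genome collection and keep only those that still pass.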

Mallick K, Chakraborty S, Mallik S, Bandyopadhyay S. A scalable unsupervised learning of scRNAseq data detects rare cells through integration of structure-preserving embedding, clustering and outlier detection. Brief Bioinform 2023;24:bbad125. [PMID: 37185897 DOI: 10.1093/bib/bbad125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 02/06/2023] [Accepted: 02/24/2023] [Indexed: 05/17/2023]

Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023;15:cancers15071958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023]

Li A, Xiong S, Li J, Mallik S, Liu Y, Fei R, Zhou H, Liu G. AngClust: Angle Feature-Based Clustering for Short Time Series Gene Expression Profiles. IEEE/ACM Trans Comput Biol Bioinform 2023;20:1574-1580. [PMID: 35853049 DOI: 10.1109/tcbb.2022.3192306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]

Mallik S, Sarkar A, Nath S, Maulik U, Das S, Pati SK, Ghosh S, Zhao Z. 3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection. Front Genet 2023;14:1095330. [PMID: 36865387 PMCID: PMC9971618 DOI: 10.3389/fgene.2023.1095330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 01/30/2023] [Indexed: 02/16/2023]

Abstract

Handling biomedical big data is a challenging task, and integrating multi-modal data followed by significant feature mining (gene signature detection) is particularly demanding. With this in mind, we proposed a novel framework, three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which employs empirical Bayes statistics, was first applied to each individual molecular profile to extract statistically significant features. The three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy and the area under the curve (AUC). Gene modules were identified by average linkage clustering followed by dynamic tree cut, and the module with the highest correlation was considered the potential gene signature. We used an acute myeloid leukemia dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC (0.827). We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed state-of-the-art methods in terms of AUC, and we included comparative studies with other related methods to support its acceptability. Finally, our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery.
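
The fusion step relies on non-negative matrix factorization (NMF). The three-factor penalized variant is not specified in enough detail here to reproduce, so the sketch below swaps in standard two-factor NMF with Lee-Seung multiplicative updates for the Frobenius reconstruction error. It is written in Python with NumPy, and it only illustrates how a non-negative data matrix is approximated by non-negative factors. The toy matrix, the rank, and the iteration count are assumptions.

```python
import numpy as np


def nmf(X, rank, n_iter=1000, eps=1e-10, seed=0):
    """Factor a non-negative matrix X (m x n) as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates for the objective ||X - WH||_F^2.
    This is plain two-factor NMF, not the three-factor penalized variant
    described in the abstract.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative by construction;
        # eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy non-negative matrix (samples x features) of exact rank 3.
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 15))
W, H = nmf(X, rank=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Multiplicative updates are used here because they are short and keep the factors non-negative without an explicit projection step. Penalized or three-factor variants change the update rules but keep the same overall idea of alternating non-negative updates.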

Pandey D, Onkara PP. Improved downstream functional analysis of single-cell RNA-sequence data using DGAN. Sci Rep 2023;13:1618. [PMID: 36709340 PMCID: PMC9884242 DOI: 10.1038/s41598-023-28952-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 01/27/2023] [Indexed: 01/29/2023]

Wei Y, Li L, Zhao X, Yang H, Sa J, Cao H, Cui Y. Cancer subtyping with heterogeneous multi-omics data via hierarchical multi-kernel learning. Brief Bioinform 2023;24:bbac488. [PMID: 36433785 DOI: 10.1093/bib/bbac488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Revised: 09/14/2022] [Accepted: 10/15/2022] [Indexed: 11/27/2022]

Designing optimal convolutional neural network architecture using differential evolution algorithm. Patterns 2022;3:100567. [PMID: 36124301 PMCID: PMC9481963 DOI: 10.1016/j.patter.2022.100567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/10/2022] [Revised: 06/04/2022] [Accepted: 07/13/2022] [Indexed: 01/08/2023]

Abstract

Convolutional neural networks (CNNs) are deep learning models widely used for tasks such as computer vision and speech recognition. CNNs are usually designed manually on the basis of problem-specific domain knowledge and tricky settings, which is laborious, time-consuming, and challenging. To address this, our study develops an improved differential evolution of convolutional neural network (IDECNN) algorithm that designs CNN layer architectures for image classification. IDECNN uses variable-length encoding to represent the flexible layer architecture of a CNN model, and an efficient heuristic mechanism evolves CNN architectures through mutation and crossover while preventing premature convergence during the evolutionary process. Eight well-known imaging datasets were used, and the results showed that IDECNN could design suitable architectures when compared with 20 existing CNN models. Finally, the designed CNN architectures were applied to pneumonia and coronavirus disease 2019 (COVID-19) X-ray biomedical image data, demonstrating the usefulness of the proposed approach for generating a suitable CNN model.

• Introduce DE algorithm to automatically design CNN architectures
• Variable-length encoding strategy is proposed to encode each CNN model
• For the DE framework, two CNN architectures undergo a refinement difference approach
• Design a heuristic mechanism for mutation operation to evolve CNN architectures

Convolutional neural networks (CNNs) are a class of deep learning (DL) methods that have demonstrated improved performance in various computer vision tasks. As CNNs have grown in popularity, several CNN architectures have been introduced, each with a large number of problem-dependent design options. In most situations, a constructed CNN model performs well on the dataset used to train it, but there is no guarantee that it will achieve sufficient classification accuracy on other datasets. Designing an appropriate CNN architecture for a particular problem requires human interaction and trial-and-error procedures, which are laborious and time-consuming. This study uses an improved differential evolution of convolutional neural network (IDECNN) technique to automatically construct effective CNN architectures for several image classification problems, mitigating the issues found with manually designed CNN models.
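
IDECNN builds on differential evolution (DE). To make the mutation, crossover, and selection steps concrete, the Python sketch below runs classic DE/rand/1/bin on a continuous test function. It does not reproduce IDECNN's variable-length CNN encoding or its heuristic mutation. The sphere objective, the bounds, and the control parameters (`F`, `CR`, population size, generations) are illustrative assumptions.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimize `objective` with classic DE/rand/1/bin.

    bounds: list of (low, high) pairs, one per dimension.
    F: differential weight used in mutation; CR: crossover probability.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals, none of them i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Clip the mutant back into the search bounds.
            mutant = [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]
            # Binomial crossover: at least one coordinate (j_rand) comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            # Greedy selection: keep the trial vector if it is no worse.
            f_trial = objective(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial

    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def sphere(x):
    """Simple convex test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_f)
```

In an architecture-search setting, each vector would instead encode layer choices and the objective would be a validation error from training that candidate network. That is far more expensive per evaluation, which is why heuristics against premature convergence matter there.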

Xi E, Bai J, Zhang K, Yu H, Guo Y. Genomic variants disrupt miRNA-mRNA regulation. Chem Biodivers 2022;19:e202200623. [PMID: 35985010 DOI: 10.1002/cbdv.202200623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 08/17/2022] [Indexed: 11/09/2022]

Hu R, Zhou XJ, Li W. Computational Analysis of High-Dimensional DNA Methylation Data for Cancer Prognosis. J Comput Biol 2022;29:769-781. [PMID: 35671506 PMCID: PMC9419965 DOI: 10.1089/cmb.2022.0002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Dhar R, Mallik S, Devi A. Exosomal microRNAs (exoMIRs): micromolecules with macro impact in oral cancer. 3 Biotech 2022;12:155. [DOI: 10.1007/s13205-022-03217-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/31/2021] [Accepted: 05/31/2022] [Indexed: 12/16/2022]

Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences. Mathematics 2022. [DOI: 10.3390/math10132228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]

Bhadra T, Mallik S, Hasan N, Zhao Z. Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer. BMC Bioinformatics 2022;23:153. [PMID: 35484501 PMCID: PMC9052461 DOI: 10.1186/s12859-022-04678-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/11/2022] [Accepted: 04/11/2022] [Indexed: 11/24/2022]

Abstract

BACKGROUND

As many complex omics datasets have been generated over the last two decades, dimensionality reduction has become a challenging issue in mining such data effectively. Omics data typically contain many features, and many feature selection algorithms have accordingly been developed. The performance of these methods often varies with the specific data, which makes discovering and interpreting results challenging.

METHODS AND RESULTS

In this study, we performed a comprehensive comparative study of five widely used supervised feature selection methods (mRMR, INMIFS, DFS, SVM-RFE-CBR and VWMRmR) on multi-omics datasets. Specifically, we used five representative datasets from a multi-omics study of acute myeloid leukemia (LAML) from The Cancer Genome Atlas (TCGA): gene expression (Exp), exon expression (ExpExon), DNA methylation (hMethyl27), copy number variation (Gistic2), and pathway activity (Paradigm IPLs). The feature subsets selected by the five algorithms were assessed using three evaluation criteria: (1) classification accuracy (Acc), (2) representation entropy (RE) and (3) redundancy rate (RR). Four classifiers, viz., C4.5, NaiveBayes, KNN, and AdaBoost, were used to measure the classification accuracy (Acc) of each selected feature subset. The VWMRmR algorithm obtained the best Acc for three datasets (ExpExon, hMethyl27 and Paradigm IPLs). VWMRmR also gave the best RR computed with normalized mutual information for three datasets (Exp, Gistic2 and Paradigm IPLs) and the best RR computed with the Pearson correlation coefficient for two datasets (Gistic2 and Paradigm IPLs). It also obtained the best RE for three datasets (Exp, Gistic2 and Paradigm IPLs). Overall, the VWMRmR algorithm yielded the best performance on all three evaluation criteria for the majority of the datasets. In addition, we identified signature genes using supervised learning on the overlap of the top feature sets selected by the five methods. We obtained a 7-gene signature (ZMIZ1, ENG, FGFR1, PAWR, KRT17, MPO and LAT2) for Exp, a 9-gene signature for ExpExon, a 7-gene signature for hMethyl27, a single-gene signature (PIK3CG) for Gistic2 and a 3-gene signature for Paradigm IPLs.

CONCLUSION

We comprehensively compared the performance of five well-known feature selection methods for mining features from various high-dimensional datasets. We also used supervised learning to identify signature genes for each omics data type in the disease. The study will help incorporate higher-order dependencies among features.
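
Two of the three evaluation criteria, RE and RR, depend only on the selected feature subset. The NumPy sketch below assumes common textbook definitions. Representation entropy (RE) is taken as the Shannon entropy of the normalized eigenvalues of the subset's covariance matrix. Redundancy rate (RR) is taken as the mean absolute pairwise Pearson correlation among the selected features. Both are assumptions about the exact formulas; the study's normalization may differ, and the normalized-mutual-information variant of RR is not shown. The toy data are synthetic.

```python
import numpy as np


def representation_entropy(X):
    """RE of a feature subset X (samples x features).

    Eigenvalues of the feature covariance matrix are normalized to sum to 1
    and treated as a distribution; RE is its Shannon entropy. A high RE means
    the information is spread across features (low redundancy).
    """
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative round-off
    p = eigvals / eigvals.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def redundancy_rate(X):
    """Mean absolute Pearson correlation over all distinct feature pairs (assumed form)."""
    m = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    off_diag = np.abs(corr[~np.eye(m, dtype=bool)])
    return float(off_diag.mean())


rng = np.random.default_rng(0)
independent = rng.normal(size=(200, 5))              # five unrelated features
base = rng.normal(size=(200, 1))
redundant = base + 0.01 * rng.normal(size=(200, 5))  # five near-copies of one feature

print(representation_entropy(independent), redundancy_rate(independent))
print(representation_entropy(redundant), redundancy_rate(redundant))
```

The two measures move in opposite directions. RE approaches log(m) for m uncorrelated features and approaches 0 for near-duplicates, whereas RR approaches 0 and 1, respectively.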

Munquad S, Si T, Mallik S, Das AB, Zhao Z. A Deep Learning-Based Framework for Supporting Clinical Diagnosis of Glioblastoma Subtypes. Front Genet 2022;13:855420. [PMID: 35419027 PMCID: PMC9000988 DOI: 10.3389/fgene.2022.855420] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/15/2022] [Accepted: 02/17/2022] [Indexed: 12/12/2022]

Serra A, Saarimäki LA, Pavel A, del Giudice G, Fratello M, Cattelani L, Federico A, Laurino O, Marwah VS, Fortino V, Scala G, Sofia Kinaret PA, Greco D. Nextcast: a software suite to analyse and model toxicogenomics data. Comput Struct Biotechnol J 2022;20:1413-1426. [PMID: 35386103 PMCID: PMC8956870 DOI: 10.1016/j.csbj.2022.03.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 03/16/2022] [Accepted: 03/16/2022] [Indexed: 11/28/2022]

Affiliation(s)

Angela Serra: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
Laura Aliisa Saarimäki: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
Alisa Pavel: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
Giusy del Giudice: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
Michele Fratello: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
Luca Cattelani: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
Antonio Federico: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
Omar Laurino: Freelance developer, Boston, USA
Veer Singh Marwah: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland
Vittorio Fortino: Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
Giovanni Scala: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Department of Biology, University of Naples Federico II, Naples, Italy
Pia Anneli Sofia Kinaret: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland; Institute of Biotechnology, University of Helsinki, Helsinki, Finland
Dario Greco: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland; Institute of Biotechnology, University of Helsinki, Helsinki, Finland. Corresponding author.

Uzunangelov V, Wong CK, Stuart JM. Accurate cancer phenotype prediction with AKLIMATE, a stacked kernel learner integrating multimodal genomic data and pathway knowledge. PLoS Comput Biol 2021;17:e1008878. [PMID: 33861732 PMCID: PMC8081343 DOI: 10.1371/journal.pcbi.1008878] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/23/2020] [Revised: 04/28/2021] [Accepted: 03/15/2021] [Indexed: 02/03/2023]

Bora K, Bhuyan MK, Kasugai K, Mallik S, Zhao Z. Computational learning of features for automated colonic polyp classification. Sci Rep 2021;11:4347. [PMID: 33623086 PMCID: PMC7902635 DOI: 10.1038/s41598-021-83788-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/23/2020] [Accepted: 02/04/2021] [Indexed: 12/24/2022]

Mandal M, Sahoo SK, Patra P, Mallik S, Zhao Z. In silico ranking of phenolics for therapeutic effectiveness on cancer stem cells. BMC Bioinformatics 2020;21:499. [PMID: 33371879 PMCID: PMC7768647 DOI: 10.1186/s12859-020-03849-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/25/2020] [Accepted: 10/27/2020] [Indexed: 12/22/2022]

Abstract

BACKGROUND

Cancer stem cells (CSCs) have features such as the ability to self-renew, differentiate into defined progenies and initiate tumor growth. Cancer treatments include drugs, chemotherapy, radiotherapy, or a combination of these. However, cancer treatment by various therapeutic strategies often fails. One possible reason is that the stem-like properties of CSCs make them more dynamic and complex, which may cause therapeutic resistance. Another limitation is the side effects associated with chemotherapy or radiotherapy. To explore better or alternative treatment options, the current study investigates natural drug-like molecules that can be used for CSC-targeted therapy. Among various natural products, the anticancer potential of phenolics is well established. We collected 21 phytochemicals from the phenolic group and their interacting CSC genes from publicly available databases. We then constructed a bipartite graph with the collected CSC genes as one node set and their interacting phenolic phytochemicals as the other. This bipartite graph was transformed into a weighted bipartite graph by considering the interaction strength between the phenolics and the CSC genes. The CSC genes were also weighted by two scores, the DSI (Disease Specificity Index) and the DPI (Disease Pleiotropy Index). For each gene, the DSI score reflects its specific relationship with the disease, and the DPI score reflects its association with multiple diseases. Finally, a ranking technique based on the PageRank (PR) algorithm was developed to rank the phenolics.

RESULTS

We collected 21 phytochemicals from the phenolic group and 1118 CSC genes. The top-ranked phenolics were evaluated by their molecular and pharmacokinetic properties and their disease association networks. We selected the top five ranked phenolics (Resveratrol, Curcumin, Quercetin, Epigallocatechin Gallate, and Genistein) for further examination of their oral bioavailability through molecular properties, their drug-likeness through pharmacokinetic properties, and their association networks with CSC genes.

CONCLUSION

Our PageRank-based ranking approach is useful for ranking the phenolics associated with CSC genes. Our results suggest that some phenolics are potential molecules for CSC-related cancer treatment.
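
The ranking step above is built on PageRank. As a minimal sketch, the Python code below runs weighted PageRank by power iteration on a small undirected bipartite graph linking phenolics to CSC genes. The node names and edge weights are hypothetical, and the study's additional DSI/DPI gene weighting is not modeled. The code therefore illustrates plain weighted PageRank, not the authors' exact scoring.

```python
def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on an undirected weighted graph.

    edges: list of (u, v, weight). Each undirected edge is used in both
    directions, and a node passes rank to its neighbors in proportion to
    edge weight. Every node has at least one edge, so there are no dangling nodes.
    """
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    out = {n: {} for n in nodes}
    for u, v, w in edges:
        out[u][v] = out[u].get(v, 0.0) + w
        out[v][u] = out[v].get(u, 0.0) + w
    strength = {n: sum(out[n].values()) for n in nodes}

    n_nodes = len(nodes)
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(max_iter):
        # Teleportation term, then weighted flow along edges.
        new = {n: (1.0 - damping) / n_nodes for n in nodes}
        for u in nodes:
            for v, w in out[u].items():
                new[v] += damping * rank[u] * w / strength[u]
        converged = sum(abs(new[n] - rank[n]) for n in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical phenolic-gene interaction strengths.
edges = [
    ("phenolic_1", "gene_A", 0.9),
    ("phenolic_1", "gene_B", 0.7),
    ("phenolic_1", "gene_C", 0.5),
    ("phenolic_2", "gene_A", 0.4),
    ("phenolic_3", "gene_C", 0.2),
]
ranks = weighted_pagerank(edges)
ranked_phenolics = sorted((n for n in ranks if n.startswith("phenolic")),
                          key=ranks.get, reverse=True)
print(ranked_phenolics)
```

One way to fold in gene-level scores such as DSI or DPI would be to scale the weights of edges incident to each gene before running the iteration. This is only one design option and is not necessarily how the study combined them.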

Kabeshova A, Yu Y, Lukacs B, Bacry E, Gaïffas S. ZiMM: A deep learning model for long term and blurry relapses with non-clinical claims data. J Biomed Inform 2020;110:103531. [PMID: 32818667 DOI: 10.1016/j.jbi.2020.103531] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/06/2020] [Revised: 07/25/2020] [Accepted: 08/09/2020] [Indexed: 11/28/2022]

Abstract

This paper considers the problem of modeling and predicting a long-term and "blurry" relapse that occurs after a medical act, such as surgery. We do not consider a short-term complication related to the act itself, but a long-term relapse that clinicians cannot easily explain, since it depends on unknown sets or sequences of past events that occurred before the act. The relapse is observed only indirectly, in a "blurry" fashion, through longitudinal drug prescriptions over a long period after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions), to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture called ZiMM Encoder-Decoder (ZiMM ED) that can learn from the complex, irregular, highly heterogeneous and sparse patterns of health events observed in a claims-only database. ZiMM ED is applied to a "non-clinical" claims database, which contains only timestamped reimbursement codes for drug purchases, medical procedures and hospital diagnoses; the only available clinical feature is the patient's age. This setting is more challenging than one in which bedside clinical signals are available. Our motivation for using such a non-clinical claims database is that it covers almost the entire population, unlike clinical electronic health records from a single hospital or a small set of hospitals. Indeed, we consider a dataset containing the claims of almost all French citizens who had surgery for prostatic problems, with a history of between 1.5 and 5 years. We consider a long-term (18-month) relapse (urination problems that still occur despite surgery), which is blurry because it is observed only through the reimbursement of a specific set of drugs for urination problems.
Our experiments show that ZiMM ED improves on several baselines, including non-deep-learning and deep-learning approaches, and that it allows working on such a dataset with minimal preprocessing.

A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data. Genes (Basel) 2020;11:genes11080931. [PMID: 32806782 PMCID: PMC7465138 DOI: 10.3390/genes11080931] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/21/2020] [Revised: 08/03/2020] [Accepted: 08/06/2020] [Indexed: 12/12/2022]

Abstract

DNA methylation changes have been useful for cancer biomarker discovery, classification, and potential treatment development. Existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, such methylation signal mapping has limitations. To address them, in this study we introduced a combinatorial framework that uses linear regression, differential expression analysis, and a deep learning method for accurate biological interpretation of DNA methylation by integrating DNA methylation data with corresponding TCGA gene expression data. We demonstrated it on uterine cervical cancer. First, we pre-filtered outliers from the data set and then predicted gene expression values from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) with the empirical Bayes test in Limma. We then applied a deep learning method, "nnet", to classify cervical cancer labels using those DEGs, and computed classification metrics including accuracy and area under the curve (AUC) through 10-fold cross-validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760, 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with a false discovery rate (FDR) < 0.001. Deep learning analysis achieved an average classification accuracy of 90.69% (±1.97%) for the uterine cervical cancer labels, which is better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using Cytoscape and reported the five top in-degree genes (PAIP2, GRWD1, VPS4B, CRADD and LLPH) and five top out-degree genes (MRPL35, FAM177A1, STAT4, ASPSCR1 and FABP7).
After that, we performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using WebGestalt (WEB-based Gene SeT AnaLysis Toolkit). In summary, our framework, which integrates linear regression, differential expression, and deep learning, provides a robust approach to better interpret DNA methylation and gene expression data in disease studies.
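
The first modeling step predicts gene expression from methylation through linear regression. The Python sketch below assumes one gene with paired methylation (beta) values and expression measurements per sample, fitted by one-predictor ordinary least squares. The toy numbers are made up. The study's outlier pre-filtering, limma test, and nnet classifier are not shown, and the study's exact regression setup may differ.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted expression for each methylation value."""
    return [intercept + slope * xi for xi in x]


# Hypothetical paired measurements for one gene across six samples:
# methylation beta values and log-scale expression (higher methylation, lower expression).
methylation = [0.10, 0.20, 0.35, 0.50, 0.70, 0.85]
expression = [8.1, 7.6, 7.0, 6.2, 5.3, 4.6]

b0, b1 = fit_simple_ols(methylation, expression)
predicted_expression = predict(b0, b1, methylation)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

Repeating this fit per gene yields a matrix of methylation-predicted expression values. Downstream differential expression testing and classification would then operate on that matrix.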

Mallik S, Qin G, Jia P, Zhao Z. Molecular signatures identified by integrating gene expression and methylation in non-seminoma and seminoma of testicular germ cell tumours. Epigenetics 2020;16:162-176. [PMID: 32615059 DOI: 10.1080/15592294.2020.1790108] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/01/2023]

Nakhl S, Sleilaty G, Chouery E, Salem N, Chahine R, Farès N. FokI vitamin D receptor gene polymorphism and serum 25-hydroxyvitamin D in patients with cardiovascular risk. Arch Med Sci Atheroscler Dis 2019;4:e298-e303. [PMID: 32368685 PMCID: PMC7191939 DOI: 10.5114/amsad.2019.91437] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/18/2019] [Accepted: 11/19/2019] [Indexed: 01/02/2023]

Mallik S, Zhao Z. Multi-Objective Optimized Fuzzy Clustering for Detecting Cell Clusters from Single-Cell Expression Profiles. Genes (Basel) 2019;10:E611. [PMID: 31412637 PMCID: PMC6723724 DOI: 10.3390/genes10080611] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/21/2019] [Revised: 07/30/2019] [Accepted: 08/07/2019] [Indexed: 02/06/2023]

Abstract

Rapid advances in single-cell RNA sequencing (scRNA-seq) allow gene expression to be measured at single-cell resolution in complex diseases or tissues. Although many methods have been developed to detect cell clusters from scRNA-seq data, this task remains a major challenge. We proposed a multi-objective optimization-based fuzzy clustering approach for detecting cell clusters from scRNA-seq data. First, we conducted initial filtering and SCnorm normalization. We considered various case studies by selecting different cluster numbers (cl = 2 to a user-defined number) and applied the fuzzy c-means clustering algorithm to each case individually. For each case, we evaluated four cluster validity indices: Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC), and Fuzzy Silhouette Index (FSI). Next, we set the first measure as a minimization objective (↓) and the remaining three as maximization objectives (↑), and then applied a multi-objective decision-making technique, TOPSIS, to identify the optimal solution. The case study with the highest TOPSIS score was selected as the final optimal clustering. Finally, we obtained differentially expressed genes (DEGs) using Limma by comparing the expression of the samples in each resulting cluster with that in the remaining clusters. We applied our approach to a scRNA-seq dataset of a rare intestinal cell type in mice [GEO ID: GSE62270, 23,630 features (genes) and 288 cells]. The optimal clustering (TOPSIS score = 0.858) comprised two clusters, one with 115 cells and the other with 91 cells. The scores of the four cluster validity indices FSI, PE, PC, and MPC for the optimized fuzzy clustering were 0.482, 0.578, 0.607, and 0.215, respectively. The Limma analysis identified 1240 DEGs (cluster 1 vs. cluster 2).
The top ten gene markers were Rps21, Slc5a1, Crip1, Rpl15, Rpl3, Rpl27a, Khk, Rps3a1, Aldob and Rps17. Among these, Khk (encoding ketohexokinase) is a novel marker for the rare intestinal cell type. In summary, this method is useful for detecting cell clusters from scRNA-seq data.
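
The model-selection step can be sketched as follows. This Python/NumPy example assumes a fuzzy membership matrix U (clusters x cells) is already available from fuzzy c-means. It computes PC, PE, and MPC from their standard definitions, with MPC = 1 - c/(c - 1) * (1 - PC). It then ranks candidate cluster numbers with TOPSIS, using vector normalization and equal criterion weights. The equal weights, the toy membership matrix, and the candidate scores are assumptions. FSI, which requires the expression data itself, is treated as a given number rather than computed.

```python
import numpy as np


def validity_indices(U):
    """PC, PE and MPC for a fuzzy membership matrix U (c clusters x N samples)."""
    c, N = U.shape
    pc = float((U ** 2).sum() / N)
    # Treat 0 * log(0) as 0 by taking log(1) = 0 where the membership is zero.
    safe = np.where(U > 0, U, 1.0)
    pe = float(-(U * np.log(safe)).sum() / N)
    mpc = 1.0 - c / (c - 1) * (1.0 - pc)
    return pc, pe, mpc


def topsis(scores, benefit, weights=None):
    """TOPSIS closeness score (higher is better) for each row of `scores`.

    scores: alternatives x criteria; benefit[j] is True if criterion j is maximized.
    """
    X = np.asarray(scores, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    w = (np.full(X.shape[1], 1.0 / X.shape[1]) if weights is None
         else np.asarray(weights, dtype=float))
    V = w * X / np.linalg.norm(X, axis=0)  # vector-normalize each criterion, then weight
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti_ideal = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_best = np.linalg.norm(V - ideal, axis=1)
    d_worst = np.linalg.norm(V - anti_ideal, axis=1)
    return d_worst / (d_best + d_worst)


# Hypothetical memberships for 4 cells in 2 clusters.
U = np.array([[0.9, 0.8, 0.1, 0.2],
              [0.1, 0.2, 0.9, 0.8]])
pc, pe, mpc = validity_indices(U)

# Hypothetical candidates: cluster number -> [PE, PC, MPC, FSI].
candidates = {
    2: [0.40, 0.75, 0.50, 0.45],
    3: [0.70, 0.55, 0.33, 0.30],
    4: [0.95, 0.42, 0.23, 0.20],
}
closeness = topsis(list(candidates.values()), benefit=[False, True, True, True])
best_cl = list(candidates)[int(np.argmax(closeness))]
print(pc, pe, mpc, closeness, best_cl)
```

PE is passed as a cost criterion (`False`) and the other three as benefit criteria, matching the minimize/maximize directions described above.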
