1
Todorova VK, Bauer MA, Azhar G, Wei JY. RNA sequencing of formalin fixed paraffin-embedded heart tissue provides transcriptomic information about chemotherapy-induced cardiotoxicity. Pathol Res Pract 2024; 257:155309. [PMID: 38678848 DOI: 10.1016/j.prp.2024.155309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 04/11/2024] [Indexed: 05/01/2024]
Abstract
Gene expression of formalin-fixed paraffin-embedded (FFPE) tissue may serve for molecular studies on cardiovascular diseases. Chemotherapeutics, such as doxorubicin (DOX), may cause heart injury, but the mechanisms of these side effects of DOX are not well understood. This study aimed to investigate whether DOX-induced gene expression in archival FFPE heart tissue in experimental rats would correlate with the gene expression in fresh-frozen heart tissue by applying RNA sequencing technology. The results showed that RNA from FFPE samples was degraded, resulting in a lower number of uniquely mapped reads. However, DOX-induced differentially expressed genes in FFPE were related to molecular mechanisms of DOX-induced cardiotoxicity, such as inflammation, calcium binding, endothelial dysfunction, senescence, and cardiac hypertrophy signaling. Our data suggest that, despite the limitations, RNA sequencing of archival FFPE heart tissue supports utilizing FFPE tissues from retrospective studies on cardiovascular disorders, including DOX-induced cardiotoxicity.
Affiliation(s)
- Valentina K Todorova
- Division of Hematology/Oncology, University of Arkansas for Medical Sciences, Little Rock, AR, USA; Department of Geriatrics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
- Michael A Bauer
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Gohar Azhar
- Department of Geriatrics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Jeanne Y Wei
- Department of Geriatrics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
2
Li Z, Song C, Yang J, Jia Z, Chen D, Yan C, Tian L, Wu X. Clustering algorithm based on DINNSM and its application in gene expression data analysis. Technol Health Care 2024; 32:229-239. [PMID: 38759052 PMCID: PMC11191479 DOI: 10.3233/thc-248020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2024]
Abstract
BACKGROUND Selecting an appropriate similarity measurement method is crucial for obtaining biologically meaningful clustering modules. Commonly used measurement methods are insufficient in capturing the complexity of biological systems and fail to accurately represent their intricate interactions. OBJECTIVE This study aimed to obtain biologically meaningful gene modules by using a clustering algorithm based on a similarity measurement method. METHODS A new algorithm called the Dual-Index Nearest Neighbor Similarity Measure (DINNSM) was proposed. This algorithm calculated the similarity matrix between genes using Pearson's or Spearman's correlation. The similarity matrix was then used to construct a nearest-neighbor table. The final similarity matrix was reconstructed using the positions of shared genes in the nearest-neighbor table and the number of shared genes. RESULTS Experiments were conducted on five different gene expression datasets, and DINNSM was compared with five widely used similarity measurement techniques for gene expression data. The findings demonstrate that clustering with DINNSM as the similarity measure performed better than clustering with the alternative measurement techniques. CONCLUSIONS DINNSM provided more accurate insights into the intricate biological connections among genes, facilitating the identification of more accurate and biologically meaningful gene co-expression modules.
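The three steps listed under METHODS (correlation matrix, nearest-neighbor table, reconstructed similarity) can be sketched as below. This is only an illustration under stated assumptions, not the published DINNSM formula, which the abstract does not give: the function name `shared_neighbor_similarity`, the neighborhood size `k`, and the rank-product weighting are hypothetical choices that show one way both the number and the positions of shared neighbors can enter a score.

```python
import numpy as np

def shared_neighbor_similarity(expr, k=3):
    """expr: genes x samples array. Returns a genes x genes similarity matrix.

    Hypothetical scoring rule (an assumption, not the paper's): each gene
    shared by two neighbor lists adds (k - rank_in_i) * (k - rank_in_j).
    """
    expr = np.asarray(expr, dtype=float)
    corr = np.corrcoef(expr)               # step 1: Pearson similarity between genes
    np.fill_diagonal(corr, -np.inf)        # a gene is never its own neighbor
    # Step 2: nearest-neighbor table, row i = the k genes most similar to gene i.
    table = np.argsort(-corr, axis=1)[:, :k].tolist()
    n = expr.shape[0]
    sim = np.zeros((n, n))
    for i in range(n):
        rank_i = {g: r for r, g in enumerate(table[i])}
        for j in range(i + 1, n):
            score = 0.0
            for r_j, g in enumerate(table[j]):
                if g in rank_i:
                    # Step 3: a shared gene counts more when it sits near the
                    # top of both neighbor lists.
                    score += (k - rank_i[g]) * (k - r_j)
            sim[i, j] = sim[j, i] = score
    return sim
```

The resulting matrix can be passed to any clustering routine that accepts a precomputed similarity; swapping `np.corrcoef` for a rank-based correlation would give the Spearman variant mentioned in the abstract.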
Affiliation(s)
- Zongjin Li
- Department of Computer, Qinghai Normal University, Xining, China
- Changxin Song
- Department of Mechanical Engineering and Information, Shanghai Urban Construction Vocational College, Shanghai, China
- Jiyu Yang
- Department of Cardiovascular Medicine, Xining First People’s Hospital, Xining, China
- Zeyu Jia
- Department of Computer, Qinghai Normal University, Xining, China
- Dongzhen Chen
- School of Materials Science and Engineering, Xi’an Polytechnic University, Xi’an, China
- Chengying Yan
- Department of Cardiovascular Medicine, Xining First People’s Hospital, Xining, China
- Liqin Tian
- Department of Computer, Qinghai Normal University, Xining, China
- School of Computer, North China Institute of Science and Technology, Langfang, China
- Xiaoming Wu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
3
Javed A, Rizzo DM, Lee BS, Gramling R. Somtimes: self organizing maps for time series clustering and its application to serious illness conversations. Data Min Knowl Discov 2023; 38:813-839. [PMID: 38711534 PMCID: PMC11069464 DOI: 10.1007/s10618-023-00979-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 08/22/2023] [Indexed: 05/08/2024]
Abstract
There is demand for scalable algorithms capable of clustering and analyzing large time series data. The Kohonen self-organizing map (SOM) is an unsupervised artificial neural network for clustering, visualizing, and reducing the dimensionality of complex data. Like all clustering methods, it requires a measure of similarity between input data (in this work, time series). Dynamic time warping (DTW) is one such measure, and a top performer that accommodates distortions when aligning time series. Despite its popularity in clustering, DTW is limited in practice because its runtime complexity is quadratic in the length of the time series. To address this, we present a new self-organizing map for clustering TIME Series, called SOMTimeS, which uses DTW as the distance measure. The method has similar accuracy compared with other DTW-based clustering algorithms, yet scales better and runs faster. The computational performance stems from the pruning of unnecessary DTW computations during the SOM's training phase. For comparison, we implement a similar pruning strategy for K-means, and call the latter K-TimeS. SOMTimeS and K-TimeS pruned 43% and 50% of the total DTW computations, respectively. Pruning effectiveness, accuracy, execution time and scalability are evaluated using 112 benchmark time series datasets from the UC Riverside classification archive; the results show that, for similar accuracy, SOMTimeS and K-TimeS achieve a 1.8× speed-up on average, with rates varying between 1× and 18× depending on the dataset. We also apply SOMTimeS to a healthcare study of patient-clinician serious illness conversations to demonstrate the algorithm's utility with complex, temporally sequenced natural language. Supplementary Information: The online version contains supplementary material available at 10.1007/s10618-023-00979-9.
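To make the quadratic cost of DTW concrete, here is a minimal sketch of the textbook dynamic-programming recurrence. It is not the SOMTimeS implementation and contains no pruning; the function name `dtw_distance` and the squared-difference local cost are assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Unconstrained DTW distance between two numeric sequences.

    Fills an (n+1) x (m+1) cost table row by row, so the work is
    proportional to len(a) * len(b): quadratic for equal lengths.
    """
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0                          # empty prefixes align at zero cost
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])
```

Every avoided call to a routine like this saves a full table fill, which is why skipping a large share of the DTW computations translates directly into run time.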
Affiliation(s)
- Ali Javed
- Department of Medicine, Stanford University, 300 Pasteur Dr, Stanford, CA 94305 USA
- Department of Computer Science, University of Vermont, Burlington, VT USA
- Donna M. Rizzo
- Department of Civil and Environmental Engineering, University of Vermont, Burlington, VT USA
- Department of Computer Science, University of Vermont, Burlington, VT USA
- Byung Suk Lee
- Department of Computer Science, University of Vermont, Burlington, VT USA
- Robert Gramling
- Department of Family Medicine, University of Vermont, Burlington, VT USA
4
Jawaharraj K, Peta V, Dhiman SS, Gnimpieba EZ, Gadhamshetty V. Transcriptome-wide marker gene expression analysis of stress-responsive sulfate-reducing bacteria. Sci Rep 2023; 13:16181. [PMID: 37758719 PMCID: PMC10533852 DOI: 10.1038/s41598-023-43089-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 09/19/2023] [Indexed: 09/29/2023]
Abstract
Sulfate-reducing bacteria (SRB) are terminal members of any anaerobic food chain. For example, they critically influence the biogeochemical cycling of carbon, nitrogen, sulfur, and metals (natural environment) as well as the corrosion of civil infrastructure (built environment). The United States alone spends nearly $4 billion to address the biocorrosion challenges of SRB. It is important to analyze the genetic mechanisms of these organisms under environmental stresses. The current study uses complementary methodologies, viz., transcriptome-wide marker gene panel mapping and gene clustering analysis, to decipher the stress mechanisms in four SRB. Here, publicly accessible RNA-sequencing data were mined to identify the key transcriptional signatures. Crucial transcriptional candidate genes of Desulfovibrio spp. were identified, and they validated the gene cluster prediction. In addition, the unique transcriptional signatures of Oleidesulfovibrio alaskensis (OA-G20) at graphene and copper interfaces were discussed using in-house RNA-sequencing data. Furthermore, the comparative genomic analysis of the 4 genomes studied revealed 12,821 genes with translation, among which 10,178 genes were in homolog families and 2,643 genes were in singleton families. The current study paves a path for developing predictive deep learning tools for interpretable and mechanistic learning analysis of the SRB gene regulation.
Affiliation(s)
- Kalimuthu Jawaharraj
- Civil and Environmental Engineering, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- 2D-Materials for Biofilm Engineering, Science and Technology (2D BEST) Center, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- Data-Driven Materials Discovery for Bioengineering Innovation Center, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- Vincent Peta
- Biomedical Engineering, University of South Dakota, 4800 N Career Ave, Sioux Falls, SD, 57107, USA
- Saurabh Sudha Dhiman
- Civil and Environmental Engineering, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- Data-Driven Materials Discovery for Bioengineering Innovation Center, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- Chemistry, Biology and Health Sciences, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- Etienne Z Gnimpieba
- 2D-Materials for Biofilm Engineering, Science and Technology (2D BEST) Center, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- Data-Driven Materials Discovery for Bioengineering Innovation Center, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- Biomedical Engineering, University of South Dakota, 4800 N Career Ave, Sioux Falls, SD, 57107, USA
- Venkataramana Gadhamshetty
- Civil and Environmental Engineering, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- 2D-Materials for Biofilm Engineering, Science and Technology (2D BEST) Center, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
- Data-Driven Materials Discovery for Bioengineering Innovation Center, South Dakota Mines, 501 E. St. Joseph Street, Rapid City, SD, 57701, USA
5
Todorova VK, Byrum SD, Mackintosh SG, Jamshidi-Parsian A, Gies AJ, Washam CL, Jenkins SV, Spiva T, Bowman E, Reyna NS, Griffin RJ, Makhoul I. Exosomal MicroRNA and Protein Profiles of Hepatitis B Virus-Related Hepatocellular Carcinoma Cells. Int J Mol Sci 2023; 24:13098. [PMID: 37685904 PMCID: PMC10487651 DOI: 10.3390/ijms241713098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 08/14/2023] [Accepted: 08/18/2023] [Indexed: 09/10/2023]
Abstract
Infection with hepatitis B virus (HBV) is a main risk factor for hepatocellular carcinoma (HCC). Extracellular vesicles, such as exosomes, play an important role in tumor development and metastasis, including regulation of HBV-related HCC. In this study, we have characterized exosome microRNA and proteins released in vitro from hepatitis B virus (HBV)-related HCC cell lines SNU-423 and SNU-182 and immortalized normal hepatocyte cell lines (THLE2 and THLE3) using microRNA sequencing and mass spectrometry. Bioinformatics, including functional enrichment and network analysis, combined with survival analysis using data related to HCC in The Cancer Genome Atlas (TCGA) database, were applied to examine the prognostic significance of the results. More than 40 microRNAs and 200 proteins were significantly dysregulated (p < 0.05) in the exosomes released from HCC cells in comparison with the normal liver cells. The functional analysis of the differentially expressed exosomal miRNAs (i.e., mir-483, mir-133a, mir-34a, mir-155, mir-183, mir-182), their predicted targets, and exosomal differentially expressed proteins (i.e., POSTN, STAM, EXOC8, SNX9, COL1A2, IDH1, FN1) showed correlation with pathways associated with HBV, virus activity and invasion, exosome formation and adhesion, and exogenous protein binding. The results from this study may help in our understanding of the role of HBV infection in the development of HCC and in the development of new targets for treatment or non-invasive predictive biomarkers of HCC.
Affiliation(s)
- Valentina K. Todorova
- Department of Internal Medicine/Division of Hematology/Oncology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Stephanie D. Byrum
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Samuel G. Mackintosh
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Azemat Jamshidi-Parsian
- Department of Radiation Oncology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Allen J. Gies
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Charity L. Washam
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Samir V. Jenkins
- Department of Radiation Oncology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Timothy Spiva
- Biology Department, Ouachita Baptist University, Arkadelphia, AR 71998, USA
- Emily Bowman
- Biology Department, Ouachita Baptist University, Arkadelphia, AR 71998, USA
- Nathan S. Reyna
- Biology Department, Ouachita Baptist University, Arkadelphia, AR 71998, USA
- Robert J. Griffin
- Department of Radiation Oncology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Issam Makhoul
- Department of Internal Medicine/Division of Hematology/Oncology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
6
Gene expression prediction based on neighbour connection neural network utilizing gene interaction graphs. PLoS One 2023; 18:e0281286. [PMID: 36745614 PMCID: PMC9901809 DOI: 10.1371/journal.pone.0281286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 01/19/2023] [Indexed: 02/07/2023]
Abstract
Having observed that gene expressions are correlated, the Library of Integrated Network-based Cell-Signature program selects 1000 landmark genes to predict the remaining gene expression values. Further works have improved the prediction results by using deep learning models. However, these models ignore the latent structure of genes, limiting the accuracy of the experimental results. We therefore propose a novel neural network named Neighbour Connection Neural Network (NCNN) to utilize the gene interaction graph information. Compared to the popular GCN model, our model incorporates the graph information in a better manner. We validate our model under two different settings and show that our model improves prediction accuracy compared to the other models.
7
Sekaran K, Polachirakkal Varghese R, Gnanasambandan R, Karthik G, Ramya I, George Priya Doss C. Molecular modeling of C1-inhibitor as SARS-CoV-2 target identified from the immune signatures of multiple tissues: An integrated bioinformatics study. Cell Biochem Funct 2023; 41:112-127. [PMID: 36517964 DOI: 10.1002/cbf.3769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 11/02/2022] [Accepted: 11/27/2022] [Indexed: 12/16/2022]
Abstract
The expeditious transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, eroded global economic strength and caused a veritable collapse in health infrastructure. Molecular modeling in novel coronavirus research is promising and provides more evidence about pragmatic therapeutic options. This article proposes a machine-learning framework for identifying potential COVID-19 transcriptomic signatures. The transcriptomics data contain immune-related genes collected from multiple tissues (blood, nasal, and buccal) with accession number GSE183071. Extensive bioinformatics work was carried out to identify the potential candidate markers, including differential expression analysis, protein interactions, gene ontology, and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment studies. The overlapping investigation found SERPING1, the gene that encodes the glycosylated plasma protein C1-INH, in all three datasets. Furthermore, an immuno-informatics study was conducted on the C1-INH protein. 5DU3, the protein identifier of C1-INH, was fetched to identify the antigenicity, major histocompatibility complex (MHC) Class I and II binding epitopes, allergenicity, toxicity, and immunogenicity. Peptides satisfying the vaccine-design criteria were screened based on the metrics mentioned above. The drug-gene interaction study reported that Rhucin is strongly associated with SERPING1. HSIC-Lasso (Hilbert-Schmidt independence criterion-least absolute shrinkage and selection operator), a model-free biomarker selection technique, was employed to identify the genes having a nonlinear relationship with the target class. Supervised machine learning models were trained on the gene subset using leave-one-out cross-validation. Explainable artificial intelligence techniques were used for the model interpretation analysis.
Affiliation(s)
- Karthik Sekaran
- School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
- R Gnanasambandan
- School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
- G Karthik
- Department of Medicine, Christian Medical College, Vellore, India
- I Ramya
- Department of Medicine, Christian Medical College, Vellore, India
- C George Priya Doss
- School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
8
Jiang W, Joehanes R, Levy D, O’Connor GT, Dupuis J. Assisted clustering of gene expression data using regulatory data from partially overlapping sets of individuals. BMC Genomics 2022; 23:819. [PMID: 36496393 PMCID: PMC9734806 DOI: 10.1186/s12864-022-09026-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Accepted: 11/18/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND As omics measurements profiled on different molecular layers are interconnected, integrative approaches that incorporate the regulatory effect from multi-level omics data are needed. When the multi-level omics data are from the same individuals, gene expression (GE) clusters can be identified using information from regulators like genetic variants and DNA methylation. When the multi-level omics data are from different individuals, the choice of integration approaches is limited. METHODS We developed an approach to improve GE clustering from microarray data by integrating regulatory data from different but partially overlapping sets of individuals. We achieve this through (1) decomposing gene expression into the regulated component and the other component that is not regulated by measured factors, (2) optimizing the clustering goodness-of-fit objective function. We do not require the availability of different omics measurements on all individuals. A certain amount of individual overlap between GE data and the regulatory data is adequate for modeling the regulation, thus improving GE clustering. RESULTS A simulation study shows that the performance of the proposed approach depends on the strength of the GE-regulator relationship, degree of missingness, data dimensionality, sample size, and the number of clusters. Across the various simulation settings, the proposed method shows competitive performance in terms of accuracy compared to the alternative K-means clustering method, especially when the clustering structure is due mostly to the regulated component, rather than the unregulated component. We further validate the approach with an application to 8,902 Framingham Heart Study participants with data on up to 17,873 genes and regulation information of DNA methylation and genotype from different but partially overlapping sets of participants. 
We identify clustering structures of genes associated with pulmonary function while incorporating the predicted regulation effect from the measured regulators. We further investigate the over-representation of these GE clusters in pathways of other diseases that may be related to lung function and respiratory health. CONCLUSION We propose a novel approach for clustering GE with the assistance of regulatory data that allowed for different but partially overlapping sets of individuals to be included in different omics data.
Affiliation(s)
- Wenqing Jiang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Roby Joehanes
- National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA, USA
- Daniel Levy
- National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA, USA
- The Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- George T O’Connor
- Department of Medicine, Pulmonary Center, Boston University, Boston, MA, USA
- Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
9
Hu H, Sotirov R, Wolkowicz H. Facial reduction for symmetry reduced semidefinite and doubly nonnegative programs. Mathematical Programming 2022; 200:475-529. [PMID: 37215307 PMCID: PMC10195748 DOI: 10.1007/s10107-022-01890-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2019] [Accepted: 09/04/2022] [Indexed: 05/24/2023]
Abstract
We consider both facial reduction, FR, and symmetry reduction, SR, techniques for semidefinite programming, SDP. We show that the two together fit surprisingly well in an alternating direction method of multipliers, ADMM, approach. In fact, this approach allows for simply adding on nonnegativity constraints, and solving the doubly nonnegative, DNN, relaxation of many classes of hard combinatorial problems. We also show that the singularity degree remains the same after SR, and that the DNN relaxations considered here have singularity degree one, that is reduced to zero after FR. The combination of FR and SR leads to a significant improvement in both numerical stability and running time for both the ADMM and interior point approaches. We test our method on various DNN relaxations of hard combinatorial problems including quadratic assignment problems with sizes of more than n=500. This translates to a semidefinite constraint of order 250,000 and 625×10^8 nonnegative constrained variables, before applying the reduction techniques.
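The problem sizes quoted at the end of the abstract follow from simple arithmetic, reproduced here as a short check. The sketch assumes the matrix variable has order n² and that every one of its entries counts as a nonnegativity-constrained variable; the helper name `dnn_relaxation_size` is hypothetical.

```python
def dnn_relaxation_size(n):
    """Return (matrix order, number of matrix entries) for problem size n.

    Assumption: the semidefinite matrix variable has order n**2, and each
    of its order**2 entries carries a nonnegativity constraint.
    """
    order = n * n
    entries = order * order
    return order, entries

order, entries = dnn_relaxation_size(500)
print(order)    # 250000
print(entries)  # 62500000000, i.e. 625 x 10^8
```

These counts are what make the unreduced problem impractical to handle directly, which is the motivation the abstract gives for applying the reduction techniques first.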
Affiliation(s)
- Hao Hu
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina 29634 USA
- Renata Sotirov
- Department of Econometrics and Operations Research, Tilburg University, 5000 LE Tilburg, The Netherlands
- Henry Wolkowicz
- Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1 Canada
10
An analysis framework for clustering algorithm selection with applications to spectroscopy. PLoS One 2022; 17:e0266369. [PMID: 35358292 PMCID: PMC8970496 DOI: 10.1371/journal.pone.0266369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2021] [Accepted: 03/19/2022] [Indexed: 11/19/2022]
Abstract
Cluster analysis is a valuable unsupervised machine learning technique that is applied in a multitude of domains to identify similarities or clusters in unlabelled data. However, its performance is dependent on the characteristics of the data it is being applied to. There is no universally best clustering algorithm, and hence, there are numerous clustering algorithms available with different performance characteristics. This raises the problem of how to select an appropriate clustering algorithm for the given analytical purposes. We present and validate an analysis framework to address this problem. Unlike most current literature, which focuses on characterizing the clustering algorithm itself, we present a wider holistic approach, with a focus on the user’s needs, the data’s characteristics and the characteristics of the clusters it may contain. In our analysis framework, we utilize a softer qualitative approach to identify appropriate characteristics for consideration when matching clustering algorithms to the intended application. These are used to generate a small subset of suitable clustering algorithms whose performance is then evaluated utilizing quantitative cluster validity indices. To validate our analysis framework for selecting clustering algorithms, we applied it to four different types of datasets: three datasets of homemade explosives spectroscopy, eight datasets of publicly available spectroscopy data covering food and biomedical applications, a gene expression cancer dataset, and three classic machine learning datasets. Each data type has discernible differences in the composition of the data and the context within which they are used. Our analysis framework, when applied to each of these challenges, recommended differing subsets of clustering algorithms for final quantitative performance evaluation. For each application, the recommended clustering algorithms were confirmed to contain the top performing algorithms through quantitative performance indices.
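As an illustration of what a quantitative cluster validity index measures, the sketch below computes the mean silhouette width, one widely used index. The abstract does not say which indices the authors used, so this choice, the function name `mean_silhouette`, and the Euclidean distance are assumptions.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette width of a labelling; needs at least two clusters.

    For each point, a = mean distance to the rest of its own cluster and
    b = smallest mean distance to any other cluster; the point's score is
    (b - a) / max(a, b). Values near 1 indicate compact, separated clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue                       # singleton cluster: score stays 0
        a = dist[i, own].sum() / (own.sum() - 1)
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()
```

Comparing the index across the labellings produced by each shortlisted algorithm gives a ranking without needing ground-truth classes.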
11
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022]
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse sc-RNA-seq data, 13) spatial transcriptomics, and 14) comparison of single cell AD mouse model studies and single cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied them to a large single nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. 
The review and the accompanying software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single cell level.
Affiliation(s)
- Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Won-min Song
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Chen Ming
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Qian Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Xianxiao Zhou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Peng Xu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Azra Krek
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
| | - Yonejung Yoon
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Lap Ho
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Miranda E. Orr
- Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
| | - Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
|
12
|
Todorova VK, Byrum SD, Gies AJ, Haynie C, Smith H, Reyna NS, Makhoul I. Circulating Exosomal microRNAs as Predictive Biomarkers of Neoadjuvant Chemotherapy Response in Breast Cancer. Curr Oncol 2022; 29:613-630. [PMID: 35200555 PMCID: PMC8870357 DOI: 10.3390/curroncol29020055] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 12/02/2021] [Revised: 01/17/2022] [Accepted: 01/24/2022] [Indexed: 12/24/2022]
Abstract
Background: Neoadjuvant chemotherapy (NACT) is an increasingly used approach for the treatment of breast cancer. Pathological complete response (pCR) is considered a good predictor of disease-specific survival. This study investigated whether circulating exosomal microRNAs could predict pCR in breast cancer patients treated with NACT. Methods: Plasma samples from 20 breast cancer patients treated with NACT were collected before and after the first cycle. RNA sequencing was used for microRNA profiling. The Cancer Genome Atlas (TCGA) was used to explore the expression patterns and survivability of the candidate miRNAs and their potential targets, based on expression levels and copy number variation (CNV) data. Results: Three miRNAs measured before NACT (miR-30b, miR-328 and miR-423) predicted pCR in all of the analyzed samples. Upregulation of miR-127 correlated with pCR in triple-negative breast cancer (TNBC). After the first NACT dose, pCR was predicted by exo-miR-141, while miR-34a, exo-miR-182, and exo-miR-183 predicted non-pCR. The candidate miRNAs correlated significantly with overall survival, subtype, and metastasis in breast cancer, suggesting their potential role as predictive biomarkers of pCR. Conclusions: If the miRNAs identified in this study are validated in a large cohort of patients, they might serve as predictive non-invasive liquid biopsy biomarkers for monitoring pCR to NACT in breast cancer.
Affiliation(s)
- Valentina K. Todorova
- Division of Medical Oncology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA;
- Correspondence:
| | - Stephanie D. Byrum
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA; (S.D.B.); (A.J.G.)
| | - Allen J. Gies
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA; (S.D.B.); (A.J.G.)
| | - Cade Haynie
- Biology Department, Ouachita Baptist University, Arkadelphia, AR 71998, USA; (C.H.); (H.S.); (N.S.R.)
| | - Hunter Smith
- Biology Department, Ouachita Baptist University, Arkadelphia, AR 71998, USA; (C.H.); (H.S.); (N.S.R.)
| | - Nathan S. Reyna
- Biology Department, Ouachita Baptist University, Arkadelphia, AR 71998, USA; (C.H.); (H.S.); (N.S.R.)
| | - Issam Makhoul
- Division of Medical Oncology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA;
|
13
|
Fratello M, Cattelani L, Federico A, Pavel A, Scala G, Serra A, Greco D. Unsupervised Algorithms for Microarray Sample Stratification. Methods Mol Biol 2022; 2401:121-146. [PMID: 34902126 DOI: 10.1007/978-1-0716-1839-4_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. In order to be able to discover hidden patterns, the most disparate analytical techniques have been proposed. Here, we describe the basic methodologies to approach the analysis of microarray datasets that focus on the task of (sub)group discovery.
Affiliation(s)
- Michele Fratello
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Giovanni Scala
- Department of Biology, University of Naples Federico II, Naples, Italy
| | - Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland.
- BioMediTech Institute, Tampere University, Tampere, Finland.
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland.
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
|
14
|
He K. Filter Feature Selection for Unsupervised Clustering of Designer Drugs Using DFT Simulated IR Spectra Data. ACS OMEGA 2021; 6:32151-32165. [PMID: 34870036 PMCID: PMC8638022 DOI: 10.1021/acsomega.1c04945] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
The rapid emergence of novel psychoactive substances (NPS) poses new challenges and requirements for forensic testing/analysis techniques. This paper aims to explore the application of unsupervised clustering of NPS compounds' infrared spectra. Two statistical measures, Pearson and Spearman, were used to quantify the spectral similarity and to generate similarity matrices for hierarchical clustering. The correspondence of spectral similarity clustering trees to the commonly used structural/pharmacological categorization was evaluated and compared to the clustering generated using 2D/3D molecular fingerprints. Hybrid model feature selections were applied using different filter-based feature ranking algorithms developed for unsupervised clustering tasks. Since Spearman tends to overestimate the spectral similarity based on the overall pattern of the full spectrum, the clustering result shows the highest degree of improvement from having the nondiscriminative features removed. The loading plots of the first two principal components of the optimal feature subsets confirmed that the most important vibrational bands contributing to the clustering of NPS compounds were selected using non-negative discriminative feature selection (NDFS) algorithms.
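As a rough illustration of the two similarity measures named in this abstract, the sketch below computes Pearson and Spearman correlations between toy intensity vectors and turns them into dissimilarity matrices of the kind a hierarchical clustering routine would take as input. The toy "spectra", the function names, and the choice of 1 - r as the dissimilarity are assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def distance_matrix(spectra, measure):
    """Pairwise dissimilarity 1 - r for every pair of spectra."""
    return [[1.0 - measure(a, b) for b in spectra] for a in spectra]


# Hypothetical toy "spectra" (not data from the paper).
spec_a = [1, 2, 3, 4, 5]
spec_b = [1, 8, 27, 64, 125]  # same ordering as spec_a, different shape
spec_c = [5, 1, 4, 2, 3]
spectra = [spec_a, spec_b, spec_c]

d_pearson = distance_matrix(spectra, pearson)
d_spearman = distance_matrix(spectra, spearman)
```

Because Spearman only looks at rank order, `spec_a` and `spec_b` come out as identical under it while Pearson still separates them, which is one way to read the abstract's remark that Spearman tends to overestimate similarity from the overall pattern.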
|
15
|
Gene Expression Analysis through Parallel Non-Negative Matrix Factorization. COMPUTATION 2021. [DOI: 10.3390/computation9100106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Gene expression analysis is a principal tool for explaining the behavior of genes in an organism exposed to different experimental conditions. Many clustering algorithms have been proposed in the state of the art. However, the amount of biological data is overwhelming, and its high-dimensional structure mostly exceeds the capacity of current computational architectures. Optimizing computational time and memory consumption therefore becomes a decisive factor in choosing a clustering algorithm. We propose a clustering algorithm based on Non-negative Matrix Factorization and K-means that reduces data dimensionality while preserving the biological context and prioritizing gene selection; it is implemented in parallel GPU-based environments through the CUDA library. A well-known dataset is used in our tests, and the quality of the results is measured through the Rand and Accuracy indices. The results show a speedup of 6.22× compared to the sequential version. The algorithm is competitive in the analysis of biological datasets and is invariant with respect to the number of classes and the size of the gene expression matrix.
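The combination of Non-negative Matrix Factorization and K-means described in this abstract can be sketched sequentially in plain Python. This is a minimal CPU sketch under stated assumptions, not the paper's CUDA implementation: it uses the standard multiplicative update rules for the Frobenius objective, a simple deterministic K-means initialization, and a hypothetical toy expression matrix; all function names are illustrative.

```python
import random


def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def frob_error(V, W, H):
    """Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra)) ** 0.5


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V (n x m) into non-negative W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def kmeans(points, k, iters=20):
    """Plain K-means; centroids start at evenly spaced samples (an assumption)."""
    step = max(1, len(points) // k)
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((p - q) ** 2
                                                  for p, q in zip(pt, centroids[c])))
                  for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy matrix: 6 samples (rows) x 4 genes (columns).
V = [[5, 4, 0, 0], [4, 5, 0, 1], [5, 5, 1, 0],
     [0, 1, 5, 4], [1, 0, 4, 5], [0, 0, 5, 5]]

W, H = nmf(V, k=2)         # reduce 4 genes to 2 non-negative components
labels = kmeans(W, k=2)    # cluster samples in the reduced space
```

Clustering the rows of `W` rather than the rows of `V` is what makes the factorization act as the dimensionality-reduction step; the matrix products inside `nmf` are the part that a GPU implementation would parallelize.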
|
16
|
Identification of new BACE1 inhibitors for treating Alzheimer's disease. J Mol Model 2021; 27:58. [PMID: 33517514 DOI: 10.1007/s00894-021-04679-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/17/2020] [Accepted: 01/14/2021] [Indexed: 12/18/2022]
Abstract
Alzheimer's disease (AD) is a brain disorder in which a person experiences gradual memory loss, confusion, hallucinations, agitation, and personality change. AD is marked by the presence of extracellular amyloid plaques, intracellular neurofibrillary tangles (NFTs), and synaptic loss. The recent increase in AD cases has created a dire need to discover or identify chemical compounds that can halt the development of AD. This study focuses on finding potential drug molecule(s) active against β-secretase, also known as β-site amyloid precursor protein cleaving enzyme 1 (BACE1). Clustering analysis followed by phylogenetic studies on microarray datasets retrieved from the GEO browser showed that the BACE1 gene is genetically related to the RCAN1 gene. A ligand library comprising 60 natural compounds retrieved from the literature and 25 synthetic compounds collected from DrugBank was screened. Further, 350 analogues of potential parent compounds were added to the library for docking. Molecular docking studies identified 11-oxotigogenin as the best ligand molecule. The compound showed a binding affinity of -11.1 kcal/mol and formed three hydrogen bonds with Trp124, Ile174, and Arg176. The protein-ligand complex was subjected to a 25 ns molecular dynamics simulation, and the potential energy of the complex was found to be -1.24579e+06 kcal/mol. In this study, 11-oxotigogenin showed promising results against BACE1, which is a leading cause of AD, and hence warrants in vitro and in vivo validation. In addition, the in silico identification of 11-oxotigogenin as a potential anti-AD compound paves the way for designing chemical scaffolds to discover more potent BACE1 inhibitors.
|
17
|
Sadeghi M, Barzegar A. Precision medicine insight into primary prostate tumor through transcriptomic data and an integrated systems biology approach. Meta Gene 2020. [DOI: 10.1016/j.mgene.2020.100787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
18
|
Javed A, Lee BS, Rizzo DM. A benchmark study on time series clustering. MACHINE LEARNING WITH APPLICATIONS 2020. [DOI: 10.1016/j.mlwa.2020.100001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 10/23/2022]
|
19
|
Grillone K, Riillo C, Scionti F, Rocca R, Tradigo G, Guzzi PH, Alcaro S, Di Martino MT, Tagliaferri P, Tassone P. Non-coding RNAs in cancer: platforms and strategies for investigating the genomic "dark matter". J Exp Clin Cancer Res 2020; 39:117. [PMID: 32563270 PMCID: PMC7305591 DOI: 10.1186/s13046-020-01622-x] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Received: 05/06/2020] [Accepted: 06/11/2020] [Indexed: 12/18/2022]
Abstract
The discovery of the role of non-coding RNAs (ncRNAs) in the onset and progression of malignancies is a promising frontier of cancer genetics. It is clear that ncRNAs are candidates for therapeutic intervention, since they may act as biomarkers or key regulators of cancer gene networks. Recently, profiling and sequencing of ncRNAs disclosed deep deregulation in human cancers, mostly due to aberrant mechanisms of ncRNA biogenesis, such as amplification, deletion, and abnormal epigenetic or transcriptional regulation. Although dysregulated ncRNAs may promote hallmarks of cancer as oncogenes or antagonize them as tumor suppressors, the mechanisms behind these events remain to be clarified. The development of new bioinformatic tools as well as novel molecular technologies is a challenging opportunity to disclose the role of the "dark matter" of the genome. In this review, we focus on currently available platforms, computational analyses and experimental strategies to investigate ncRNAs in cancer. We highlight the differences among experimental approaches aimed at dissecting miRNAs and lncRNAs, which are the most studied ncRNAs. These two classes require different investigative approaches that take into account their intrinsic characteristics, such as length, structure, and interacting molecules. Finally, we discuss the relevance of ncRNAs in clinical practice by considering the promises and challenges of bench-to-bedside translation.
Affiliation(s)
- Katia Grillone
- Laboratory of Translational Medical Oncology, Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
| | - Caterina Riillo
- Laboratory of Translational Medical Oncology, Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
- Medical and Translational Oncology Units, AOU Mater Domini, 88100 Catanzaro, Italy
| | - Francesca Scionti
- Laboratory of Translational Medical Oncology, Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
| | - Roberta Rocca
- Laboratory of Translational Medical Oncology, Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
- Net4science srl, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
| | - Giuseppe Tradigo
- Laboratory of Bioinformatics, Department of Medical and Surgical Sciences, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
| | - Pietro Hiram Guzzi
- Laboratory of Bioinformatics, Department of Medical and Surgical Sciences, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
| | - Stefano Alcaro
- Net4science srl, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
- Department of Health Sciences, Magna Græcia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
| | - Maria Teresa Di Martino
- Laboratory of Translational Medical Oncology, Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
- Medical and Translational Oncology Units, AOU Mater Domini, 88100 Catanzaro, Italy
| | - Pierosandro Tagliaferri
- Laboratory of Translational Medical Oncology, Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
- Medical and Translational Oncology Units, AOU Mater Domini, 88100 Catanzaro, Italy
| | - Pierfrancesco Tassone
- Laboratory of Translational Medical Oncology, Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy
- Medical and Translational Oncology Units, AOU Mater Domini, 88100 Catanzaro, Italy
|
20
|
León-Cachón RBR, Bamford AD, Meester I, Barrera-Saldaña HA, Gómez-Silva M, Bustos MFG. The atorvastatin metabolic phenotype shift is influenced by interaction of drug-transporter polymorphisms in Mexican population: results of a randomized trial. Sci Rep 2020; 10:8900. [PMID: 32483134 PMCID: PMC7264171 DOI: 10.1038/s41598-020-65843-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/05/2020] [Accepted: 05/08/2020] [Indexed: 12/18/2022]
Abstract
Atorvastatin (ATV) is a blood cholesterol-lowering drug used to prevent cardiovascular events, the leading cause of death worldwide. As pharmacokinetics, metabolism and response vary among individuals, we wanted to determine the most reliable metabolic ATV phenotypes and identify novel and preponderant genetic markers that affect ATV plasma levels. A controlled, randomized, crossover, single-blind, three-treatment, three-period, and six-sequence clinical study of ATV (single 80-mg oral dose) was conducted among 60 healthy Mexican men. ATV plasma levels were measured using high-performance liquid chromatography mass spectrometry. Genotyping was performed by real-time PCR with TaqMan probes. Four ATV metabolizer phenotypes were found: slow, intermediate, normal and fast. Six gene polymorphisms, SLCO1B1-rs4149056, ABCB1-rs1045642, CYP2D6-rs1135840, CYP2B6-rs3745274, NAT2-rs1208, and COMT-rs4680, had a significant effect on ATV pharmacokinetics (P < 0.05). The polymorphisms in SLCO1B1 and ABCB1 seemed to have a greater effect and were especially important for the shift from an intermediate to a normal metabolizer. This is the first study to demonstrate how the interaction of genetic variants affects metabolic phenotyping, and it improves understanding of how SLCO1B1 and ABCB1 variants that affect statin metabolism may partially explain the variability in drug response. Notwithstanding, the influence of other genetic and non-genetic factors is not ruled out.
Affiliation(s)
- Rafael B R León-Cachón
- Center of Molecular Diagnostics and Personalized Medicine, Department of Basic Sciences, Division of Health Sciences, University of Monterrey, San Pedro Garza Garcia, Nuevo Leon, Mexico.
| | - Aileen-Diane Bamford
- Center of Molecular Diagnostics and Personalized Medicine, Department of Basic Sciences, Division of Health Sciences, University of Monterrey, San Pedro Garza Garcia, Nuevo Leon, Mexico
| | - Irene Meester
- Center of Molecular Diagnostics and Personalized Medicine, Department of Basic Sciences, Division of Health Sciences, University of Monterrey, San Pedro Garza Garcia, Nuevo Leon, Mexico
| | - Magdalena Gómez-Silva
- Forensic Medicine Service, School of Medicine, Autonomous University of Nuevo Leon, Monterrey, Nuevo Leon, Mexico; Analytical Department of the Research Institute for Clinical and Experimental Pharmacology, Ipharma S.A., Monterrey, Nuevo Leon, Mexico
| | - María F García Bustos
- Institute of Experimental Pathology (CONICET), Faculty of Health Sciences, National University of Salta, Salta, Argentina; University School in Health Sciences, Catholic University of Salta, Salta, Argentina
|
21
|
Choi H, Kim Y, Kang D, Kwon A, Kim J, Min Kim J, Park SS, Kim YJ, Min CK, Kim M. Common and different alterations of bone marrow mesenchymal stromal cells in myelodysplastic syndrome and multiple myeloma. Cell Prolif 2020; 53:e12819. [PMID: 32372504 PMCID: PMC7260074 DOI: 10.1111/cpr.12819] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/07/2019] [Revised: 03/13/2020] [Accepted: 04/11/2020] [Indexed: 12/12/2022]
Abstract
Objective: The objective of this study was to explore characteristics of bone marrow mesenchymal stromal cells (BM‐MSCs) derived from patients with myelodysplastic syndrome (MDS) and multiple myeloma (MM). Methods: BM‐MSCs were recovered from 17 MDS patients, 23 MM patients and 9 healthy donors and were passaged until proliferation stopped. General characteristics and gene expression profiles of MSCs were analysed. In vitro, ex vivo coculture, immunohistochemistry and knockdown experiments were performed to verify gene expression changes. Results: BM‐MSCs failed to culture in 35.0% of patients, and 50.0% of recovered BM‐MSCs stopped proliferating before passage 6. MDS‐ and MM‐MSCs shared characteristics including decreased osteogenesis, increased angiogenesis and senescence‐associated molecular pathways. In vitro and ex vivo experiments showed disease‐specific changes such as a neurogenic tendency in MDS‐MSCs and a cardiomyogenic tendency in MM‐MSCs. Although the normal controls were younger than the patients and telomere length was shorter in patients' BM‐MSCs, these did not differ by disease category or degree of proliferation. Specifically, poorly proliferating BM‐MSCs showed CDKN2A overexpression and CXCL12 downregulation. Immunohistochemistry of BM biopsies demonstrated that CDKN2A accumulated intensely in perivascular BM‐MSCs that failed to culture. Interestingly, patients' BM‐MSCs showed improved proliferation activity after CDKN2A knockdown. Conclusion: These results collectively indicate that MDS‐MSCs and MM‐MSCs have common and different alterations to various degrees. Hence, it is necessary to evaluate their alteration status using representative markers such as CDKN2A expression.
Affiliation(s)
- Hayoung Choi
- Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea; Department of Biomedicine & Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Korea
| | - Yonggoo Kim
- Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea; Department of Laboratory Medicine, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Dain Kang
- Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Ahlm Kwon
- Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Jiyeon Kim
- Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Sung-Soo Park
- Department of Hematology, Leukemia Research Institute, Seoul St. Mary's Hematology Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Yoo-Jin Kim
- Department of Hematology, Leukemia Research Institute, Seoul St. Mary's Hematology Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Chang-Ki Min
- Department of Hematology, Leukemia Research Institute, Seoul St. Mary's Hematology Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Myungshin Kim
- Catholic Genetic Laboratory Center, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea; Department of Biomedicine & Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Korea; Department of Laboratory Medicine, College of Medicine, The Catholic University of Korea, Seoul, Korea
|
22
|
Nwadiugwu MC. Gene-Based Clustering Algorithms: Comparison Between Denclue, Fuzzy-C, and BIRCH. Bioinform Biol Insights 2020; 14:1177932220909851. [PMID: 32284672 PMCID: PMC7133071 DOI: 10.1177/1177932220909851] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/29/2020] [Accepted: 02/02/2020] [Indexed: 11/17/2022]
Abstract
The current study seeks to compare 3 clustering algorithms that can be used in gene-based bioinformatics research to understand disease networks, protein-protein interaction networks, and gene expression data. Denclue, Fuzzy-C, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) were the 3 gene-based clustering algorithms selected. These algorithms were explored in relation to the subfield of bioinformatics that analyzes omics data, which include but are not limited to genomics, proteomics, metagenomics, transcriptomics, and metabolomics data. The objective was to compare the efficacy of the 3 algorithms and determine their strengths and drawbacks. The review showed that, whereas Denclue and Fuzzy-C are more efficient at handling noisy data, BIRCH can handle data sets with outliers and has better time complexity.
Affiliation(s)
- Martin C Nwadiugwu
- Department of Biomedical Informatics, University of Nebraska Omaha, Omaha, NE, USA
|
23
|
Verma Y, Yadav A, Katara P. Mining of cancer core-genes and their protein interactome using expression profiling based PPI network approach. GENE REPORTS 2020. [DOI: 10.1016/j.genrep.2019.100583] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/07/2023]
|
24
|
|
25
|
Mabu AM, Prasad R, Yadav R. Gene Expression Dataset Classification Using Artificial Neural Network and Clustering-Based Feature Selection. INTERNATIONAL JOURNAL OF SWARM INTELLIGENCE RESEARCH 2020. [DOI: 10.4018/ijsir.2020010104] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
Abstract
With the progression of bioinformatics, the application of gene expression (GE) profiles to cancer diagnosis and classification has become an intriguing subject. GE data hold numerous genes with few samples, which makes them arduous to examine and process. This paper proposes a novel strategy for classifying GE datasets with clustering-centered feature selection. The proposed technique first preprocesses the dataset using normalization; feature selection is then accomplished with a feature clustering support vector machine (FCSVM), which has two phases, gene clustering and gene representation. To make the chosen top-ranked features suitable for classification, feature reduction is performed using the SVM-recursive feature elimination (SVM-RFE) algorithm. Finally, the feature-reduced dataset is classified using an artificial neural network (ANN) classifier. Compared with some recent swarm intelligence feature reduction approaches, FCSVM-ANN performed well.
Affiliation(s)
| | - Rajesh Prasad
- African University of Science and Technology, Abuja, Nigeria
|
26
|
Mabu AM, Prasad R, Yadav R. Mining gene expression data using data mining techniques: A critical review. JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES 2019. [DOI: 10.1080/02522667.2018.1555311] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/12/2023]
Affiliation(s)
- Audu Musa Mabu
- Department of Computer Science & Information Technology, Sam Higginbottom University of Agriculture, Technology and Sciences, Naini, Allahabad 211007, Uttar Pradesh, India
| | - Rajesh Prasad
- School of Information Technology & Computing, American University of Nigeria, Yola 640101, Nigeria
| | - Raghav Yadav
- Department of Computer Science & Information Technology, Sam Higginbottom University of Agriculture, Technology and Sciences, Naini, Allahabad 211007, Uttar Pradesh, India
|
27
|
Pirgazi J, Khanteymoori AR, Jalilkhani M. TIGRNCRN: Trustful inference of gene regulatory network using clustering and refining the network. J Bioinform Comput Biol 2019; 17:1950018. [DOI: 10.1142/s0219720019500185] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
In this study, to deal with the noise and uncertainty in gene expression data, learning networks, especially Bayesian networks, which can use prior knowledge, were used to infer gene regulatory networks. Learning networks are methods that have a network structure and a learning process for obtaining relationships. Correlation metrics are one of the methods that have been used to measure the relationship between genes, but high correlation between genes does not necessarily mean that they have a causal effect on each other. Studies on common methods for inferring gene regulatory networks have yet to pay attention to their biological importance, and as such, predictions by these methods are less accurate in terms of biological significance. Hence, in the proposed method, highly correlated genes were grouped into one cluster using clustering, and edges between genes in the same cluster were prevented. Finally, after the Bayesian network modeling, a refining phase based on knowledge gained from clustering was carried out to improve regulatory interactions using biological correlation. To show its efficiency, the proposed method was compared with several common methods in this area, including GENIE3 and BMALR. The results of the evaluation indicate that the proposed method recognized regulatory relations well in the Bayesian modeling process, owing to its use of biological knowledge hidden in the data collection, and is able to recognize gene regulatory networks on par with important methods in this field.
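The clustering step described in this abstract (group highly correlated genes, then forbid edges inside a group) can be illustrated with a minimal sketch. This is not the authors' TIGRNCRN implementation: the Pearson threshold, the single-linkage grouping via union-find, the function names, and the toy expression values are all assumptions made for illustration, and the Bayesian network modeling and refining phases are not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_clusters(expr, threshold=0.9):
    """Label genes so that any pair with |r| >= threshold shares a label.

    Uses union-find, i.e. single-linkage grouping (an assumed choice).
    """
    parent = {g: g for g in expr}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for a, b in combinations(expr, 2):
        if abs(pearson(expr[a], expr[b])) >= threshold:
            parent[find(a)] = find(b)
    return {g: find(g) for g in expr}


def allowed_edges(expr, threshold=0.9):
    """Candidate gene pairs, excluding pairs inside the same cluster."""
    labels = correlation_clusters(expr, threshold)
    return [(a, b) for a, b in combinations(expr, 2) if labels[a] != labels[b]]


# Toy data (hypothetical): g2 is nearly proportional to g1, g3 is not.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 4.0, 6.2, 8.1],
    "g3": [4.0, 1.0, 3.5, 2.0],
}
print(allowed_edges(expr))  # [('g1', 'g3'), ('g2', 'g3')]
```

In this sketch the pair (g1, g2) is removed from the candidate edge set because the two profiles are almost perfectly correlated, which mirrors the abstract's point that high correlation alone is not treated as evidence of a regulatory edge.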
Affiliation(s)
- Jamshid Pirgazi
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Maryam Jalilkhani
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
|
28
|
Ye W, Long Y, Ji G, Su Y, Ye P, Fu H, Wu X. Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis. BMC Genomics 2019; 20:75. [PMID: 30669970 PMCID: PMC6343338 DOI: 10.1186/s12864-019-5433-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2018] [Accepted: 01/03/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to the transcriptome complexity and dynamics of gene regulation. The current tsunami of whole genome poly(A) site data from various conditions generated by 3' end sequencing provides a valuable data source for the study of APA-related gene expression. Cluster analysis is a powerful technique for investigating the association structure among genes; however, conventional gene clustering methods are not suitable for APA-related data, as they fail to consider the information of poly(A) sites (e.g., location, abundance, number) within each gene or to measure the association among poly(A) sites between two genes. RESULTS Here we proposed a computational framework, named PASCCA, for clustering genes from replicated or unreplicated poly(A) site data using canonical correlation analysis (CCA). PASCCA incorporates multiple layers of gene expression data from both the poly(A) site level and gene level and takes into account the number of replicates and the variability within each experimental group. Moreover, PASCCA characterizes poly(A) sites in various ways including the abundance and relative usage, which can exploit the advantages of 3' end deep sequencing in quantifying APA sites. Using both real and synthetic poly(A) site data sets, the cluster analysis demonstrates that PASCCA outperforms other widely-used distance measures under five performance metrics including connectivity, the Dunn index, average distance, average distance between means, and the biological homogeneity index. We also used PASCCA to infer APA-specific gene modules from recently published poly(A) site data of rice and discovered some distinct functional gene modules. We have made PASCCA an easy-to-use R package for APA-related gene expression analyses, including the characterization of poly(A) sites, quantification of association between genes, and clustering of genes.
CONCLUSIONS By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data. PASCCA could be used to elucidate the dynamic interplay of genes and their APA sites among various biological conditions from emerging 3' end sequencing data to address the complex biological phenomenon.
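One of the performance metrics named in this abstract, the Dunn index, is simple enough to sketch directly. The following is a minimal illustration of the classic definition (smallest distance between points in different clusters divided by the largest within-cluster diameter) using Euclidean distance. It is not taken from the PASCCA package, which is written in R; the function name and the toy points are assumptions, and other variants of the index exist.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(clusters):
    """Classic Dunn index for a list of clusters (each a list of points).

    Higher values indicate compact, well-separated clusters.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Smallest distance between two points that lie in different clusters.
    min_between = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between two points that lie in the same cluster.
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("all clusters are single points; index undefined")
    return min_between / max_diameter


# Toy 2-D points (hypothetical), grouped two different ways.
well_separated = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
poorly_separated = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]
print(dunn_index(well_separated))    # 5.0
print(dunn_index(poorly_separated))  # 0.2
```

The same four points score 5.0 under one grouping and 0.2 under the other, which is why the index is useful for comparing clusterings produced by different distance measures on the same data.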
Affiliation(s)
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yuqi Long
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Software Quality Testing Engineering Research Center, China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, 510610, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yaru Su
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350116, China
- Pengchao Ye
  - Department of Automation, Xiamen University, Xiamen, 361005, China
- Hongjuan Fu
  - Department of Automation, Xiamen University, Xiamen, 361005, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
|
29
|
Rodriguez MZ, Comin CH, Casanova D, Bruno OM, Amancio DR, Costa LDF, Rodrigues FA. Clustering algorithms: A comparative approach. PLoS One 2019; 14:e0210236. [PMID: 30645617 PMCID: PMC6333366 DOI: 10.1371/journal.pone.0210236] [Citation(s) in RCA: 121] [Impact Index Per Article: 24.2] [Received: 12/26/2016] [Accepted: 12/19/2018] [Indexed: 12/04/2022] Open
Abstract
Many real-world systems can be studied in terms of pattern recognition tasks, so that proper use (and understanding) of machine learning methods in practical applications becomes essential. While many classification methods have been proposed, there is no consensus on which methods are more suitable for a given dataset. As a consequence, it is important to comprehensively compare methods in many possible scenarios. In this context, we performed a systematic comparison of 9 well-known clustering methods available in the R language assuming normally distributed data. In order to account for the many possible variations of data, we considered artificial datasets with several tunable properties (number of classes, separation between classes, etc.). In addition, we also evaluated the sensitivity of the clustering methods with regard to their parameter configuration. The results revealed that, when considering the default configurations of the adopted methods, the spectral approach tended to present particularly good performance. We also found that the default configuration of the adopted implementations was not always accurate. In these cases, a simple approach based on random selection of parameter values proved to be a good alternative to improve the performance. All in all, the reported approach provides guidance for the choice of clustering algorithms.
Affiliation(s)
- Mayra Z. Rodriguez
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo, Brazil
- Cesar H. Comin
  - Department of Computer Science, Federal University of São Carlos, São Carlos, São Paulo, Brazil
- Odemir M. Bruno
  - São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo, Brazil
- Diego R. Amancio
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo, Brazil
- Luciano da F. Costa
  - São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo, Brazil
- Francisco A. Rodrigues
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo, Brazil
|
30
|
Rahman MA, Islam MZ. Application of a density based clustering technique on biomedical datasets. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.09.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
|
31
|
Abu-Jamous B, Kelly S. Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data. Genome Biol 2018; 19:172. [PMID: 30359297 PMCID: PMC6203272 DOI: 10.1186/s13059-018-1536-8] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Received: 03/06/2018] [Accepted: 09/11/2018] [Indexed: 01/24/2023] Open
Abstract
Identifying co-expressed gene clusters can provide evidence for genetic or physical interactions. Thus, co-expression clustering is a routine step in large-scale analyses of gene expression data. We show that commonly used clustering methods produce results that substantially disagree and that do not match the biological expectations of co-expressed gene clusters. We present clust, a method that solves these problems by extracting clusters matching the biological expectations of co-expressed genes and outperforms widely used methods. Additionally, clust can simultaneously cluster multiple datasets, enabling users to leverage the large quantity of public expression data for novel comparative analysis. Clust is available at https://github.com/BaselAbujamous/clust.
Affiliation(s)
- Basel Abu-Jamous
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Steven Kelly
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
|
32
|
Genetic Algorithm with an Improved Initial Population Technique for Automatic Clustering of Low-Dimensional Data. INFORMATION 2018. [DOI: 10.3390/info9040101] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022] Open
|
33
|
Mandal K, Sarmah R, Bhattacharyya DK. Biomarker Identification for Cancer Disease Using Biclustering Approach: An Empirical Study. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:490-509. [PMID: 29993834 DOI: 10.1109/tcbb.2018.2820695] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
This paper presents an exhaustive empirical study to identify biomarkers using two approaches, frequency-based and network-based, over seventeen different biclustering algorithms and six different cancer expression datasets. To systematically analyze the biclustering algorithms, we perform enrichment analysis, subtype identification and biomarker identification. Biclustering algorithms such as C&C, SAMBA and Plaid are useful to detect biomarkers by both approaches for all datasets except prostate cancer. We detect a total of 102 gene biomarkers using the frequency-based method, of which 19 are for blood cancer, 36 for lung cancer, 25 for colon cancer, 13 for multi-tissue cancer and 9 for prostate cancer. Using the network-based approach we detect a total of 41 gene biomarkers, of which 15 are from the blood cancer, 12 from the lung cancer, 6 from the colon cancer, 7 from the multi-tissue cancer and 1 from the prostate cancer dataset. We further extend our network analysis over some biclusters and detect some gene biomarkers not detected earlier by either the frequency-based or the network-based approach. We expand our work on breast cancer miRNA expression data to evaluate the performance of the biclustering algorithms. We detect 19 breast cancer biomarkers by the frequency-based method and 5 by the network-based method for the miRNA dataset.
|
34
|
Integrity, standards, and QC-related issues with big data in pre-clinical drug discovery. Biochem Pharmacol 2018; 152:84-93. [PMID: 29551586 DOI: 10.1016/j.bcp.2018.03.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/02/2018] [Accepted: 03/13/2018] [Indexed: 11/21/2022]
Abstract
The tremendous expansion of data analytics and public and private big datasets presents an important opportunity for pre-clinical drug discovery and development. In the field of life sciences, the growth of genetic, genomic, transcriptomic and proteomic data is partly driven by a rapid decline in experimental costs as biotechnology improves throughput, scalability, and speed. Yet far too many researchers tend to underestimate the challenges and consequences involving data integrity and quality standards. Given the effect of data integrity on scientific interpretation, these issues have significant implications during preclinical drug development. We describe standardized approaches for maximizing the utility of publicly available or privately generated biological data and address some of the common pitfalls. We also discuss the increasing interest to integrate and interpret cross-platform data. Principles outlined here should serve as a useful broad guide for existing analytical practices and pipelines and as a tool for developing additional insights into therapeutics using big data.
|
35
|
Ji G, Lin Q, Long Y, Ye C, Ye W, Wu X. PAcluster: Clustering polyadenylation site data using canonical correlation analysis. J Bioinform Comput Biol 2017; 15:1750018. [PMID: 28874086 DOI: 10.1142/s0219720017500184] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Alternative polyadenylation (APA) is a pervasive mechanism that contributes to gene regulation. The growing number of sequenced poly(A) sites places new demands on the development of computational methods to investigate APA regulation. Cluster analysis is important for identifying groups of co-expressed genes. However, clustering of poly(A) sites has not been extensively studied in APA, and most APA studies have failed to consider the distribution, abundance, and variation of APA sites in each gene. Here we constructed a two-layer model based on canonical correlation analysis (CCA) to explore the underlying biological mechanisms in APA regulation. The first layer quantifies the general correlation of APA sites across various conditions between each pair of genes, and the second layer identifies genes with statistically significant correlation in their APA patterns to infer APA-specific gene clusters. Using hierarchical clustering, we comprehensively compared our method with four other widely used distance measures based on three performance indexes. Results showed that our method significantly enhanced the clustering performance for both synthetic and real poly(A) site data and could generate clusters with more biological meaning. We have implemented the CCA-based method as a publicly available R package called PAcluster, which provides an efficient solution to the clustering of large APA-specific biological datasets.
Affiliation(s)
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Qianmin Lin
  - Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Yuqi Long
  - Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Congting Ye
  - College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, P. R. China
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
|
36
|
Lötsch J, Lippmann C, Kringel D, Ultsch A. Integrated Computational Analysis of Genes Associated with Human Hereditary Insensitivity to Pain. A Drug Repurposing Perspective. Front Mol Neurosci 2017; 10:252. [PMID: 28848388 PMCID: PMC5550731 DOI: 10.3389/fnmol.2017.00252] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/02/2017] [Accepted: 07/26/2017] [Indexed: 12/31/2022] Open
Abstract
Genes causally involved in human insensitivity to pain provide a unique molecular source of studying the pathophysiology of pain and the development of novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic target in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, the biological processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified sharing important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence.
Affiliation(s)
- Jörn Lötsch
  - Institute of Clinical Pharmacology, Goethe-University, Frankfurt am Main, Germany
  - Fraunhofer Institute of Molecular Biology and Applied Ecology-Project Group, Translational Medicine and Pharmacology (IME-TMP), Frankfurt am Main, Germany
- Catharina Lippmann
  - Fraunhofer Institute of Molecular Biology and Applied Ecology-Project Group, Translational Medicine and Pharmacology (IME-TMP), Frankfurt am Main, Germany
- Dario Kringel
  - Institute of Clinical Pharmacology, Goethe-University, Frankfurt am Main, Germany
- Alfred Ultsch
  - DataBionics Research Group, University of Marburg, Marburg, Germany
|
37
|
A data analysis framework for biomedical big data: Application on mesoderm differentiation of human pluripotent stem cells. PLoS One 2017; 12:e0179613. [PMID: 28654683 PMCID: PMC5487013 DOI: 10.1371/journal.pone.0179613] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/24/2017] [Accepted: 05/31/2017] [Indexed: 12/16/2022] Open
Abstract
The development of high-throughput biomolecular technologies has resulted in generation of vast omics data at an unprecedented rate. This is transforming biomedical research into a big data discipline, where the main challenges relate to the analysis and interpretation of data into new biological knowledge. The aim of this study was to develop a framework for biomedical big data analytics, and apply it for analyzing transcriptomics time series data from early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages. To this end, transcriptome profiling by microarray was performed on differentiating human pluripotent stem cells sampled at eleven consecutive days. The gene expression data was analyzed using the five-stage analysis framework proposed in this study, including data preparation, exploratory data analysis, confirmatory analysis, biological knowledge discovery, and visualization of the results. Clustering analysis revealed several distinct expression profiles during differentiation. Genes with an early transient response were strongly related to embryonic- and mesendoderm development, for example CER1 and NODAL. Pluripotency genes, such as NANOG and SOX2, exhibited substantial downregulation shortly after onset of differentiation. Rapid induction of genes related to metal ion response, cardiac tissue development, and muscle contraction were observed around day five and six. Several transcription factors were identified as potential regulators of these processes, e.g. POU1F1, TCF4 and TBP for muscle contraction genes. Pathway analysis revealed temporal activity of several signaling pathways, for example the inhibition of WNT signaling on day 2 and its reactivation on day 4. This study provides a comprehensive characterization of biological events and key regulators of the early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages. 
The proposed analysis framework can be used to structure data analysis in future research, both in stem cell differentiation, and more generally, in biomedical big data analytics.
|
38
|
Aschenbrenner AC, Bassler K, Brondolin M, Bonaguro L, Carrera P, Klee K, Ulas T, Schultze JL, Hoch M. A cross-species approach to identify transcriptional regulators exemplified for Dnajc22 and Hnf4a. Sci Rep 2017; 7:4056. [PMID: 28642491 PMCID: PMC5481429 DOI: 10.1038/s41598-017-04370-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/16/2016] [Accepted: 05/05/2017] [Indexed: 12/03/2022] Open
Abstract
There is an enormous need to make better use of the ever increasing wealth of publicly available genomic information and to utilize the tremendous progress in computational approaches in the life sciences. Transcriptional regulation of protein-coding genes is a major mechanism of controlling cellular functions. However, the myriad of transcription factors potentially controlling transcription of any given gene often makes it difficult to quickly identify the biologically relevant transcription factors. Here, we report on the identification of Hnf4a as a major transcription factor of the so far unstudied DnaJ heat shock protein family (Hsp40) member C22 (Dnajc22). We propose an approach utilizing recent advances in computational biology and the wealth of publicly available genomic information guiding the identification of potential transcription factor candidates together with wet-lab experiments validating computational models. More specifically, the combined use of co-expression analyses based on self-organizing maps with sequence-based transcription factor binding prediction led to the identification of Hnf4a as the potential transcriptional regulator for Dnajc22, which was further corroborated using publicly available datasets on Hnf4a. Following this procedure, we determined its functional binding site in the murine Dnajc22 locus using ChIP-qPCR and luciferase assays and verified this regulatory loop in fruit fly, zebrafish, and humans.
Affiliation(s)
- A C Aschenbrenner
  - Developmental Genetics & Molecular Physiology, Life & Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
- K Bassler
  - Genomics and Immunoregulation, Life & Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
- M Brondolin
  - Developmental Genetics & Molecular Physiology, Life & Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
  - Department of Craniofacial Development and Stem Cell Biology, Dental Institute, King's College London, SE1 9RT, London, United Kingdom
- L Bonaguro
  - Developmental Genetics & Molecular Physiology, Life & Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
- P Carrera
  - Developmental Genetics & Molecular Physiology, Life & Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
- K Klee
  - Genomics and Immunoregulation, Life & Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
- T Ulas
  - Genomics and Immunoregulation, Life & Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
- J L Schultze
  - Genomics and Immunoregulation, Life & Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
  - Single Cell Genomics and Epigenomics Unit at the German Center for Neurodegenerative Diseases and the University of Bonn, 53175, Bonn, Germany
- M Hoch
  - Developmental Genetics & Molecular Physiology, Life & Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
|
39
|
Fu W, Zhu P, Wei S, Zhixin D, Wang C, Wu X, Li F, Zhu S. Multiplex enrichment quantitative PCR (ME-qPCR): a high-throughput, highly sensitive detection method for GMO identification. Anal Bioanal Chem 2017; 409:2655-2664. [PMID: 28154881 DOI: 10.1007/s00216-017-0209-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/23/2016] [Revised: 01/06/2017] [Accepted: 01/13/2017] [Indexed: 11/29/2022]
Abstract
Among all of the high-throughput detection methods, PCR-based methodologies are regarded as the most cost-efficient and feasible compared with next-generation sequencing or ChIP-based methods. However, PCR-based methods can only achieve multiplex detection up to 15-plex due to limitations imposed by multiplex primer interactions. This detection throughput cannot meet the demands of high-throughput detection, such as SNP or gene expression analysis. Therefore, in our study, we have developed a new high-throughput PCR-based detection method, multiplex enrichment quantitative PCR (ME-qPCR), which is a combination of qPCR and nested PCR. The GMO content detection results in our study showed that ME-qPCR could achieve high-throughput detection up to 26-plex. Compared to the original qPCR, the Ct values of ME-qPCR were lower for the same group, which showed that ME-qPCR sensitivity is higher than that of the original qPCR. The absolute limit of detection for ME-qPCR could reach levels as low as a single copy of the plant genome. Moreover, the specificity results showed that no cross-amplification occurred for irrelevant GMO events. After evaluation of all of the parameters, a practical evaluation was performed with different foods. The more stable amplification results, compared to qPCR, showed that ME-qPCR was suitable for GMO detection in foods. In conclusion, ME-qPCR achieved sensitive, high-throughput GMO detection in complex substrates, such as crops or food samples. In the future, ME-qPCR-based GMO content identification may positively impact SNP analysis or multiplex gene expression of food or agricultural samples.
Graphical abstract: For the first-step amplification, four primers (A, B, C, and D) were added to the reaction volume, generating four kinds of amplicons. All four amplicons could be regarded as targets of the second-step PCR. For the second-step amplification, three parallels were taken for the final evaluation. After the second evaluation, the final amplification curves and melting curves were obtained.
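The link this abstract draws between lower Ct values and higher sensitivity rests on the exponential nature of PCR: each cycle multiplies the template by roughly (1 + E), where E is the amplification efficiency. A minimal sketch of that arithmetic follows. The function name, the default of perfect doubling (E = 1.0), and the Ct values are hypothetical and are not taken from the paper.

```python
def fold_from_delta_ct(delta_ct, efficiency=1.0):
    """Fold difference in template implied by a Ct difference.

    efficiency=1.0 assumes perfect doubling per cycle (an idealization);
    real assays usually have efficiency somewhat below 1.0.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** delta_ct


# Hypothetical Ct values for the same sample under the two methods.
ct_qpcr, ct_me_qpcr = 28.0, 23.0
print(fold_from_delta_ct(ct_qpcr - ct_me_qpcr))  # 32.0
```

Under these assumptions, a Ct that is five cycles lower corresponds to about 32 times more template entering the quantification step, which is consistent with the enrichment that a preceding amplification round is intended to provide.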
Affiliation(s)
- Wei Fu
  - The Institute of Plant Quarantine, Chinese Academy of Inspection and Quarantine, Ronghuananlu No.11, Beijing Economic-Technological Developmental Area, Beijing, 100176, China
- Pengyu Zhu
  - The Institute of Plant Quarantine, Chinese Academy of Inspection and Quarantine, Ronghuananlu No.11, Beijing Economic-Technological Developmental Area, Beijing, 100176, China
- Shuang Wei
  - Shantou Entry-Exit Inspection and Quarantine Bureau, Building, No.126, Jinsha Road, Shantou, Guangdong, 515041, China
- Du Zhixin
  - Guangxi Entry-Exit Inspection and Quarantine Bureau, No.38, Binhu Road, Qingxiu District, Nanning, Guangxi, 530028, China
- Chenguang Wang
  - The Institute of Plant Quarantine, Chinese Academy of Inspection and Quarantine, Ronghuananlu No.11, Beijing Economic-Technological Developmental Area, Beijing, 100176, China
- Xiyang Wu
  - Department of Food Science and Engineering, Jinan University, Guangzhou, Guangdong, 510632, China
- Feiwu Li
  - Institute of Agricultural Standard and Testing Technology, Jilin Academy of Agricultural Sciences, No. 1363 Shengtai St., Changchun, Jilin, 130033, China
- Shuifang Zhu
  - The Institute of Plant Quarantine, Chinese Academy of Inspection and Quarantine, Ronghuananlu No.11, Beijing Economic-Technological Developmental Area, Beijing, 100176, China
|
40
|
Oyelade J, Isewon I, Oladipupo F, Aromolaran O, Uwoghiren E, Ameh F, Achas M, Adebiyi E. Clustering Algorithms: Their Application to Gene Expression Data. Bioinform Biol Insights 2016; 10:237-253. [PMID: 27932867 PMCID: PMC5135122 DOI: 10.4137/bbi.s38316] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 05/18/2016] [Revised: 09/05/2016] [Accepted: 09/09/2016] [Indexed: 12/17/2022] Open
Abstract
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a great opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved increase the challenge of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful for revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.
Affiliation(s)
- Jelili Oyelade
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Itunuoluwa Isewon
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Funke Oladipupo
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Olufemi Aromolaran
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Efosa Uwoghiren
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Faridah Ameh
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Moses Achas
  - Department of Computer Science and Information Technology, Bells University of Technology, Ota, Ogun State, Nigeria
- Ezekiel Adebiyi
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
|
41
|
Hou J, Acharya L, Zhu D, Cheng J. An overview of bioinformatics methods for modeling biological pathways in yeast. Brief Funct Genomics 2016; 15:95-108. [PMID: 26476430 PMCID: PMC5065356 DOI: 10.1093/bfgp/elv040] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein-protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed.
|
42
|
Jothi R, Mohanty SK, Ojha A. Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph. Comput Biol Med 2016; 71:135-48. [PMID: 26945461 DOI: 10.1016/j.compbiomed.2016.02.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/05/2015] [Revised: 01/16/2016] [Accepted: 02/12/2016] [Indexed: 10/22/2022]
Abstract
Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k′ rounds of MST (k′-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from k′-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms.
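As a rough illustration of the neighborhood graph mentioned in the abstract above, the sketch below builds the union of edges from k successive minimum spanning trees, where each round ignores edges chosen in earlier rounds. This is one plausible reading of "k′ rounds of MST", not the authors' implementation: the function name `k_round_mst_edges`, the Euclidean distance, and the use of Kruskal's algorithm are assumptions made for illustration, and the eigenanalysis step is omitted.

```python
import math
from itertools import combinations


def k_round_mst_edges(points, k):
    """Return the union of edges from k successive MSTs (hypothetical helper).

    Each round runs Kruskal's algorithm on the complete graph minus the
    edges already selected in earlier rounds. If the remaining graph is
    disconnected, the round yields a spanning forest instead of a tree.
    """
    n = len(points)
    # All pairwise edges, sorted by Euclidean distance (assumed metric).
    all_edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    used = set()
    for _ in range(k):
        parent = list(range(n))  # fresh union-find structure per round

        def find(x):
            # Find the component root with path halving.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for _, i, j in all_edges:
            if (i, j) in used:
                continue  # edge belongs to an earlier round's tree
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j
                used.add((i, j))
    return used


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(sorted(k_round_mst_edges(points, 1)))  # [(0, 1), (1, 2), (2, 3)]
```

The resulting edge set could then be turned into a similarity matrix for spectral analysis; how the paper weights those edges is not stated in the abstract, so that step is left out here.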
Affiliation(s)
- R Jothi
  - Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Madhya Pradesh, India.
- Sraban Kumar Mohanty
  - Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Madhya Pradesh, India.
- Aparajita Ojha
  - Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Madhya Pradesh, India.
|
43
|
Graph-based unsupervised feature selection and multiview clustering for microarray data. J Biosci 2015; 40:755-67. [DOI: 10.1007/s12038-015-9559-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/26/2022]
|
44
|
Hu CW, Kornblau SM, Slater JH, Qutub AA. Progeny Clustering: A Method to Identify Biological Phenotypes. Sci Rep 2015; 5:12894. [PMID: 26267476 PMCID: PMC4533525 DOI: 10.1038/srep12894] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 02/04/2015] [Accepted: 07/15/2015] [Indexed: 01/24/2023] Open
Abstract
Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown successful and robust when applied to two synthetic datasets (datasets of two-dimensions and ten-dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset.
Affiliation(s)
- Steven M Kornblau
  - Departments of Leukemia and Stem Cell Transplant, University of Texas MD Anderson Cancer Center
- John H Slater
  - Department of Biomedical Engineering, University of Delaware
|
45
|
Lee WP, Lin CH. Combining Expression Data and Knowledge Ontology for Gene Clustering and Network Reconstruction. Cognit Comput 2015. [DOI: 10.1007/s12559-015-9349-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
|
46
|
Pirim H, Ekşioğlu B, Perkins AD. Clustering high throughput biological data with B-MST, a minimum spanning tree based heuristic. Comput Biol Med 2015; 62:94-102. [DOI: 10.1016/j.compbiomed.2015.03.031] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/03/2014] [Revised: 03/04/2015] [Accepted: 03/31/2015] [Indexed: 10/23/2022]
|
47
|
|
48
|
Sturrock M, Murray PJ, Matzavinos A, Chaplain MAJ. Mean field analysis of a spatial stochastic model of a gene regulatory network. J Math Biol 2014; 71:921-59. [PMID: 25323318 DOI: 10.1007/s00285-014-0837-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/05/2013] [Revised: 09/05/2014] [Indexed: 01/21/2023]
Abstract
A gene regulatory network may be defined as a collection of DNA segments which interact with each other indirectly through their RNA and protein products. Such a network is said to contain a negative feedback loop if its products inhibit gene transcription, and a positive feedback loop if a gene product promotes its own production. Negative feedback loops can create oscillations in mRNA and protein levels while positive feedback loops are primarily responsible for signal amplification. It is often the case in real biological systems that both negative and positive feedback loops operate in parameter regimes that result in low copy numbers of gene products. In this paper we investigate the spatio-temporal dynamics of a single feedback loop in a eukaryotic cell. We first develop a simplified spatial stochastic model of a canonical feedback system (either positive or negative). Using Gillespie's algorithm, we compute sample trajectories and analyse their corresponding statistics. We then derive a system of equations that describe the spatio-temporal evolution of the stochastic means. Subsequently, we examine the spatially homogeneous case and compare the results of numerical simulations with the spatially explicit case. Finally, using a combination of steady-state analysis and data clustering techniques, we explore model behaviour across a subregion of the parameter space that is difficult to access experimentally and compare the parameter landscape of our spatio-temporal and spatially-homogeneous models.
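To make the simulation step in the abstract above more concrete, here is a minimal sketch of Gillespie's stochastic simulation algorithm for a single-species negative feedback loop. It is a well-mixed (spatially homogeneous) toy model, not the spatial model of the paper: the function name `gillespie_feedback`, the Hill-type production rate, and the parameters `alpha`, `K`, `h` and `mu` are all assumptions chosen for illustration.

```python
import random


def gillespie_feedback(alpha, K, h, mu, t_end, n0=0, seed=0):
    """Simulate one trajectory of a toy negative feedback loop (hypothetical model).

    Reactions (assumed, not taken from the paper):
      production:  n -> n + 1  with propensity alpha / (1 + (n / K) ** h)
      degradation: n -> n - 1  with propensity mu * n
    Requires alpha > 0 so the total propensity is never zero.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = alpha / (1.0 + (n / K) ** h)  # repressed by the product itself
        a_deg = mu * n
        a_total = a_prod + a_deg
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_feedback(alpha=10.0, K=20.0, h=2, mu=0.1, t_end=50.0)
print(len(times), "events recorded; final copy number:", counts[-1])
```

Averaging many such trajectories with different seeds gives an empirical estimate of the stochastic mean, which is the quantity a mean field analysis approximates with deterministic equations.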
Affiliation(s)
- M Sturrock
  - Mathematical Biosciences Institute, The Ohio State University, Columbus, OH, 43210, USA
|
49
|
Wang M, Zhang W, Ding W, Dai D, Zhang H, Xie H, Chen L, Guo Y, Xie J. Parallel clustering algorithm for large-scale biological data sets. PLoS One 2014; 9:e91315. [PMID: 24705246 PMCID: PMC3976248 DOI: 10.1371/journal.pone.0091315] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 11/13/2013] [Accepted: 02/10/2014] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: The recent explosion of biological data poses a great challenge for traditional clustering algorithms. With the increasing scale of data sets, much larger memory and longer runtime are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose construction takes a long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. METHODS: Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix construction and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is used for the affinity propagation algorithm because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method in order to minimize the global communication cost among processes. RESULTS: A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.
Affiliation(s)
- Minchao Wang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Wu Zhang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
  - High Performance Computing Center, Shanghai University, Shanghai, P.R.China
- Wang Ding
  - School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Dongbo Dai
  - School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Huiran Zhang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Hao Xie
  - College of Stomatology, Wuhan University, Wuhan, P.R.China
- Luonan Chen
  - School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P.R.China
- Yike Guo
  - School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
  - Department of Computing, Imperial College London, London, United Kingdom
- Jiang Xie
  - School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
|