1
|
REW-ISA: unveiling local functional blocks in epi-transcriptome profiling data via an RNA expression-weighted iterative signature algorithm. BMC Bioinformatics 2020; 21:447. [PMID: 33036550 PMCID: PMC7547494 DOI: 10.1186/s12859-020-03787-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/17/2020] [Accepted: 09/29/2020] [Indexed: 02/07/2023] Open
Abstract
Background Recent studies have shown that N6-methyladenosine (m6A) plays a critical role in a number of biological processes and complex human diseases. However, the regulatory mechanisms of most methylation sites remain uncharted. Thus, in-depth study of the epi-transcriptomic patterns of m6A may provide insights into its complex functional and regulatory mechanisms. Results Because wet-lab experimental methods are costly in both money and time, revealing methylation patterns through computational models has become a preferred approach and has drawn increasing attention. Building on the theoretical basis and applications of conventional clustering methods, an RNA Expression Weighted Iterative Signature Algorithm (REW-ISA) is proposed to find potential local functional blocks (LFBs) in MeRIP-Seq data, where sites are hyper-methylated or hypo-methylated simultaneously across specific conditions. REW-ISA uses the RNA expression level of each site as a weight, so that sites with lower expression levels contribute less. It starts from random sets of sites, then follows an iterative search strategy that applies thresholds to rows and columns to find the LFBs in the m6A methylation profile. Its application to MeRIP-Seq data of 69,446 methylation sites under 32 experimental conditions unveiled 6 LFBs, which achieve higher enrichment scores than those found by ISA. Pathway analysis and an enzyme specificity test showed that sites retained in LFBs are highly relevant to the m6A methyltransferases, such as METTL3, METTL14, WTAP and KIAA1429. Further detailed analyses of each LFB showed that some LFBs are condition-specific, indicating that the methylation profiles of some specific sites may be condition-relevant. Conclusions REW-ISA finds potential local functional patterns present in m6A profiles, where sites are co-methylated under specific conditions.
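The iterative search described in this abstract can be illustrated with a minimal sketch. It is not the published REW-ISA implementation: the element-wise weighting, the one-sided z-score thresholds (which only pick up hyper-methylated blocks), and the names `weighted_isa`, `t_row` and `t_col` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers; a constant list maps to all zeros."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def weighted_isa(meth, weights, seed_rows, t_row=1.0, t_col=1.0, max_iter=50):
    """Alternate column and row thresholding until both sets stop changing.

    meth and weights are lists of rows (sites) by columns (conditions).
    Returns (rows, cols) as sets of indices, or two empty sets if a
    threshold step leaves nothing.
    """
    n_rows, n_cols = len(meth), len(meth[0])
    # Assumed weighting: element-wise product, so low-expression sites shrink.
    w = [[meth[i][j] * weights[i][j] for j in range(n_cols)] for i in range(n_rows)]
    rows, cols = set(seed_rows), set()
    for _ in range(max_iter):
        # Score each condition by its mean over the current site set.
        col_score = zscores([mean(w[i][j] for i in rows) for j in range(n_cols)])
        new_cols = {j for j, s in enumerate(col_score) if s > t_col}
        if not new_cols:
            return set(), set()
        # Score each site by its mean over the selected conditions.
        row_score = zscores([mean(w[i][j] for j in new_cols) for i in range(n_rows)])
        new_rows = {i for i, s in enumerate(row_score) if s > t_row}
        if not new_rows:
            return set(), set()
        if new_rows == rows and new_cols == cols:
            break  # fixed point reached
        rows, cols = new_rows, new_cols
    return rows, cols


# Toy profile: sites 0-2 are elevated under conditions 0-1.
toy = [[5.0 if i < 3 and j < 2 else 1.0 for j in range(5)] for i in range(6)]
ones = [[1.0] * 5 for _ in range(6)]
block = weighted_isa(toy, ones, seed_rows={0, 4}, t_row=0.5, t_col=0.5)
```

In this toy setting, lowering the weights of one site inside the planted block removes it from the result, which is the effect the abstract attributes to expression weighting.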
|
2
|
Pirgazi J, Khanteymoori AR, Jalilkhani M. TIGRNCRN: Trustful inference of gene regulatory network using clustering and refining the network. J Bioinform Comput Biol 2019; 17:1950018. [DOI: 10.1142/s0219720019500185] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
In this study, to deal with the noise and uncertainty in gene expression data, learning networks that can use prior knowledge, especially Bayesian networks, were used to infer gene regulatory networks. Learning networks are methods that combine a network structure with a learning process to obtain relationships. Correlation metrics are one of the methods used to measure the relationship between genes, but high correlation between genes does not necessarily mean that they have a causal effect on each other. Common methods for inferring gene regulatory networks have yet to pay attention to biological importance, and as such their predictions are less accurate in terms of biological significance. Hence, in the proposed method, highly correlated genes were grouped into one cluster using clustering, and edges between genes in the same cluster were prevented. Finally, after the Bayesian network modeling, a refining phase based on the knowledge gained from clustering was carried out to improve the regulatory interactions using biological correlation. To show its efficiency, the proposed method was compared with several common methods in this area, including GENIE3 and BMALR. The evaluation results indicate that the proposed method recognizes regulatory relations well in the Bayesian modeling process, owing to its use of the biological knowledge hidden in the data collection, and that it can recognize gene regulatory networks on a par with important methods in this field.
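The clustering step in this abstract (group highly correlated genes, then forbid edges inside each group) might be sketched as follows. The abstract does not name the clustering algorithm, so connected components of a thresholded correlation graph are used here as a stand-in; `correlation_clusters`, `edge_blacklist` and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def correlation_clusters(expr, threshold=0.9):
    """Group genes whose absolute correlation reaches the threshold (union-find)."""
    parent = {g: g for g in expr}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(expr, 2):
        if abs(pearson(expr[a], expr[b])) >= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for g in expr:
        clusters.setdefault(find(g), set()).add(g)
    return list(clusters.values())


def edge_blacklist(clusters):
    """Gene pairs inside one cluster: edges a structure learner should not add."""
    return {frozenset(pair) for c in clusters for pair in combinations(sorted(c), 2)}


# Toy expression profiles (gene names are illustrative only).
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
forbidden = edge_blacklist(correlation_clusters(expr))
```

A blacklist of this kind could then be passed to whichever Bayesian network structure learner is in use; that step is outside this sketch.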
Affiliation(s)
- Jamshid Pirgazi
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Maryam Jalilkhani
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
|
3
|
Feature clustering based support vector machine recursive feature elimination for gene selection. APPL INTELL 2017. [DOI: 10.1007/s10489-017-0992-2] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 10/19/2022]
|
4
|
Chen X, Jian C. Gene expression data clustering based on graph regularized subspace segmentation. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.06.023] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/26/2022]
|
5
|
Zhou Y, Mihindukulasuriya KA, Gao H, La Rosa PS, Wylie KM, Martin JC, Kota K, Shannon WD, Mitreva M, Sodergren E, Weinstock GM. Exploration of bacterial community classes in major human habitats. Genome Biol 2014; 15:R66. [PMID: 24887286 PMCID: PMC4073010 DOI: 10.1186/gb-2014-15-5-r66] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Received: 09/09/2013] [Accepted: 05/07/2014] [Indexed: 01/28/2023] Open
Abstract
Background Determining bacterial abundance variation is the first step in understanding bacterial similarity between individuals. Categorization of bacterial communities into groups or community classes is the subsequent step in describing microbial distribution based on abundance patterns. Here, we present an analysis of the groupings of bacterial communities in stool, nasal, skin, vaginal and oral habitats in a healthy cohort of 236 subjects from the Human Microbiome Project. Results We identify distinct community group patterns in the anterior nares, four skin sites, and vagina at the genus level. We also confirm three enterotypes previously identified in stools. We identify two clusters with low silhouette values in most oral sites, in which bacterial communities are more homogeneous. Subjects sharing a community class in one habitat do not necessarily share a community class in another, except in the three vaginal sites and the symmetric habitats of the left and right retroauricular creases. Demographic factors, including gender, age, and ethnicity, significantly influence community composition in several habitats. Community classes in the vagina, retroauricular crease and stool are stable over approximately 200 days. Conclusion The community composition, association of demographic factors with community classes, and demonstration of community stability deepen our understanding of the variability and dynamics of human microbiomes. This also has significant implications for experimental designs that seek microbial correlations with clinical phenotypes.
|
6
|
Ochoa S, Huerta-Ramos E, Barajas A, Iniesta R, Dolz M, Baños I, Sánchez B, Carlson J, Foix A, Pelaez T, Coromina M, Pardo M, Usall J. Cognitive profiles of three clusters of patients with a first-episode psychosis. Schizophr Res 2013; 150:151-6. [PMID: 23958487 DOI: 10.1016/j.schres.2013.07.054] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 04/15/2013] [Revised: 07/05/2013] [Accepted: 07/29/2013] [Indexed: 11/26/2022]
Abstract
OBJECTIVE The primary objective was to identify specific groups of patients with a first-episode psychosis based on family history, obstetric complications, neurological soft signs, and premorbid functioning. The secondary objective was to relate these groups to cognitive variables. METHOD A total of 62 patients with a first-episode psychosis were recruited from adult and child and adolescent mental health services. The inclusion criteria were age between 7 and 65 years (the actual age range of the sample was 13-35 years), two or more psychotic symptoms, and less than one year since the onset of symptoms. Premorbid functioning (PAS), soft signs (NES), obstetric complications and a neuropsychological battery (CPT, TMTA/TMTB, TAVEC/TAVECI, Stroop, specific subtest of WAIS-III/WISC-IV) were administered. RESULTS We found three clusters: 1) higher neurodevelopment contribution (N=14), 2) higher genetic contribution (N=30), and 3) lower neurodevelopment contribution (N=18). Statistical differences were found between groups in TMTB, the learning curve of the TAVEC, digits of the WAIS and premorbid estimated IQ, with cluster 1 being the most impaired. CONCLUSIONS A cluster approach could differentiate several groups of patients with different cognitive performance. Neuropsychological interventions, such as cognitive remediation, should be targeted specifically at patients with more impaired results.
Affiliation(s)
- Susana Ochoa
- Parc Sanitari Sant Joan de Déu. Sant Boi de Llobregat (Barcelona), CIBERSAM, Spain.
|
7
|
Hoogendoorn B, Berube K, Gregory C, Jones T, Sexton K, Brennan P, Brewis IA, Murison A, Arthur R, Price H, Morgan H, Matthews IP. Gene and protein responses of human lung tissue explants exposed to ambient particulate matter of different sizes. Inhal Toxicol 2013; 24:966-75. [PMID: 23216157 DOI: 10.3109/08958378.2012.742600] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
CONTEXT Exposure to ambient particulate air pollution is associated with increased cardiovascular and respiratory morbidity and mortality. It is necessary to understand the causal pathways driving the observed health effects, particularly if they are differentially associated with particle size. OBJECTIVES To investigate the effect of different size ranges of ambient particulate matter (PM) on gene and protein expression in an in vitro model. MATERIALS AND METHODS Normal human tracheobronchial epithelium (NHTBE) three-dimensional cell constructs were exposed for 24 h to washed ambient PM of different sizes (size 1: 7-615 nm; size 2: 616 nm-2.39 µm; size 3: 2.4-10 µm) collected from a residential street. A human stress and toxicity PCR array was used to investigate gene expression and iTRAQ was used to perform quantitative proteomics. RESULTS Eighteen of the 84 genes on the PCR array were significantly dysregulated. Treatment with size 2 PM resulted in the greatest number of genes with altered expression, followed by size 1 and lastly size 3. iTRAQ identified 317 proteins, of which 20 were differentially expressed. Enrichment for gene ontology classification revealed potential changes to various pathways. DISCUSSION AND CONCLUSIONS Different size fractions of ambient PM are associated with dysregulatory effects on the cellular proteome and on stress and toxicity genes of NHTBE cells. This approach not only provides an investigative tool to identify possible causal pathways but also permits the relationship between particle size and responses to be explored.
Affiliation(s)
- Bastiaan Hoogendoorn
- Department of Primary Care and Public Health, Neuadd Meirionnydd, School of Medicine, Heath Park, Cardiff, UK.
|
8
|
Pirim H, Ekşioğlu B, Perkins A, Yüceer Ç. Clustering of High Throughput Gene Expression Data. COMPUTERS & OPERATIONS RESEARCH 2012; 39:3046-3061. [PMID: 23144527 PMCID: PMC3491664 DOI: 10.1016/j.cor.2012.03.008] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 05/06/2023]
Abstract
High throughput biological data need to be processed, analyzed, and interpreted to address problems in life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Clearly, clustering can be used in many areas of biological data analysis. However, this paper presents a review of the current clustering algorithms designed especially for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics - clustering gene expression data - to the operations research community.
Affiliation(s)
- Harun Pirim
  - Department of Industrial and Systems Engineering, Mississippi State University, P.O. Box 9542, Mississippi State, MS 39762
  - Corresponding author. Tel.:+1-662-325-4226;
- Burak Ekşioğlu
  - Department of Industrial and Systems Engineering, Mississippi State University, P.O. Box 9542, Mississippi State, MS 39762
- Andy Perkins
  - Department of Computer Science and Engineering, Mississippi State University
- Çetin Yüceer
  - Department of Forestry, Mississippi State University
|
9
|
Li L, Guo Y, Wu W, Shi Y, Cheng J, Tao S. A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data. BioData Min 2012; 5:8. [PMID: 22824157 PMCID: PMC3447720 DOI: 10.1186/1756-0381-5-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 01/15/2012] [Accepted: 06/19/2012] [Indexed: 11/10/2022] Open
Abstract
Background Several biclustering algorithms have been proposed to identify biclusters, in which genes share similar expression patterns across a number of conditions. However, different algorithms yield different biclusters and can lead to distinct conclusions. Therefore, testing and comparison of these algorithms are strongly required. Methods In this study, five biclustering algorithms (i.e. BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared on two expression datasets (GDS1620 and pathway) with different dimensions in Arabidopsis thaliana (A. thaliana). GO (gene ontology) annotation and the PPI (protein-protein interaction) network were used to verify the biological significance of the biclusters from the five algorithms. To compare the algorithms’ performance and evaluate the quality of the identified biclusters, two scoring methods, namely weighted enrichment (WE) scoring and PPI scoring, were proposed in our study. For each dataset, after combining the scores of all biclusters into one unified ranking, we could better evaluate the performance and behavior of the five biclustering algorithms. Results Both the WE and PPI scoring methods proved effective in validating the biological significance of the biclusters, and a significantly positive correlation between the two sets of scores demonstrated the consistency of the two methods. A comparative study of the five algorithms revealed that: (1) ISA is the most effective of the five algorithms on the GDS1620 dataset, and BIMAX outperforms the other algorithms on the pathway dataset. (2) Both ISA and BIMAX are data-dependent. The former does not work well on datasets with few genes, while the latter holds up well on datasets with more conditions. (3) FABIA and QUBIC perform poorly in this study and may be better suited to large datasets with more genes and more conditions. (4) SAMBA appears to be data-independent, as it performs well on both datasets. The comparison results provide useful information for researchers to choose a suitable algorithm for each given dataset.
Affiliation(s)
- Li Li
- State Key Laboratory of Crop Stress Biology in Arid Areas and College of Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China.
|
10
|
Prom-On S, Chanthaphan A, Chan JH, Meechai A. Enhancing biological relevance of a weighted gene co-expression network for functional module identification. J Bioinform Comput Biol 2011; 9:111-29. [PMID: 21328709 DOI: 10.1142/s0219720011005252] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 07/02/2010] [Revised: 09/12/2010] [Accepted: 09/20/2010] [Indexed: 11/18/2022]
Abstract
Relationships among gene expression levels may be associated with the mechanisms of the disease. While identifying a direct association such as a difference in expression levels between case and control groups links genes to disease mechanisms, uncovering an indirect association in the form of a network structure may help reveal the underlying functional module associated with the disease under scrutiny. This paper presents a method to improve the biological relevance in functional module identification from the gene expression microarray data by enhancing the structure of a weighted gene co-expression network using minimum spanning tree. The enhanced network, which is called a backbone network, contains only the essential structural information to represent the gene co-expression network. The entire backbone network is decoupled into a number of coherent sub-networks, and then the functional modules are reconstructed from these sub-networks to ensure minimum redundancy. The method was tested with a simulated gene expression dataset and case-control expression datasets of autism spectrum disorder and colorectal cancer studies. The results indicate that the proposed method can accurately identify clusters in the simulated dataset, and the functional modules of the backbone network are more biologically relevant than those obtained from the original approach.
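The backbone idea in this abstract, keeping only a minimum spanning tree of the weighted co-expression network, can be sketched with Kruskal's algorithm. The abstract does not state how edge weights are defined, so the conversion of similarity to distance as 1 minus similarity is an assumption, and the node names and similarity values below are made up for illustration.

```python
def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm; weighted_edges is a list of (weight, u, v) tuples."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it cannot close a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical co-expression similarities between four genes.
similarity = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.7,
              ("C", "D"): 0.6, ("A", "D"): 0.2}
# Assumed conversion: strong co-expression becomes a short edge.
edges = [(1.0 - s, u, v) for (u, v), s in similarity.items()]
backbone = minimum_spanning_tree("ABCD", edges)
```

Splitting the backbone into sub-networks and rebuilding modules from them, as the abstract goes on to describe, is not covered by this sketch.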
Affiliation(s)
- Santitham Prom-On
- Computer Engineering Department, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Prachauthit Road, Bangmod, Thungkhru, Bangkok 10140, Thailand.
|
11
|
Lee H, Malaspina D, Ahn H, Perrin M, Opler MG, Kleinhaus K, Harlap S, Goetz R, Antonius D. Paternal age related schizophrenia (PARS): Latent subgroups detected by k-means clustering analysis. Schizophr Res 2011; 128:143-9. [PMID: 21353765 PMCID: PMC3085629 DOI: 10.1016/j.schres.2011.02.006] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 11/12/2010] [Revised: 02/04/2011] [Accepted: 02/07/2011] [Indexed: 11/21/2022]
Abstract
BACKGROUND Paternal age related schizophrenia (PARS) has been proposed as a subgroup of schizophrenia with distinct etiology, pathophysiology and symptoms. This study uses a k-means clustering analysis approach to generate hypotheses about differences between PARS and other cases of schizophrenia. METHODS We studied PARS (operationally defined as not having any family history of schizophrenia among first and second-degree relatives and fathers' age at birth ≥ 35 years) in a series of schizophrenia cases recruited from a research unit. Data were available on demographic variables, symptoms (Positive and Negative Syndrome Scale; PANSS), cognitive tests (Wechsler Adult Intelligence Scale-Revised; WAIS-R) and olfaction (University of Pennsylvania Smell Identification Test; UPSIT). We conducted a series of k-means clustering analyses to identify clusters of cases containing high concentrations of PARS. RESULTS Two analyses generated clusters with high concentrations of PARS cases. The first analysis (N=136; PARS=34) revealed a cluster containing 83% PARS cases, in which the patients showed a significant discrepancy between verbal and performance intelligence. The mean paternal and maternal ages were 41 and 33, respectively. The second analysis (N=123; PARS=30) revealed a cluster containing 71% PARS cases, of which 93% were females; the mean age of onset of psychosis, at 17.2, was significantly early. CONCLUSIONS These results strengthen the evidence that PARS cases differ from other patients with schizophrenia. Hypothesis-generating findings suggest that features of PARS may include a discrepancy between verbal and performance intelligence, and in females, an early age of onset. These findings provide a rationale for separating these phenotypes from others in future clinical, genetic and pathophysiologic studies of schizophrenia and in considering responses to treatment.
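The analysis pattern in this abstract (run k-means, then look for clusters with a high concentration of a flagged subgroup) might be sketched as below. The feature set, scaling and initialization used in the study are not given in the abstract; the toy points, the fixed starting centroids and the helper `concentration` are assumptions for illustration only.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm starting from caller-supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def concentration(labels, flags):
    """Fraction of flagged cases within each cluster."""
    out = {}
    for k in set(labels):
        in_k = [f for lab, f in zip(labels, flags) if lab == k]
        out[k] = sum(in_k) / len(in_k)
    return out


# Toy data: two well-separated groups; flags mark the subgroup of interest.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
flags = [True, True, False, False, False, True]
labels, centers = kmeans(points, centroids=[(0, 0), (10, 10)])
share = concentration(labels, flags)
```

A cluster whose share is well above the overall rate of flagged cases would be the kind of hypothesis-generating finding the abstract reports.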
Affiliation(s)
- Hyejoo Lee
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
- Dolores Malaspina
  - Institute for Social and Psychiatric Initiatives (InSPIRES), Department of Psychiatry, New York University School of Medicine, New York, NY, USA
- Hongshik Ahn
  - Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
- Mary Perrin
  - Institute for Social and Psychiatric Initiatives (InSPIRES), Department of Psychiatry, New York University School of Medicine, New York, NY, USA
- Mark G. Opler
  - Institute for Social and Psychiatric Initiatives (InSPIRES), Department of Psychiatry, New York University School of Medicine, New York, NY, USA
- Karine Kleinhaus
  - Institute for Social and Psychiatric Initiatives (InSPIRES), Department of Psychiatry, New York University School of Medicine, New York, NY, USA
- Susan Harlap
  - Institute for Social and Psychiatric Initiatives (InSPIRES), Department of Psychiatry, New York University School of Medicine, New York, NY, USA
- Raymond Goetz
  - Institute for Social and Psychiatric Initiatives (InSPIRES), Department of Psychiatry, New York University School of Medicine, New York, NY, USA
  - Department of Psychiatry, Columbia University, New York State Psychiatric Institute, New York, NY, USA
- Daniel Antonius
  - Institute for Social and Psychiatric Initiatives (InSPIRES), Department of Psychiatry, New York University School of Medicine, New York, NY, USA
|
12
|
Bayá AE, Granitto PM. Clustering gene expression data with a penalized graph-based metric. BMC Bioinformatics 2011; 12:2. [PMID: 21205299 PMCID: PMC3023695 DOI: 10.1186/1471-2105-12-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 07/10/2010] [Accepted: 01/04/2011] [Indexed: 12/05/2022] Open
Abstract
Background The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that is not shaped as compact clouds of points but instead forms arbitrary shapes or paths embedded in a high-dimensional space, as could be the case for some gene expression datasets. Results In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value, and then it adds edges with a highly penalized weight to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different sub-graphs as well as penalization functions. We show clustering results on several public gene expression datasets and simulated artificial problems to evaluate the behavior of the new metric. Conclusions In all cases the PKNNG metric shows promising clustering results. The use of the PKNNG metric can improve the performance of commonly used pairwise-distance based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method; they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
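The two-step procedure in this abstract can be sketched as follows. It is a simplified illustration, not the authors' implementation: the connection scheme (repeatedly joining the closest pair of points in different subgraphs) and the penalty (a fixed tenfold multiplier) are assumptions, since the abstract says several schemes and penalization functions are discussed without specifying them. The names `pknng_like_graph` and `graph_distance` are hypothetical.

```python
import heapq
from itertools import combinations
from math import dist


def components(graph):
    """Label every node with a representative of its connected component."""
    label = {}
    for start in graph:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v not in label:
                    label[v] = start
                    stack.append(v)
    return label


def pknng_like_graph(points, k=1, penalty=lambda d: 10.0 * d):
    """Step 1: symmetric k-NN graph. Step 2: join subgraphs with penalized edges."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist(points[i], points[j]))[:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = dist(points[i], points[j])
    while True:
        comp = components(graph)
        if len(set(comp.values())) == 1:
            return graph
        # Assumed scheme: link the closest pair lying in different subgraphs.
        d, i, j = min((dist(points[i], points[j]), i, j)
                      for i, j in combinations(range(n), 2) if comp[i] != comp[j])
        graph[i][j] = graph[j][i] = penalty(d)


def graph_distance(graph, source, target):
    """Shortest-path length between two nodes (Dijkstra)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            if d + w < best.get(v, float("inf")):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")


# Two tight pairs on a line, far apart from each other.
g = pknng_like_graph([(0.0,), (1.0,), (10.0,), (11.0,)], k=1)
```

Distances within a subgraph stay small while distances across the penalized bridge grow sharply, which is what lets a standard pairwise-distance clustering method separate the groups.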
Affiliation(s)
- Ariel E Bayá
- CIFASIS French Argentine International Center for Information and Systems Sciences, UPCAM (France)/UNR-CONICET (Argentina), Bv 27 de Febrero 210 Bis, 2000 Rosario, República Argentina.
|
13
|
Holmans P. Statistical methods for pathway analysis of genome-wide data for association with complex genetic traits. ADVANCES IN GENETICS 2010; 72:141-79. [PMID: 21029852 DOI: 10.1016/b978-0-12-380862-2.00007-2] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Indexed: 12/12/2022]
Abstract
A number of statistical methods have been developed to test for associations between pathways (collections of genes related biologically) and complex genetic traits. Pathway analysis methods were originally developed for analyzing gene expression data, but recently methods have been developed to perform pathway analysis on genome-wide association study (GWAS) data. The purpose of this review is to give an overview of these methods, enabling the reader to gain an understanding of what pathway analysis involves, and to select the method most suited to their purposes. This review describes the various types of statistical methods for pathway analysis, detailing the strengths and weaknesses of each. Factors influencing the power of pathway analyses, such as gene coverage and choice of pathways to analyze, are discussed, as well as various unresolved statistical issues. Finally, a list of computer programs for performing pathway analysis on genome-wide association data is provided.
Affiliation(s)
- Peter Holmans
- Biostatistics and Bioinformatics Unit, MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, Cardiff University School of Medicine, Heath Park, Cardiff, United Kingdom
|