1
Sales de Queiroz A, Sales Santa Cruz G, Jean-Marie A, Mazauric D, Roux J, Cazals F. Gene prioritization based on random walks with restarts and absorbing states, to define gene sets regulating drug pharmacodynamics from single-cell analyses. PLoS One 2022; 17:e0268956. [PMID: 36342924 PMCID: PMC9639845 DOI: 10.1371/journal.pone.0268956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2021] [Accepted: 05/12/2022] [Indexed: 11/09/2022]
Abstract
Prioritizing genes for their role in drug sensitivity is an important step in understanding drug mechanisms of action and discovering new molecular targets for co-treatment. To formalize this problem, we consider two sets of genes X and P, respectively composing the gene signature of cell sensitivity at the drug IC50 and the genes involved in its mechanism of action, as well as a protein interaction network (PPIN) containing the products of X and P as nodes. We introduce Genetrank, a method to prioritize the genes in X for their likelihood to regulate the genes in P. Genetrank uses asymmetric random walks with restarts, absorbing states, and a suitable renormalization scheme. Using novel so-called saturation indices, we show that the conjunction of absorbing states and renormalization yields an exploration of the PPIN which is much more progressive than that afforded by random walks with restarts only. Using MINT as the underlying network, we apply Genetrank to a predictive gene signature of cancer cell sensitivity to tumor-necrosis-factor-related apoptosis-inducing ligand (TRAIL), performed in single cells. Our ranking provides biological insights on drug sensitivity and a gene set considerably enriched in genes regulating TRAIL pharmacodynamics when compared to the most significant differentially expressed genes obtained from a statistical analysis framework alone. We also introduce gene expression radars, a visualization tool embedded in MA plots to assess all pairwise interactions at a glance on graphical representations of transcriptomics data. Genetrank is made available in the Structural Bioinformatics Library (https://sbl.inria.fr/doc/Genetrank-user-manual.html). It should prove useful for mining gene sets in conjunction with a signaling pathway, whenever other approaches yield relatively large sets of genes.
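The random walk with restarts that the abstract builds on can be illustrated with a minimal Python sketch. It covers plain random walks with restarts only: the asymmetry, absorbing states, and renormalization that distinguish Genetrank are not reproduced here. The function name, the toy graph, and the restart probability of 0.3 are assumptions made for illustration and are not taken from Genetrank or the Structural Bioinformatics Library.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a random walk with restarts.

    adj   : dict mapping each node to the list of its out-neighbors
            (every neighbor must also be a key of adj)
    seeds : set of nodes the walker jumps back to when it restarts
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, the walker jumps back to a seed.
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            out = adj[u]
            if not out:
                # Illustrative choice: a node with no out-edge sends
                # its remaining mass back to the seeds.
                for v in nodes:
                    nxt[v] += (1.0 - restart) * p[u] * p0[v]
                continue
            # Otherwise the walker moves to a uniformly chosen neighbor.
            share = (1.0 - restart) * p[u] / len(out)
            for v in out:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical four-node path graph A - B - C - D with seed A.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_graph, {"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this toy graph the scores decrease with distance from the seed, which is the property a ranking by network proximity relies on.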
Affiliation(s)
- Jérémie Roux
- CNRS UMR 7284, Inserm U 1081, Institut de Recherche sur le Cancer et le Vieillissement de Nice, Centre Antoine Lacassagne, Université Côte d’Azur, Nice, France
- * E-mail: (FC); (JR)
- Frédéric Cazals
- Inria, Université Côte d’Azur, Nice, France
- * E-mail: (FC); (JR)
2
Osabe T, Shimizu K, Kadota K. Differential expression analysis using a model-based gene clustering algorithm for RNA-seq data. BMC Bioinformatics 2021; 22:511. [PMID: 34670485 PMCID: PMC8527798 DOI: 10.1186/s12859-021-04438-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/24/2020] [Accepted: 10/11/2021] [Indexed: 11/10/2022]
Abstract
Background RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering is used to classify DEGs with similar expression patterns for the subsequent analyses of data from experiments such as time-courses or multi-group comparisons. However, gene clustering has rarely been used for analyzing simple two-group data or differential expression (DE). In this study, we report that a model-based clustering algorithm implemented in an R package, MBCluster.Seq, can also be used for DE analysis. Results The input data originally used by MBCluster.Seq is DEGs, and the proposed method (called MBCdeg) uses all genes for the analysis. The method uses posterior probabilities of genes assigned to a cluster displaying non-DEG pattern for overall gene ranking. We compared the performance of MBCdeg with conventional R packages such as edgeR, DESeq2, and TCC that are specialized for DE analysis using simulated and real data. Our results showed that MBCdeg outperformed other methods when the proportion of DEG (PDEG) was less than 50%. However, the DEG identification using MBCdeg was less consistent than with conventional methods. We compared the effects of different normalization algorithms using MBCdeg, and performed an analysis using MBCdeg in combination with a robust normalization algorithm (called DEGES) that was not implemented in MBCluster.Seq. The new analysis method showed greater stability than using the original MBCdeg with the default normalization algorithm. Conclusions MBCdeg with DEGES normalization can be used in the identification of DEGs when the PDEG is relatively low. As the method is based on gene clustering, the DE result includes information on which expression pattern the gene belongs to. The new method may be useful for the analysis of time-course and multi-group data, where the classification of expression patterns is often required. 
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04438-4.
Affiliation(s)
- Takayuki Osabe
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo, 113-8657, Japan
- Kentaro Shimizu
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo, 113-8657, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo, 113-8657, Japan
- Interfaculty Initiative in Information Studies, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan
- Koji Kadota
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo, 113-8657, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo, 113-8657, Japan
- Interfaculty Initiative in Information Studies, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan
3
Chen B, Gao L, Shang X. A two-way rectification method for identifying differentially expressed genes by maximizing the co-function relationship. BMC Genomics 2021; 22:471. [PMID: 34171992 PMCID: PMC8229713 DOI: 10.1186/s12864-021-07772-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2020] [Accepted: 06/04/2021] [Indexed: 11/15/2022]
Abstract
BACKGROUND The identification of differentially expressed genes (DEGs) is an important task in many biological studies. Widely used methods often calculate a score for each gene by estimating the significance level of its differential expression. However, biological experiments often have only three replicates, and gene expression datasets contain considerable noise, which poses a great challenge to statistical analysis methods. Moreover, gene expression levels are not evenly distributed in abundance. Thus, lowly expressed genes are more easily detected by fold-change-based methods, which may result in many false positives in the DEG list. Since phenotypic changes resulting from DEGs should be strongly related to several distinct cellular functions, a more robust method should be designed to increase the true positive rate of functionally related DEGs. RESULTS In this study, we propose a two-way rectification method for identifying DEGs by maximizing the co-function relationships between genes and their enriched cellular pathways. An iteration strategy is employed to sequentially narrow down the group of identified DEGs and their associated biological functions. Functional analyses reveal that the identified DEGs are well organized in the form of functional modules, and the enriched pathways are highly significant, with lower p-values and larger gene counts. CONCLUSIONS An integrative rectification method was proposed to identify key DEGs and their related functions simultaneously. The experimental validations demonstrate that the method has high interpretability and feasibility. It performs very well in identifying remarkable functionally related genes.
Affiliation(s)
- Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, 127 Youyi west road, Xi’an, 710072 China
- Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, 127 Youyi west road, Xi’an, 710072 China
- Centre for Multidisciplinary Convergence Computing (CMCC), 127 Youyi west road, Xi’an, 710072 China
- National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, 127 Youyi west road, Xi’an, 710072 China
- Li Gao
- School of Software, Northwestern Polytechnical University, 127 Youyi west road, Xi’an, 710072 China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, 127 Youyi west road, Xi’an, 710072 China
- Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, 127 Youyi west road, Xi’an, 710072 China
4
Dynamic Expression of Genes Involved in Proteoglycan/Glycosaminoglycan Metabolism during Skin Development. BioMed Research International 2018; 2018:9873471. [PMID: 30228991 PMCID: PMC6136507 DOI: 10.1155/2018/9873471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2018] [Accepted: 07/04/2018] [Indexed: 11/30/2022]
Abstract
Glycosaminoglycans are important for cell signaling and therefore for proper embryonic development and adult homeostasis. The expression of genes involved in proteoglycan/glycosaminoglycan (GAG) metabolism and of genes coding for growth factors known to bind GAGs was analyzed during skin development by microarray analysis and real-time quantitative PCR. GAG-related genes were organized in six categories based on their role in GAG homeostasis, viz. (1) production of precursor molecules, (2) production of core proteins, (3) synthesis of the linkage region, (4) polymerization, (5) modification, and (6) degradation of the GAG chain. In all categories, highly dynamic up- and downregulations were observed during skin development, including differential expression of GAG-modifying isoenzymes, core proteins, and growth factors. In two mouse models, one overexpressing heparanase and one lacking C5 epimerase, differential expression of only a few genes was observed. The data show that a highly dynamic and complex expression of GAG-associated genes occurs during skin development. This likely reflects quantitative and qualitative changes in GAGs/proteoglycans, including structural fine tuning, which may be correlated with growth factor handling.
5
Abstract
High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature have become more specialized and often require high computational resources. Here, we propose a new versatile method based on the data-ordering rank values. We use linear algebra and the Perron-Frobenius theorem, and we extend a method presented earlier for detecting differentially expressed genes to the detection of recurrent copy number aberrations. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is, to our knowledge, the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow a statistically significant subset selection in the data set. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website (https://cran.r-project.org/web/packages/fcros).
Affiliation(s)
- Doulaye Dembélé
- Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), CNRS UMR 7104, INSERM U 1258, Université de Strasbourg, Illkirch-Graffenstaden, France
6
Zhao S, Sun J, Shimizu K, Kadota K. Silhouette Scores for Arbitrary Defined Groups in Gene Expression Data and Insights into Differential Expression Results. Biol Proced Online 2018; 20:5. [PMID: 29507534 PMCID: PMC5831220 DOI: 10.1186/s12575-018-0067-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/14/2017] [Accepted: 01/12/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Hierarchical sample clustering (HSC) is widely performed to examine associations within expression data obtained from microarrays and RNA sequencing (RNA-seq). Researchers have investigated the HSC results with several possible criteria for grouping (e.g., sex, age, and disease types). However, the evaluation of arbitrarily defined groups still relies on subjective visual inspection. RESULTS To objectively evaluate the degree of separation between groups of interest in the HSC dendrogram, we propose to use Silhouette scores. Silhouettes was originally developed as a graphical aid for the validation of data clusters. It provides a measure of how well a sample is classified when it is assigned to a cluster, according to both the tightness of the clusters and the separation between them. It ranges from -1.0 to 1.0, and a larger value for the average silhouette (AS) over all samples to be analyzed indicates a higher degree of cluster separation. The basic idea in using an AS is to replace the term "cluster" with "group" when calculating the scores. We investigated the validity of this score using simulated and real data designed for differential expression (DE) analysis. We found that larger (or smaller) AS values agreed well with both higher (or lower) degrees of separation between different groups and higher (or lower) percentages of differentially expressed genes (PDEG). We also found that the AS values were generally independent of the number of replicates (Nrep). Although the PDEG values depended on Nrep, we confirmed that both AS and PDEG values were close to zero when samples in the data showed an intermingled nature between the groups in the HSC dendrogram. CONCLUSION Silhouettes is useful for exploring data with predefined group labels. It would help provide both an objective evaluation of HSC dendrograms and insights into the DE results with regard to the compared groups.
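The idea of computing silhouettes over predefined groups rather than discovered clusters can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name is hypothetical, the toy samples are made up, and Euclidean distance is an assumption, since the abstract does not state which distance is used.

```python
import math


def average_silhouette(samples, labels):
    """Average silhouette (AS) of samples under predefined group labels.

    samples : list of equal-length numeric vectors (one per sample)
    labels  : list of group labels, one per sample
    Euclidean distance is an assumption made for this sketch.
    """
    groups = sorted(set(labels))
    if len(groups) < 2:
        raise ValueError("at least two groups are required")

    def dist(x, y):
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

    def mean_dist(i, group):
        # Mean distance from sample i to the other samples in `group`.
        d = [dist(samples[i], samples[j])
             for j in range(len(samples))
             if labels[j] == group and j != i]
        return sum(d) / len(d) if d else None

    scores = []
    for i in range(len(samples)):
        a = mean_dist(i, labels[i])      # tightness within own group
        if a is None:                    # singleton group: score 0 by convention
            scores.append(0.0)
            continue
        # Separation: mean distance to the nearest other group.
        b = min(mean_dist(i, g) for g in groups if g != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical expression vectors for four samples in two groups.
toy_samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
well_separated = average_silhouette(toy_samples, ["A", "A", "B", "B"])
intermingled = average_silhouette(toy_samples, ["A", "B", "A", "B"])
```

With labels that match the geometry the AS is close to 1, while labels that cut across it push the AS below zero, in line with the interpretation given in the abstract.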
Affiliation(s)
- Shitao Zhao
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
- Jianqiang Sun
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
- Kentaro Shimizu
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
- Koji Kadota
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
7
Saavedra C, Milan M, Leite RB, Cordero D, Patarnello T, Cancela ML, Bargelloni L. A Microarray Study of Carpet-Shell Clam (Ruditapes decussatus) Shows Common and Organ-Specific Growth-Related Gene Expression Differences in Gills and Digestive Gland. Front Physiol 2017; 8:943. [PMID: 29234285 PMCID: PMC5712350 DOI: 10.3389/fphys.2017.00943] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/27/2017] [Accepted: 11/07/2017] [Indexed: 01/04/2023]
Abstract
Growth rate is one of the most important traits from the point of view of individual fitness and commercial production in mollusks, but its molecular and physiological basis is poorly known. We have studied differential gene expression related to differences in growth rate in adult individuals of the commercial marine clam Ruditapes decussatus. Gene expression in the gills and the digestive gland was analyzed in five fast-growing and five slow-growing animals by means of an oligonucleotide microarray containing 14,003 probes. A total of 356 differentially expressed genes (DEGs) were found. We tested the hypothesis that differential expression might be concentrated at the growth control gene core (GCGC), i.e., the set of genes that underlie the molecular mechanisms of genetic control of tissue and organ growth and body size, as demonstrated in model organisms. The GCGC includes the genes coding for enzymes of the insulin/insulin-like growth factor signaling pathway (IIS), enzymes of four additional signaling pathways (Raf/Ras/Mapk, Jnk, TOR, and Hippo), and transcription factors acting at the end of those pathways. Only two out of 97 GCGC genes present in the microarray showed differential expression, indicating a very small contribution of GCGC genes to growth-related differential gene expression. Forty-eight DEGs were shared by both organs, with gene ontology (GO) annotations corresponding to transcription regulation, RNA splicing, sugar metabolism, protein catabolism, immunity, defense against pathogens, and fatty acid biosynthesis. GO term enrichment tests indicated that genes related to growth regulation, development and morphogenesis, extracellular matrix proteins, and proteolysis were overrepresented in the gills. In the digestive gland, overrepresented GO terms referred to gene expression control through chromatin rearrangement, RAS-related small GTPases, glycolysis, and energy metabolism.
These analyses suggest a relevant role of, among others, some genes related to the IIS, such as the ParaHox gene Xlox, CCAR and the CCN family of secreted proteins, in the regulation of growth in bivalves.
Affiliation(s)
- Carlos Saavedra
- Instituto de Acuicultura Torre de la Sal, Consejo Superior de Investigaciones Científicas, Castelló de la Plana, Spain
- Massimo Milan
- Dipartimento di Biomedicina Comparata e Alimentazione, Università di Padova, Polo di Agripolis, Legnaro, Italy
- Ricardo B Leite
- Centre of Marine Sciences (CCMAR), Universidade do Algarve, Faro, Portugal
- David Cordero
- Instituto de Acuicultura Torre de la Sal, Consejo Superior de Investigaciones Científicas, Castelló de la Plana, Spain
- Tomaso Patarnello
- Dipartimento di Biomedicina Comparata e Alimentazione, Università di Padova, Polo di Agripolis, Legnaro, Italy
- M Leonor Cancela
- Centre of Marine Sciences (CCMAR), Universidade do Algarve, Faro, Portugal
- Department of Biomedical Sciences and Medicine and Academic Biomedical Centre, Universidade do Algarve, Faro, Portugal
- Luca Bargelloni
- Dipartimento di Biomedicina Comparata e Alimentazione, Università di Padova, Polo di Agripolis, Legnaro, Italy
8
Barragan S, Rueda C, Fernandez M. Circular Order Aggregation and its Application to Cell-cycle Genes Expressions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:819-829. [PMID: 27305684 DOI: 10.1109/tcbb.2016.2565469] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
The aim of circular order aggregation is to find a circular order on a set of n items using angular values from p heterogeneous data sets. This problem is new in the literature and has been motivated by the biological question of finding the order among the peak expression of a group of cell cycle genes. In this paper, two very different approaches to solve the problem that use pairwise and triplewise information are proposed. Both approaches are analyzed and compared using theoretical developments and numerical studies, and applied to the cell cycle data that motivated the problem.
9
Tang M, Sun J, Shimizu K, Kadota K. Evaluation of methods for differential expression analysis on multi-group RNA-seq count data. BMC Bioinformatics 2015; 16:361. [PMID: 26538400 PMCID: PMC4634584 DOI: 10.1186/s12859-015-0794-7] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 03/28/2015] [Accepted: 10/24/2015] [Indexed: 11/22/2022]
Abstract
Background RNA-seq is a powerful tool for measuring transcriptomes, especially for identifying differentially expressed genes or transcripts (DEGs) between sample groups. A number of methods have been developed for this task, and several evaluation studies have also been reported. However, those evaluations have so far been restricted to two-group comparisons. Comparative studies for multi-group data are also needed. Methods We compare 12 pipelines available in nine R packages for detecting differential expression (DE) from multi-group RNA-seq count data, focusing on three-group data with or without replicates. We evaluate those pipelines on the basis of both simulation data and real count data. Results The pipelines in the TCC package performed comparably to or better than other pipelines under various simulation scenarios. TCC implements a multi-step normalization strategy (called DEGES) that internally uses functions provided by other representative packages (edgeR, DESeq2, and so on). We found considerably different numbers of identified DEGs (18.5 to 45.7% of all genes) among the pipelines for the same real dataset but similar distributions of the classified expression patterns. We also found that DE results can roughly be estimated by the hierarchical dendrogram of sample clustering for the raw count data. Conclusion We confirmed that the DEGES-based pipelines implemented in TCC performed well in a three-group comparison as well as a two-group comparison. We recommend using the DEGES-based pipeline that internally uses edgeR (here called the EEE-E pipeline) for count data with replicates (especially for small sample sizes). For data without replicates, the DEGES-based pipeline with DESeq2 (called SSS-S) can be recommended. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0794-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Min Tang
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
- Jianqiang Sun
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
- Kentaro Shimizu
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
- Koji Kadota
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
10
Chen SC, Tsai TH, Chung CH, Li WH. Dynamic association rules for gene expression data analysis. BMC Genomics 2015; 16:786. [PMID: 26467206 PMCID: PMC4606551 DOI: 10.1186/s12864-015-1970-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/02/2015] [Accepted: 10/02/2015] [Indexed: 01/08/2023]
Abstract
Background The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association, based on gene expression profiles, has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses of microarray data have been developed to address the gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical approach, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of Leukemia patients, the Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of Leukemia patients was conducted. Results We developed a statistical approach, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis.
Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. Conclusions In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
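The support and confidence that the abstract thresholds can be made concrete with a short Python sketch using the standard market-basket definitions. It does not reproduce the DAR algorithm or its confidence-interval-based choice of minimum support and minimum confidence. The function name and the item labels (`g1_up`, `disease`) are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    transactions : list of item collections, e.g. one per sample
    support      = fraction of transactions containing both sides
    confidence   = fraction of antecedent-containing transactions
                   that also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    both = antecedent | consequent
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / n if n else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical samples: each lists the discretized gene states and
# phenotype observed in one sample.
toy_transactions = [
    {"g1_up", "disease"},
    {"g1_up", "disease"},
    {"g1_up"},
    {"disease"},
]
support, confidence = rule_metrics(toy_transactions, {"g1_up"}, {"disease"})
```

A rule would then be retained only if both values exceed the chosen minimum support and minimum confidence.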
Affiliation(s)
- Shu-Chuan Chen
- Department of Mathematics and Statistics, Idaho State University, Pocatello, ID, 83209, USA
- Tsung-Hsien Tsai
- Department of Statistics, National Cheng-Kung University, Tainan, 701, Taiwan
- Cheng-Han Chung
- Department of Biological Sciences, Idaho State University, Pocatello, ID, 83209, USA
- Wen-Hsiung Li
- Academia Sinica, Taipei, 115, Taiwan
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, 60637, USA
11
Nguyen T, Khosravi A, Creighton D, Nahavandi S. A novel aggregate gene selection method for microarray data classification. Pattern Recognit Lett 2015. [DOI: 10.1016/j.patrec.2015.03.018] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Indexed: 11/29/2022]
12
Kanduri C, Kuusi T, Ahvenainen M, Philips AK, Lähdesmäki H, Järvelä I. The effect of music performance on the transcriptome of professional musicians. Sci Rep 2015; 5:9506. [PMID: 25806429 PMCID: PMC5380155 DOI: 10.1038/srep09506] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 10/16/2014] [Accepted: 02/27/2015] [Indexed: 12/31/2022]
Abstract
Music performance by professional musicians involves a wide spectrum of cognitive and multi-sensory motor skills, whose biological basis is unknown. Several neuroscientific studies have demonstrated that the brains of professional musicians and non-musicians differ structurally and functionally and that musical training enhances cognition. However, the molecules and molecular mechanisms involved in music performance remain largely unexplored. Here, we investigated the effect of music performance on the genome-wide peripheral blood transcriptome of professional musicians by analyzing the transcriptional responses after a 2-hr concert performance and after a 'music-free' control session. The up-regulated genes were found to affect dopaminergic neurotransmission, motor behavior, neuronal plasticity, and neurocognitive functions including learning and memory. In particular, candidate genes such as SNCA, FOS and DUSP1, which are involved in song perception and production in songbirds, were identified, suggesting an evolutionary conservation in biological processes related to sound perception/production. Additionally, modulation of genes related to calcium ion homeostasis, iron ion homeostasis, glutathione metabolism, and several neuropsychiatric and neurodegenerative diseases implied that music performance may affect the biological pathways that are otherwise essential for the proper maintenance of neuronal function and survival. For the first time, this study provides evidence for the candidate genes and molecular mechanisms underlying music performance.
Affiliation(s)
- Chakravarthi Kanduri
- Department of Medical Genetics, Haartman Institute, University of Helsinki, P.O. Box 720, 00014 University of Helsinki, Finland
- Tuire Kuusi
- DocMus doctoral school, Sibelius Academy, University of the Arts, P.O. Box 30, FI 0097 Uniarts, Finland
- Minna Ahvenainen
- Department of Medical Genetics, Haartman Institute, University of Helsinki, P.O. Box 720, 00014 University of Helsinki, Finland
- Anju K. Philips
- Department of Medical Genetics, Haartman Institute, University of Helsinki, P.O. Box 720, 00014 University of Helsinki, Finland
- Harri Lähdesmäki
- Department of Information and Computer Science, Aalto University, FI-00076 AALTO, Finland
- Irma Järvelä
- Department of Medical Genetics, Haartman Institute, University of Helsinki, P.O. Box 720, 00014 University of Helsinki, Finland
13
Kanduri C, Raijas P, Ahvenainen M, Philips AK, Ukkola-Vuoti L, Lähdesmäki H, Järvelä I. The effect of listening to music on human transcriptome. PeerJ 2015; 3:e830. [PMID: 25789207 PMCID: PMC4362302 DOI: 10.7717/peerj.830] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 01/09/2015] [Accepted: 02/18/2015] [Indexed: 01/09/2023]
Abstract
Although brain imaging studies have demonstrated that listening to music alters human brain structure and function, the molecular mechanisms mediating those effects remain unknown. With the advent of genomics and bioinformatics approaches, these effects of music can now be studied in a more detailed fashion. To verify whether listening to classical music has any effect on the human transcriptome, we performed genome-wide transcriptional profiling from the peripheral blood of participants after listening to classical music (n = 48), and after a control study without music exposure (n = 15). As musical experience is known to influence the responses to music, we compared the transcriptional responses of musically experienced and inexperienced participants separately with those of the controls. Comparisons were made based on two subphenotypes of musical experience: musical aptitude and music education. In musically experienced participants, we observed the differential expression of 45 genes (27 up- and 18 down-regulated) and 97 genes (75 up- and 22 down-regulated) respectively based on subphenotype comparisons (rank product non-parametric statistics, pfp 0.05, >1.2-fold change over time across conditions). Gene ontological overrepresentation analysis (hypergeometric test, FDR < 0.05) revealed that the up-regulated genes are primarily known to be involved in the secretion and transport of dopamine, neuron projection, protein sumoylation, long-term potentiation and dephosphorylation. Down-regulated genes are known to be involved in ATP synthase-coupled proton transport, cytolysis, and positive regulation of caspase, peptidase and endopeptidase activities. One of the most up-regulated genes, alpha-synuclein (SNCA), is located in the best linkage region of musical aptitude on chromosome 4q22.1 and is regulated by GATA2, which is known to be associated with musical aptitude.
Several genes reported to regulate song perception and production in songbirds displayed altered activities, suggesting a possible evolutionary conservation of sound perception between species. We observed no significant findings in musically inexperienced participants.
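The over-representation analysis named in this abstract rests on a hypergeometric test. The following is a minimal sketch of that test, not the authors' pipeline: the function name and the gene counts are hypothetical, and the FDR correction across terms is left out.

```python
from math import comb

def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    universe  -- number of genes measured in total
    annotated -- genes in the universe that carry the GO term
    selected  -- size of the differentially expressed gene list
    overlap   -- selected genes that carry the GO term
    """
    total = comb(universe, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts, for illustration only.
p = hypergeom_enrichment_p(universe=20000, annotated=150, selected=75, overlap=5)
print(f"p = {p:.3g}")
```

A term would be reported as over-represented only if its p-value survives multiple-testing correction over all tested terms.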
Affiliation(s)
- Pirre Raijas
- DocMus Department, University of the Arts Helsinki, Helsinki, Finland
- Minna Ahvenainen
- Department of Medical Genetics, University of Helsinki, Finland
- Anju K Philips
- Department of Medical Genetics, University of Helsinki, Finland
- Harri Lähdesmäki
- Department of Information and Computer Science, Aalto University, AALTO, Finland
- Irma Järvelä
- Department of Medical Genetics, University of Helsinki, Finland
|
14
|
Simos T, Georgopoulou U, Thyphronitis G, Koskinas J, Papaloukas C. Analysis of protein interaction networks for the detection of candidate hepatitis B and C biomarkers. IEEE J Biomed Health Inform 2014; 19:181-9. [PMID: 25099894 DOI: 10.1109/jbhi.2014.2344732] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
Hepatitis B virus (HBV) and hepatitis C virus (HCV) infection are the major causes of chronic liver disease, cirrhosis and hepatocellular carcinoma (HCC). The resolution or chronicity of acute infection is dependent on a complex interplay between virus and innate/adaptive immunity. The mechanisms that lead a significant proportion of patients to more severe liver disease are not clearly defined and involve virus induced host gene/protein alterations. The utilization of protein interaction networks (PINs) is expected to identify novel aspects of the disease concerning the patients' immune response to virus as well as the main pathways that are involved in the development of fibrosis and HCC. In this study, we designed several PINs for HBV and HCV and employed topological, modular, and functional analysis techniques in order to determine significant network nodes that correspond to prominent candidate biomarkers. The networks were built using data from various interaction databases. When the overall PINs of HBV and HCV were compared, 48 nodes were found in common. The implementation of a statistical ranking procedure indicated that three of them are of higher importance.
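The comparison of two networks and the ranking of their shared nodes can be illustrated with a small sketch. It assumes plain edge lists and uses summed node degree as a stand-in for the statistical ranking procedure, which the abstract does not specify; the function names and protein labels are placeholders.

```python
from collections import defaultdict

def build_graph(edges):
    """Return an undirected adjacency map from (node_a, node_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def rank_shared_nodes(edges_1, edges_2):
    """Rank nodes present in both networks by their summed degree."""
    g1, g2 = build_graph(edges_1), build_graph(edges_2)
    shared = set(g1) & set(g2)
    # Descending combined degree, then name, so the order is deterministic.
    return sorted(shared, key=lambda n: (-(len(g1[n]) + len(g2[n])), n))

# Toy edge lists with placeholder labels, not data from the study.
hbv_edges = [("P1", "V1"), ("P1", "P2"), ("P3", "V1"), ("P3", "P4")]
hcv_edges = [("P1", "W1"), ("P3", "W2"), ("P3", "P4"), ("P3", "W1")]
print(rank_shared_nodes(hbv_edges, hcv_edges))
```

Degree is only the simplest topological measure; betweenness or module membership could be substituted in the sort key without changing the overall shape of the procedure.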
|
15
|
Mitchell CL, Saul MC, Lei L, Wei H, Werner T. The mechanisms underlying α-amanitin resistance in Drosophila melanogaster: a microarray analysis. PLoS One 2014; 9:e93489. [PMID: 24695618 PMCID: PMC3973583 DOI: 10.1371/journal.pone.0093489] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 12/11/2013] [Accepted: 03/06/2014] [Indexed: 01/25/2023] Open
Abstract
The rapid evolution of toxin resistance in animals has important consequences for the ecology of species and our economy. Pesticide resistance in insects has been a subject of intensive study; however, very little is known about how Drosophila species became resistant to natural toxins with ecological relevance, such as α-amanitin that is produced in deadly poisonous mushrooms. Here we performed a microarray study to elucidate the genes, chromosomal loci, molecular functions, biological processes, and cellular components that contribute to the α-amanitin resistance phenotype in Drosophila melanogaster. We suggest that toxin entry blockage through the cuticle, phase I and II detoxification, sequestration in lipid particles, and proteolytic cleavage of α-amanitin contribute in concert to this quantitative trait. We speculate that the resistance to mushroom toxins in D. melanogaster and perhaps in mycophagous Drosophila species has evolved as cross-resistance to pesticides, other xenobiotic substances, or environmental stress factors.
Affiliation(s)
- Chelsea L. Mitchell
- Department of Biological Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Michael C. Saul
- Department of Zoology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Liang Lei
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, Michigan, United States of America
- Hairong Wei
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, Michigan, United States of America
- Thomas Werner
- Department of Biological Sciences, Michigan Technological University, Houghton, Michigan, United States of America
|
16
|
Dembélé D, Kastner P. Fold change rank ordering statistics: a new method for detecting differentially expressed genes. BMC Bioinformatics 2014; 15:14. [PMID: 24423217 PMCID: PMC3899927 DOI: 10.1186/1471-2105-15-14] [Citation(s) in RCA: 90] [Impact Index Per Article: 9.0] [Received: 07/03/2013] [Accepted: 12/27/2013] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Different methods have been proposed for analyzing differentially expressed (DE) genes in microarray data. Methods based on statistical tests that incorporate expression level variability are used more commonly than those based on fold change (FC). However, FC based results are more reproducible and biologically relevant. RESULTS We propose a new method based on fold change rank ordering statistics (FCROS). We exploit the variation in calculated FC levels using combinatorial pairs of biological conditions in the datasets. A statistic is associated with the ranks of the FC values for each gene, and the resulting probability is used to identify the DE genes within an error level. The FCROS method is deterministic, requires a low computational runtime and also solves the problem of multiple tests which usually arises with microarray datasets. CONCLUSION We compared the performance of FCROS with those of other methods using synthetic and real microarray datasets. We found that FCROS is well suited for DE gene identification from noisy datasets when compared with existing FC based methods.
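The idea of ranking fold changes over all pairs of control and test samples can be sketched as follows. This is a simplified illustration under stated assumptions, not the published FCROS algorithm: it only averages normalised ranks and omits the probability step, and the function name and input layout (log2 values per gene) are hypothetical.

```python
from itertools import product

def average_fc_ranks(control, test):
    """Average normalised fold-change rank per gene over all sample pairs.

    control, test -- dict mapping gene -> list of log2 expression values,
                     one value per sample.
    Returns gene -> value in (0, 1]; values near 1 suggest up-regulation,
    values near 1/n_genes suggest down-regulation.
    """
    genes = sorted(control)
    n_ctrl = len(control[genes[0]])
    n_test = len(test[genes[0]])
    rank_sums = dict.fromkeys(genes, 0.0)
    pairs = list(product(range(n_ctrl), range(n_test)))
    for i, j in pairs:
        # On log2 data the fold change is a simple difference.
        log_fc = {g: test[g][j] - control[g][i] for g in genes}
        ordered = sorted(genes, key=lambda g: log_fc[g])
        for rank, g in enumerate(ordered, start=1):
            rank_sums[g] += rank / len(genes)
    return {g: rank_sums[g] / len(pairs) for g in genes}

# Toy log2 values, for illustration only.
control = {"g1": [5.0, 5.2], "g2": [7.0, 7.1], "g3": [6.0, 6.1]}
test = {"g1": [8.0, 8.3], "g2": [7.0, 7.2], "g3": [4.0, 4.2]}
print(average_fc_ranks(control, test))
```

Genes whose average rank stays close to either extreme across all pairings are the candidates for differential expression; genes with inconsistent ranks drift toward the middle.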
Affiliation(s)
- Doulaye Dembélé
- Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), INSERM U964, CNRS UMR 7104, Université de Strasbourg, 67404 Illkirch, France.
|
17
|
Rao SSS, Shepherd LA, Bruno AE, Liu S, Miecznikowski JC. Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets. Adv Bioinformatics 2013; 2013:790567. [PMID: 24223587 PMCID: PMC3809938 DOI: 10.1155/2013/790567] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/26/2013] [Accepted: 08/28/2013] [Indexed: 01/13/2023] Open
Abstract
Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled the assessment of the precision, comparability of microarrays, and other various microarray analysis methods. However, to date no studies that we are aware of have reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures in Affymetrix microarrays. Results. We evaluated several cutting edge imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values using imputation tests. We performed 1000 simulations and averaged the results. The results for both 5% and 10% deletion are similar. Among the imputation methods, we observe the local least squares method with k = 4 is most accurate under the error measures considered. The k-nearest neighbor method with k = 1 has the highest error rate among imputation methods and error measures. Conclusions. We conclude for imputing missing values in Affymetrix microarray datasets, using the MAS 5.0 preprocessing scheme, the local least squares method with k = 4 has the best overall performance and k-nearest neighbor method with k = 1 has the worst overall performance. These results hold true for both 5% and 10% missing values.
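The evaluation scheme described here (mask known values, impute them, measure the error) can be sketched with a basic k-nearest-neighbor imputer. This is an assumption-laden illustration, not the study's code: the function names, the distance definition and the toy matrix are hypothetical, and the local least squares method is not shown.

```python
import math

def knn_impute(matrix, k):
    """Fill None entries with the mean of the k nearest rows (genes).

    Distance is the root-mean-square difference over columns observed
    in both rows; rows with no shared observed column are skipped.
    """
    filled = [row[:] for row in matrix]
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for s, other in enumerate(matrix):
                if s == r or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(
                        sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((dist, other[c]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[r][c] = sum(nearest) / len(nearest)
    return filled

def rmse(truth, imputed, masked):
    """Root-mean-square error over the deliberately masked positions."""
    errors = [(truth[r][c] - imputed[r][c]) ** 2 for r, c in masked]
    return math.sqrt(sum(errors) / len(errors))

# Toy matrix (rows = genes, columns = arrays) with one masked entry.
truth = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [5.0, 6.0, 7.0]]
masked = [(0, 2)]
observed = [row[:] for row in truth]
observed[0][2] = None
for k in (1, 2):
    print(k, rmse(truth, knn_impute(observed, k), masked))
```

In a full comparison the masking would be repeated many times at each deletion rate and the error averaged, as the abstract describes for its 1000 simulations.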
Affiliation(s)
- Lori A. Shepherd
- Department of Biostatistics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
- Andrew E. Bruno
- Center for Computational Research, University at Buffalo, NYS Center of Excellence in Bioinformatics and Life Sciences, Buffalo, NY 14203, USA
- Song Liu
- Department of Biostatistics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
- Jeffrey C. Miecznikowski
- Department of Biostatistics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
- Department of Biostatistics, SUNY University at Buffalo, Buffalo, NY 14214, USA
|
18
|
Tsuyuzaki K, Tominaga D, Kwon Y, Miyazaki S. Two-way AIC: detection of differentially expressed genes from large scale microarray meta-dataset. BMC Genomics 2013; 14 Suppl 2:S9. [PMID: 23445621 PMCID: PMC3582450 DOI: 10.1186/1471-2164-14-s2-s9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022] Open
Abstract
Background. Detecting significantly differentially expressed genes (DEGs) in DNA microarray datasets is a routine task in biomedical research, and numerous methods have been proposed for it. Conventional methods generally detect DEGs from a single dataset consisting of a control group and a treatment group. However, some DEGs are readily detected under any experimental condition. To detect DEGs that are more specific to an experimental condition, each gene expression measurement should be compared in two dimensions, that is, against other genes and against other datasets simultaneously. For this purpose, we retrieve as much gene expression data as possible from public databases and construct a "meta-dataset" that summarizes the expression changes of all genes across various experimental conditions. Herein, we propose "two-way AIC" (Akaike Information Criterion), a method for the simultaneous detection of significant genes and experiments in a meta-dataset. Results. In a case study of Pseudomonas aeruginosa, we evaluate whether the two-way AIC method can detect test data consisting of experiment-condition-specific DEGs. Operon genes are used as test data. Compared with other commonly used statistical methods (t-rank/F-test, RankProducts and SAM), two-way AIC shows the highest specificity in detecting operon genes. Conclusions. The two-way AIC method achieves high specificity for operon gene detection on the microarray meta-dataset. It can also be applied to the estimation of mutual gene interactions.
Affiliation(s)
- Koki Tsuyuzaki
- Department of Medical and Life Science, Faculty of Pharmaceutical Science, Tokyo University of Science, 2641 Yamazaki, Noda, 278-8510, Japan.
|
19
|
Zhang Y, Baker SS, Baker RD, Zhu R, Zhu L. Systematic analysis of the gene expression in the livers of nonalcoholic steatohepatitis: implications on potential biomarkers and molecular pathological mechanism. PLoS One 2012; 7:e51131. [PMID: 23300535 PMCID: PMC3530598 DOI: 10.1371/journal.pone.0051131] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 09/07/2012] [Accepted: 10/31/2012] [Indexed: 02/07/2023] Open
Abstract
Non-alcoholic steatohepatitis (NASH) is a severe form of non-alcoholic fatty liver disease (NAFLD). The molecular pathological mechanism of NASH is poorly understood. Recently, high throughput data such as microarray data together with bioinformatics methods have become a powerful way to identify biomarkers and to investigate pathogenesis of diseases. Taking advantage of well characterized microarray datasets of NASH livers, we performed a systematic analysis of potential biomarkers and possible pathological mechanism of NASH from a bioinformatics perspective. CodeLink Human Whole Genome Bioarrays were analyzed to find differentially expressed genes (DEGs) between controls and NASH patients. Four methods were used to identify DEGs and the intersection of DEGs identified by these methods was subsequently used for both biomarker prediction and molecular pathological mechanism analysis. For biomarker prediction, rank aggregation was used to rank DEGs identified by all these methods according to their significance of different expression. Alcohol dehydrogenase 4 (ADH4) exhibited the highest rank suggesting the most significant differential expression between normal and disease condition. Together with the previous report demonstrating the association between ADH4 and the pathogenesis of NASH, our data suggest that ADH4 could be a potential biomarker for NASH. For molecular pathological mechanism analysis, two clusters of highly correlated annotation terms and genes in these terms were identified based on the intersection of DEGs. Then, pathways enriched with these genes were identified to construct the network. Using this network, both for the first time, amino acid catabolism is implicated to play a pivotal role and urea cycle is implicated to be involved in the development of NASH. The results of our study identified potential biomarkers and suggested possible molecular pathological mechanism of NASH. These findings provide a comprehensive and systematic understanding of the pathogenesis of NASH and may facilitate the diagnosis, prevention and treatment of NASH.
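Taking the intersection of several DEG lists and then aggregating their ranks can be sketched briefly. The abstract does not say which rank aggregation scheme was used, so the mean rank below is an assumed stand-in; the function name and gene labels are hypothetical.

```python
def aggregate_ranks(ranked_lists):
    """Order genes found by every method by their mean rank position.

    ranked_lists -- one list per method, most significant gene first.
    """
    shared = set(ranked_lists[0]).intersection(*ranked_lists[1:])
    mean_rank = {
        gene: sum(lst.index(gene) + 1 for lst in ranked_lists) / len(ranked_lists)
        for gene in shared
    }
    # Ties are broken by gene name so the result is deterministic.
    return sorted(shared, key=lambda g: (mean_rank[g], g))

# Toy rankings from three hypothetical methods.
methods = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g5"],
    ["g1", "g4", "g2", "g6"],
]
print(aggregate_ranks(methods))
```

The gene at the head of the aggregated list plays the role that ADH4 plays in the study: the candidate on which the methods agree most strongly.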
Affiliation(s)
- Yida Zhang
- Department of Bioinformatics, Tongji University, Shanghai, P.R. China
- Susan S. Baker
- Digestive Diseases and Nutrition Center, Department of Pediatrics, the State University of New York at Buffalo, Buffalo, New York, United States of America
- Robert D. Baker
- Digestive Diseases and Nutrition Center, Department of Pediatrics, the State University of New York at Buffalo, Buffalo, New York, United States of America
- Ruixin Zhu
- Department of Bioinformatics, Tongji University, Shanghai, P.R. China
- Lixin Zhu
- Digestive Diseases and Nutrition Center, Department of Pediatrics, the State University of New York at Buffalo, Buffalo, New York, United States of America
|
20
|
Phan JH, Quo CF, Wang MD. Cardiovascular genomics: a biomarker identification pipeline. IEEE Trans Inf Technol Biomed 2012; 16:809-22. [PMID: 22614726 DOI: 10.1109/titb.2012.2199570] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023]
Abstract
Genomic biomarkers are essential for understanding the underlying molecular basis of human diseases such as cardiovascular disease. In this review, we describe a biomarker identification pipeline for cardiovascular disease, which includes 1) high-throughput genomic data acquisition, 2) preprocessing and normalization of data, 3) exploratory analysis, 4) feature selection, 5) classification, and 6) interpretation and validation of candidate biomarkers. We review each step in the pipeline, presenting current and widely used bioinformatics methods. Furthermore, we analyze several publicly available cardiovascular genomics datasets to illustrate the pipeline. Finally, we summarize the current challenges and opportunities for further research.
Affiliation(s)
- John H Phan
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA.
|
21
|
Kadota K, Nishiyama T, Shimizu K. A normalization strategy for comparing tag count data. Algorithms Mol Biol 2012; 7:5. [PMID: 22475125 PMCID: PMC3341196 DOI: 10.1186/1748-7188-7-5] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 12/01/2011] [Accepted: 04/05/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data. RESULTS We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset. CONCLUSION Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can widely be applied to other types of tag count data and to microarray data.
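The key concept, removing potential DEGs before computing the normalization factor, can be shown with a deliberately simple two-sample sketch. It is not the published implementation, which builds on existing R packages: here a plain library-size ratio stands in for the normalization method, an extreme log ratio stands in for DEG detection, and all names and counts are hypothetical.

```python
import math

def scaling_factor(counts_a, counts_b, genes):
    """Library-size ratio of sample B to sample A over the given genes."""
    return sum(counts_b[g] for g in genes) / sum(counts_a[g] for g in genes)

def deg_aware_factor(counts_a, counts_b, drop_fraction=0.25):
    """Recompute the scaling factor after removing likely DEGs.

    counts_a, counts_b -- dict gene -> tag count (all counts > 0 here).
    drop_fraction      -- share of genes, by extreme log ratio, treated
                          as potential DEGs (an arbitrary choice).
    """
    genes = list(counts_a)
    first_pass = scaling_factor(counts_a, counts_b, genes)
    # Log ratio after first-pass normalisation; a large |value| hints at a DEG.
    log_ratio = {
        g: math.log2(counts_b[g] / first_pass / counts_a[g]) for g in genes
    }
    ordered = sorted(genes, key=lambda g: abs(log_ratio[g]))
    n_keep = len(genes) - int(len(genes) * drop_fraction)
    return scaling_factor(counts_a, counts_b, ordered[:n_keep])

# Toy counts: sample B is sequenced twice as deeply, and g4 is strongly induced.
sample_a = {"g1": 100, "g2": 200, "g3": 300, "g4": 100}
sample_b = {"g1": 200, "g2": 400, "g3": 600, "g4": 2000}
print(scaling_factor(sample_a, sample_b, list(sample_a)))
print(deg_aware_factor(sample_a, sample_b))
```

In the toy data the single induced gene inflates the naive factor well above the true depth ratio, while the factor computed after dropping it recovers that ratio, which is the effect the abstract attributes to DEG elimination.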
|