1
Li D, Cheng W, Zhou X, Zheng X, Ren J, Meng T. Insight into the role of stress response and toxic mechanism induced by Chloro-haloacetonitrile in vitro. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2024; 284:116999. [PMID: 39244879 DOI: 10.1016/j.ecoenv.2024.116999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 08/29/2024] [Accepted: 09/01/2024] [Indexed: 09/10/2024]
Abstract
Chloro-haloacetonitriles (Cl-HANs) belong to a group of nitrogenous disinfection by-products (N-DBPs) found in surface water and are known to pose a major risk to the safety of human drinking water. However, the exact biological toxicity mechanism and the extent of the stress response caused by Cl-HANs remain unclear, resulting in a lack of effective measures to control their presence. Thus, quantitative toxicological genomics and bioinformatics methods were applied to explore the effects of three Cl-HANs on the transcription of fusion genes in E. coli under varying stress concentrations over a 2-hour period. The initial stress response and the toxic mechanisms were analyzed. The study also identified the molecular toxicity endpoint and the core genes responsible for the specific toxicity of different Cl-HANs. Cl-HANs exhibited concentration-dependent toxic effects and caused changes in the expression of genes related to oxidative and membrane stress. The stress response results showed that dichloroacetonitrile (dCAN) still caused significant DNA damage at the lowest stress concentration. Chloroacetonitrile (CAN) and trichloroacetonitrile (tCAN) exhibited lower genetic toxicity levels at 513 μg/L and 10.7 μg/L, respectively. The toxic effects of tCAN were widespread. There was a good correlation between the molecular endpoint (EC-TELI1.5) and the phenotypic endpoint (LD50), with rp=-0.8634 (P=0.0593). Across all stress concentrations of CAN, dCAN, and tCAN, the number of shared overexpressed genes was 15, 2, and 14, respectively. Furthermore, bioinformatics analysis demonstrated that Cl-HANs affected genes associated with general stress pathways, such as cell biochemistry and physical homeostasis, resulting in changes in biological processes. For CAN-induced DNA damage, polA played a dominant role, while katG, oxyR, and ahpC were the core genes involved in oxidative stress induced by dCAN and tCAN. These findings provide valuable data on the toxic effects of Cl-HANs.
Affiliation(s)
- Dong Li
  - Department of Municipal and Environmental Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, PR China
- Wen Cheng
  - Department of Municipal and Environmental Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, PR China; State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi'an University of Technology, Xi'an, Shaanxi 710048, PR China
- Xiaoping Zhou
  - Power China Northwest Engineering Corporation Limited, Xi'an, Shaanxi 710065, PR China
- Xing Zheng
  - Department of Municipal and Environmental Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, PR China
- Jiehui Ren
  - Department of Municipal and Environmental Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, PR China
- Ting Meng
  - Department of Municipal and Environmental Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, PR China
2
Giakoumaki M, Lambrou GI, Vlachodimitropoulos D, Tagka A, Vitsos A, Kyriazi M, Dimakopoulou A, Anagnostou V, Karasmani M, Deli H, Grigoropoulos A, Karalis E, Rallis MC, Black HS. Type I Diabetes Mellitus Suppresses Experimental Skin Carcinogenesis. Cancers (Basel) 2024; 16:1507. [PMID: 38672589 PMCID: PMC11048394 DOI: 10.3390/cancers16081507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 04/07/2024] [Accepted: 04/10/2024] [Indexed: 04/28/2024] Open
Abstract
This study explores the previously uncharted territory of the effects of ultraviolet (UV) radiation on diabetic skin, compared to its well-documented impact on normal skin, particularly focusing on carcinogenesis and aging. Employing hairless SKH-hr2, Type 1 and 2 diabetic, and nondiabetic male mice, the research subjected these to UV radiation thrice weekly for eight months. The investigation included comprehensive assessments of photoaging and photocarcinogenesis in diabetic versus normal skin, measuring factors such as hydration, trans-epidermal water loss, elasticity, skin thickness, melanin, sebum content, stratum corneum exfoliation and body weight, alongside photo documentation. Additionally, oxidative stress and the presence of hydrophilic antioxidants (uric acid and glutathione) in the stratum corneum were evaluated. Histopathological examination post-sacrifice provided insights into the morphological changes. Findings reveal that under UV exposure, Type 1 diabetic skin showed heightened dehydration, thinning, and signs of accelerated aging. Remarkably, Type 1 diabetic mice did not develop squamous cell carcinoma or pigmented nevi, contrary to normal and Type 2 diabetic skin. This unexpected resistance to UV-induced skin cancers in Type 1 diabetic skin prompts a crucial need for further research to uncover the underlying mechanisms providing this resistance.
Affiliation(s)
- Maria Giakoumaki
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- George I. Lambrou
  - Choremeio Research Laboratory, First Department of Pediatrics, School of Health Sciences, Medical School, National and Kapodistrian University of Athens, Thivon & Levadeias 8, Goudi, 11527 Athens, Greece
  - Research Institute of Maternal and Child Health & Precision Medicine, National and Kapodistrian University of Athens, Thivon & Levadeias 8, 11527 Athens, Greece
- Dimitrios Vlachodimitropoulos
  - Department of Forensic Medicine and Toxicology, Medical School, National and Kapodistrian University of Athens, 75, Mikras Asias Street, 11527 Athens, Greece
- Anna Tagka
  - First Department of Dermatology and Venereology, "Andreas Syggros" Hospital, School of Medicine, National and Kapodistrian University of Athens, Ionos Dragoumi 5, 11621 Athens, Greece
- Andreas Vitsos
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- Maria Kyriazi
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- Aggeliki Dimakopoulou
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- Vasiliki Anagnostou
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- Marina Karasmani
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- Heleni Deli
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- Andreas Grigoropoulos
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- Evangelos Karalis
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- Michail Christou Rallis
  - Division of Pharmaceutical Technology, Department of Pharmacy, School of Health Sciences, National and Kapodistrian University of Athens, Panepistimiopolis, 15784 Athens, Greece
- Homer S. Black
  - Department of Dermatology, Baylor College of Medicine, Houston, TX 77030, USA
3
Li D, Cheng W, Ren J, Qin L, Zheng X, Wan T, Wang M. In vitro toxicity assessment of haloacetamides via a toxicogenomics assay. ENVIRONMENTAL TOXICOLOGY AND PHARMACOLOGY 2023; 97:104026. [PMID: 36455839 DOI: 10.1016/j.etap.2022.104026] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/30/2022] [Revised: 11/22/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
It is important to study the stress effects and mechanisms of haloacetamide (HAcAm) disinfection byproducts to reveal their health hazards. In this context, toxicological genomics was applied to evaluate the effects of four HAcAms, revealing the status of gene expression in Escherichia coli across different stress response types (oxidative, protein, membrane, general, DNA). High-resolution, real-time gene expression profiling combined with clustering analysis revealed that the main toxic modes of action of these HAcAms were general and membrane stresses. The results of time-gene evaluation showed that the presence of chloroacetamide (CAcAm) and bromoacetamide (BAcAm) generated more reactive oxygen species, thus activating oxidative stress. Trichloroacetamide (tCAcAm) induced altered expression of glutathione marker genes and membrane stress-related genes, and iodoacetamide (IAcAm) caused severe DNA damage, mainly by damaging DNA strands and individual nucleotides and bases. Furthermore, quantitative structure-activity relationship (QSAR) modelling results indicated that the biological activities of HAcAms were related to their quantum chemical and topological properties.
Affiliation(s)
- Dong Li
  - State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi'an University of Technology, Xi'an 710048, China
- Wen Cheng
  - State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi'an University of Technology, Xi'an 710048, China
- Jiehui Ren
  - State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi'an University of Technology, Xi'an 710048, China
- Lu Qin
  - State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi'an University of Technology, Xi'an 710048, China
- Xing Zheng
  - State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi'an University of Technology, Xi'an 710048, China
- Tian Wan
  - State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi'an University of Technology, Xi'an 710048, China
- Min Wang
  - State Key Laboratory of Eco-Hydraulics in Northwest Arid Region of China, Xi'an University of Technology, Xi'an 710048, China
4
Aghaieabiane N, Koutis I. A Novel Calibration Step in Gene Co-Expression Network Construction. FRONTIERS IN BIOINFORMATICS 2021; 1:704817. [PMID: 36303738 PMCID: PMC9581019 DOI: 10.3389/fbinf.2021.704817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2021] [Accepted: 10/22/2021] [Indexed: 12/02/2022] Open
Abstract
High-throughput technologies such as DNA microarrays and RNA-sequencing are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). In a GCN, nodes correspond to genes, and the weight of the connection between two nodes is a measure of similarity in the expression behavior of the two genes. In general, GCN construction and analysis includes three steps: 1) calculating a similarity value for each pair of genes; 2) using these similarity values to construct a fully connected weighted network; and 3) finding clusters of genes in the network, commonly called modules. The specific implementation of these three steps can significantly impact the final output and the downstream biological analysis. GCN construction is a well-studied topic. Existing algorithms rely on relatively simple statistical and mathematical tools to implement these steps. Currently, the software package WGCNA appears to be the most widely accepted standard. We hypothesize that the raw features provided by sequencing data can be leveraged to extract modules of higher quality. A novel preprocessing step of the gene expression data set is introduced that in effect calibrates the expression levels of individual genes before computing pairwise similarities. Further, the similarity is computed as an inner product of positive vectors. In experiments, this provides a significant improvement over WGCNA, as measured by aggregate p-values of the gene ontology term enrichment of the computed modules.
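The three generic GCN steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the calibration method proposed in the paper: the similarity is an absolute Pearson correlation raised to a soft-threshold power (the unsigned WGCNA-style weighting), modules are found with a simple connected-components pass rather than hierarchical clustering, and the gene names, expression values, `beta`, and `cutoff` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_network(expr, beta=6):
    """Steps 1 and 2: pairwise similarity, then a fully connected weighted network.

    The soft-threshold power `beta` is an illustrative choice (assumption).
    """
    return {
        frozenset((g, h)): abs(pearson(expr[g], expr[h])) ** beta
        for g, h in combinations(sorted(expr), 2)
    }


def find_modules(expr, weights, cutoff=0.5):
    """Step 3 (simplified): modules as connected components of edges >= cutoff."""
    parent = {g: g for g in expr}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for pair, w in weights.items():
        if w >= cutoff:
            g, h = tuple(pair)
            parent[find(g)] = find(h)

    modules = {}
    for g in expr:
        modules.setdefault(find(g), set()).add(g)
    return sorted(sorted(m) for m in modules.values())


# Toy expression matrix (made-up values): gene -> expression across five samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [10.0, 2.2, 8.1, 3.9, 6.0],
}
weights = build_network(expr)
modules = find_modules(expr, weights)
print(modules)
```

In this toy data, geneA and geneB rise together and geneC and geneD share a second pattern, so the two pairs end up in separate modules. Swapping in a different similarity function in `build_network` is where a calibration or inner-product step like the one the abstract describes would plug in.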
5
Lambrou GI, Poulou M, Giannikou K, Themistocleous M, Zaravinos A, Braoudaki M. Differential and Common Signatures of miRNA Expression and Methylation in Childhood Central Nervous System Malignancies: An Experimental and Computational Approach. Cancers (Basel) 2021; 13:5491. [PMID: 34771655 PMCID: PMC8583574 DOI: 10.3390/cancers13215491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 10/24/2021] [Accepted: 10/26/2021] [Indexed: 11/16/2022] Open
Abstract
Epigenetic modifications are considered of utmost significance for tumor ontogenesis and progression. In particular, miRNA expression and DNA methylation have been found to play a significant role in central nervous system tumors during childhood. A total of 49 resected brain tumors from children were analyzed. DNA methylation was assessed with methylation-specific MLPA for the tumor suppressor genes CASP8, RASSF1, MGMT, MSH6, GATA5, ATM1, TP53, and CADM1. miRNAs were identified with microarray screening, and selected samples were also tested for their mRNA expression levels. CASP8 and RASSF1 were the most frequently methylated genes in all tumor samples. Simultaneous methylation of genes manifested significant results with respect to tumor staging, tumor type, and the differentiation of tumor and control samples. No significant dependence was observed on the methylation of any single gene promoter; rather, the dependence was on the simultaneous presence of all detected methylated gene promoters. miRNA expression was found to be correlated with gene methylation. Epigenetic regulation appears to be of major importance in tumor progression and pathophysiology, making it an imperative field of study.
Affiliation(s)
- George I. Lambrou
  - Choremeio Research Laboratory, First Department of Pediatrics, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Myrto Poulou
  - Department of Medical Genetics, Medical School, National and Kapodistrian University of Athens, 15772 Athens, Greece
- Krinio Giannikou
  - Cancer Genetics Laboratory, Division of Pulmonary and Critical Care Medicine and of Genetics, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Marios Themistocleous
  - Department of Neurosurgery, “Aghia Sofia” Children’s Hospital, 11527 Athens, Greece
- Apostolos Zaravinos
  - Department of Life Sciences, School of Sciences, European University Cyprus, Nicosia 2404, Cyprus
  - Basic and Translational Cancer Research Center (BTCRC), Cancer Genetics, Genomics and Systems Biology Group, European University Cyprus, Nicosia 1516, Cyprus
  - Correspondence: (A.Z.); (M.B.)
- Maria Braoudaki
  - Department of Life and Environmental Sciences, School of Life and Health Sciences, University of Hertfordshire, Hertfordshire AL10 9AB, UK
  - Correspondence: (A.Z.); (M.B.)
6
Rahmatbakhsh M, Gagarinova A, Babu M. Bioinformatic Analysis of Temporal and Spatial Proteome Alternations During Infections. Front Genet 2021; 12:667936. [PMID: 34276775 PMCID: PMC8283032 DOI: 10.3389/fgene.2021.667936] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/15/2021] [Accepted: 06/08/2021] [Indexed: 12/13/2022] Open
Abstract
Microbial pathogens have evolved numerous mechanisms to hijack host systems, thus causing disease. This is mediated by alterations in the combined host-pathogen proteome in time and space. Mass spectrometry-based proteomics approaches have been developed and tailored to map disease progression. The result is complex multidimensional data that pose numerous analytic challenges for downstream interpretation. However, a systematic review of approaches for the downstream analysis of such data has been lacking in the field. In this review, we detail the steps of a typical temporal and spatial analysis, including data pre-processing steps (i.e., quality control, data normalization, the imputation of missing values, and dimensionality reduction), different statistical and machine learning approaches, validation, interpretation, and the extraction of biological information from mass spectrometry data. We also discuss current best practices for these steps based on a collection of independent studies to guide users in selecting the most suitable strategies for their dataset and analysis objectives. Moreover, we compiled a list of commonly used R software packages for each step of the analysis. These could be easily integrated into one's analysis pipeline. Furthermore, we guide readers through various analysis steps by applying these workflows to mock and host-pathogen interaction data from public datasets. The workflows presented in this review will serve as an introduction for data analysis novices, while also helping established users update their data analysis pipelines. We conclude the review by discussing future directions and developments in temporal and spatial proteomics and data analysis approaches. Data analysis code prepared for this review is available from https://github.com/BabuLab-UofR/TempSpac, where guidelines and sample datasets are also offered for testing purposes.
Affiliation(s)
- Alla Gagarinova
  - Department of Biochemistry, Microbiology, & Immunology, University of Saskatchewan, Saskatoon, SK, Canada
- Mohan Babu
  - Department of Biochemistry, University of Regina, Regina, SK, Canada
7
Lambrou GI, Zaravinos A, Braoudaki M. Co-Deregulated miRNA Signatures in Childhood Central Nervous System Tumors: In Search for Common Tumor miRNA-Related Mechanics. Cancers (Basel) 2021; 13:3028. [PMID: 34204289 PMCID: PMC8235499 DOI: 10.3390/cancers13123028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/30/2021] [Revised: 06/09/2021] [Accepted: 06/14/2021] [Indexed: 02/06/2023] Open
Abstract
Simple Summary: Childhood tumors of the central nervous system (CNS) constitute a grave disease, and their diagnosis is difficult. To gain better knowledge of the tumor’s biology, it is essential to understand the underlying mechanisms of the disease. MicroRNAs (miRNAs) are small noncoding RNAs that are dysregulated in many types of CNS tumors and regulate their occurrence and development through specific signal pathways. However, different types of CNS tumors are characterized by different deregulated miRNAs. Here, we hypothesized that CNS tumors could have commonly deregulated miRNAs, i.e., miRNAs that are simultaneously either upregulated or downregulated in all tumor types compared to normal brain tissue, irrespective of the tumor sub-type and/or diagnosis. The only criterion is that they are present in brain tumors. If successful, this approach could lead to the discovery of miRNAs that could be used as pan-CNS tumoral therapeutic targets.
Abstract: Despite extensive experimentation on pediatric tumors of the central nervous system (CNS) related to prognosis, diagnosis and treatment, the understanding of the pathogenesis and etiology of the disease remains scarce. MicroRNAs are known to be involved in CNS tumor oncogenesis. We hypothesized that CNS tumors possess commonly deregulated miRNAs across different CNS tumor types. Aim: The current study aims to reveal the co-deregulated miRNAs across different types of pediatric CNS tumors. Materials: A total of 439 CNS tumor samples were collected from both in-house microarray experiments and data available in public databases. Diagnoses included medulloblastoma, astrocytoma, ependymoma, cortical dysplasia, glioblastoma, ATRT, germinoma, teratoma, yolk sac tumors, ocular tumors and retinoblastoma. Results: We found miRNAs that were globally up- or down-regulated in the majority of the CNS tumor samples. MiR-376B and miR-372 were co-upregulated, whereas miR-149, miR-214, miR-574, miR-595 and miR-765, among others, were co-downregulated across all CNS tumors. Receiver-operator curve analysis showed that miR-149, miR-214, miR-574, miR-595 and miR-765 could distinguish between CNS tumors and normal brain tissue. Conclusions: Our approach could prove significant in the search for global miRNA targets for tumor diagnosis and therapy. To the best of our knowledge, there are no previous reports concerning the present approach.
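The receiver-operator curve analysis mentioned in this abstract boils down to asking how well a single miRNA's expression level separates tumor from normal samples. A minimal sketch of the area under the ROC curve (AUC), computed through its rank interpretation, is shown here. The function name `roc_auc` and all expression values are hypothetical and are not data from the study.

```python
def roc_auc(tumor_values, normal_values, lower_in_tumor=True):
    """Area under the ROC curve for one marker.

    Computed as the fraction of (tumor, normal) pairs that the marker orders
    correctly, counting ties as half. `lower_in_tumor=True` treats lower
    expression as evidence of tumor, as for a downregulated miRNA.
    """
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t == n:
                wins += 0.5
            elif (t < n) == lower_in_tumor:
                wins += 1.0
    return wins / (len(tumor_values) * len(normal_values))


# Made-up expression levels for a hypothetical downregulated miRNA.
tumor = [2.1, 1.8, 2.5, 3.0, 1.2]
normal = [4.0, 3.6, 2.8, 4.4]
auc = roc_auc(tumor, normal)
print(round(auc, 2))
```

An AUC of 0.5 means the marker is no better than chance, while values close to 1.0 mean it separates the two groups almost perfectly.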
Affiliation(s)
- George I. Lambrou
  - Choremeio Research Laboratory, First Department of Pediatrics, National and Kapodistrian University of Athens, Thivon & Levadeias 8, Goudi, 11527 Athens, Greece
- Apostolos Zaravinos
  - Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus
  - Cancer Genetics, Genomics and Systems Biology Group, Basic and Translational Cancer Research Center (BTCRC), Nicosia 1516, Cyprus
  - Correspondence: (A.Z.); (M.B.); Tel.: +974-4403-7819 (A.Z.); +44-(0)-1707286503 (ext. 3503) (M.B.)
- Maria Braoudaki
  - Department of Clinical, Pharmaceutical and Biological Science, School of Life and Medical Sciences, University of Hertfordshire, College Lane, Hatfield AL10 9AB, Hertfordshire, UK
  - Correspondence: (A.Z.); (M.B.); Tel.: +974-4403-7819 (A.Z.); +44-(0)-1707286503 (ext. 3503) (M.B.)
8
Ogbede JU, Giaever G, Nislow C. A genome-wide portrait of pervasive drug contaminants. Sci Rep 2021; 11:12487. [PMID: 34127714 PMCID: PMC8203678 DOI: 10.1038/s41598-021-91792-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/26/2021] [Accepted: 05/25/2021] [Indexed: 11/08/2022] Open
Abstract
Using a validated yeast chemogenomic platform, we characterized the genome-wide effects of several pharmaceutical contaminants, including three N-nitrosamines (NDMA, NDEA and NMBA), two related compounds (DMF and 4NQO) and several of their metabolites. A collection of 4800 non-essential homozygous diploid yeast deletion strains were screened in parallel and the strain abundance was quantified by barcode sequencing. These data were used to rank deletion strains representing genes required for resistance to the compounds to delineate affected cellular pathways and to visualize the global cellular effects of these toxins in an easy-to-use searchable database. Our analysis of the N-nitrosamine screens uncovered genes (via their corresponding homozygous deletion mutants) involved in several evolutionarily conserved pathways, including: arginine biosynthesis, mitochondrial genome integrity, vacuolar protein sorting and DNA damage repair. To investigate why NDMA, NDEA and DMF caused fitness defects in strains lacking genes of the arginine pathway, we tested several N-nitrosamine metabolites (methylamine, ethylamine and formamide), and found they also affected arginine pathway mutants. Notably, each of these metabolites has the potential to produce ammonium ions during their biotransformation. We directly tested the role of ammonium ions in N-nitrosamine toxicity by treatment with ammonium sulfate and we found that ammonium sulfate also caused a growth defect in arginine pathway deletion strains. Formaldehyde, a metabolite produced from NDMA, methylamine and formamide, and which is known to cross-link free amines, perturbed deletion strains involved in chromatin remodeling and DNA repair pathways. Finally, co-administration of N-nitrosamines with ascorbic or ferulic acid did not relieve N-nitrosamine toxicity. In conclusion, we used parallel deletion mutant analysis to characterize the genes and pathways affected by exposure to N-nitrosamines and related compounds, and provide the data in an accessible, queryable database.
Affiliation(s)
- Joseph Uche Ogbede
  - Genome Science & Technology Graduate Program, University of British Columbia, Vancouver, Canada
- Guri Giaever
  - Faculty of Pharmaceutical Science, University of British Columbia, Vancouver, Canada
- Corey Nislow
  - Genome Science & Technology Graduate Program, University of British Columbia, Vancouver, Canada
  - Faculty of Pharmaceutical Science, University of British Columbia, Vancouver, Canada
9
Cauduro GP, Leal AL, Lopes TF, Marmitt M, Valiati VH. Differential Expression and PAH Degradation: What Burkholderia vietnamiensis G4 Can Tell Us? Int J Microbiol 2020; 2020:8831331. [PMID: 32908529 PMCID: PMC7474390 DOI: 10.1155/2020/8831331] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/01/2020] [Revised: 07/04/2020] [Accepted: 07/31/2020] [Indexed: 11/17/2022] Open
Abstract
Petroleum is the major energy matrix in the world whose refining generates chemical byproducts that may damage the environment. Among such waste, polycyclic aromatic hydrocarbons (PAH) are considered persistent pollutants. Sixteen of these are considered priority for remediation, and among them is benzo(a)pyrene. Amid remediation techniques, bioremediation stands out. The genus Burkholderia is amongst the microorganisms known for being capable of degrading persistent compounds; its strains are used as models to study such ability. High-throughput sequencing allows researchers to reach a wider knowledge about biodegradation by bacteria. Using transcripts and mRNA analysis, the genomic regions involved in this aptitude can be detected. To unravel these processes, we used the model B. vietnamiensis strain G4 in two experimental groups: one was exposed to benzo(a)pyrene and the other one (control) was not. Six transcriptomes were generated from each group aiming to compare gene expression and infer which genes are involved in degradation pathways. One hundred fifty-six genes were differentially expressed in the benzo(a)pyrene exposed group, from which 33% are involved in catalytic activity. Among these, the most significant genomic regions were phenylacetic acid degradation protein paaN, involved in the degradation of organic compounds to obtain energy; oxidoreductase FAD-binding subunit, related to the regulation of electrons within groups of dioxygenase enzymes with potential to cleave benzene rings; and dehydrogenase, described as accountable for phenol degradation. These data provide the basis for understanding the bioremediation of benzo(a)pyrene and the possible applications of this strain in polluted environments.
Affiliation(s)
- Ana Lusia Leal
  - Companhia Riograndense de Saneamento, Biology Laboratory, Triunfo, RS, Brazil
- Tiago Falcón Lopes
  - Centro de Terapia Gênica, Centro de Pesquisa Experimental, Hospital de Clínicas, Porto Alegre, RS, Brazil
- Marcela Marmitt
  - Universidade do Vale do Rio dos Sinos, Biology Graduate Program, São Leopoldo, RS, Brazil
- Victor Hugo Valiati
  - Universidade do Vale do Rio dos Sinos, Biology Graduate Program, São Leopoldo, RS, Brazil
10
Trofimov A, Cohen JP, Bengio Y, Perreault C, Lemieux S. Factorized embeddings learns rich and biologically meaningful embedding spaces using factorized tensor decomposition. Bioinformatics 2020; 36:i417-i426. [PMID: 32657403 PMCID: PMC7355243 DOI: 10.1093/bioinformatics/btaa488] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION: The recent development of sequencing technologies revolutionized our understanding of the inner workings of the cell as well as the way disease is treated. A single RNA sequencing (RNA-Seq) experiment, however, measures tens of thousands of parameters simultaneously. While the results are information rich, data analysis poses a challenge. Dimensionality reduction methods help with this task by extracting patterns from the data and compressing it into compact vector representations. RESULTS: We present the factorized embeddings (FE) model, a self-supervised deep learning algorithm that simultaneously learns gene and sample representation spaces by tensor factorization. We ran the model on RNA-Seq data from two large-scale cohorts and observed that the sample representation captures information on single gene and global gene expression patterns. Moreover, we found that the gene representation space was organized such that tissue-specific genes, highly correlated genes, and genes participating in the same GO terms were grouped. Finally, we compared the vector representation of samples learned by the FE model to other similar models on 49 regression tasks. We report that the representations trained with FE rank first or second in all of the tasks, surpassing, sometimes by a considerable margin, other representations. AVAILABILITY AND IMPLEMENTATION: A toy example in the form of a Jupyter Notebook as well as the code and trained embeddings for this project can be found at: https://github.com/TrofimovAssya/FactorizedEmbeddings. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
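The core idea of learning a gene representation and a sample representation at the same time can be sketched with a plain matrix factorization. This is a deliberately simplified stand-in, not the FE model itself: the abstract describes a deep learning algorithm, whereas the sketch below predicts each expression value as a dot product of a sample vector and a gene vector and fits both by stochastic gradient descent. The function name `train_factorization`, the toy matrix, and every hyperparameter are assumptions made for illustration.

```python
import random


def train_factorization(matrix, dim=2, epochs=500, lr=0.01, seed=0):
    """Jointly learn one embedding per sample (row) and per gene (column).

    Each expression value is approximated by the dot product of its sample
    and gene embeddings; squared error is minimized with plain SGD.
    Returns the two embedding tables and the per-epoch mean squared error.
    """
    rng = random.Random(seed)
    n_samples, n_genes = len(matrix), len(matrix[0])
    sample_emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_samples)]
    gene_emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_genes)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for s in range(n_samples):
            for g in range(n_genes):
                pred = sum(a * b for a, b in zip(sample_emb[s], gene_emb[g]))
                err = pred - matrix[s][g]
                total += err * err
                for k in range(dim):
                    # Use the pre-update values for both gradient steps.
                    s_k, g_k = sample_emb[s][k], gene_emb[g][k]
                    sample_emb[s][k] -= lr * err * g_k
                    gene_emb[g][k] -= lr * err * s_k
        history.append(total / (n_samples * n_genes))
    return sample_emb, gene_emb, history


# Toy expression matrix (made-up values): 4 samples x 3 genes, two underlying patterns.
matrix = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 1.0, 2.0],
    [6.0, 2.0, 4.0],
]
sample_emb, gene_emb, history = train_factorization(matrix)
print(f"MSE first epoch: {history[0]:.3f}, last epoch: {history[-1]:.3f}")
```

After training, rows of `sample_emb` can be compared to each other to find similar samples, and rows of `gene_emb` to find similarly behaving genes, which is the kind of downstream use the abstract reports for the learned spaces.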
Affiliation(s)
- Assya Trofimov
  - Department of Computer Science, University of Montreal, Québec, Canada
  - Institute for Research in Immunology and Cancer, University of Montreal, Québec, Canada
  - Mila, University of Montreal, Québec, Canada
- Joseph Paul Cohen
  - Department of Computer Science, University of Montreal, Québec, Canada
  - Mila, University of Montreal, Québec, Canada
- Yoshua Bengio
  - Department of Computer Science, University of Montreal, Québec, Canada
  - Mila, University of Montreal, Québec, Canada
- Claude Perreault
  - Institute for Research in Immunology and Cancer, University of Montreal, Québec, Canada
  - Department of Medicine, University of Montreal, Québec, Canada
- Sébastien Lemieux
  - Department of Computer Science, University of Montreal, Québec, Canada
  - Institute for Research in Immunology and Cancer, University of Montreal, Québec, Canada
  - Department of Biochemistry and Molecular Medicine, University of Montreal, Québec, Canada
| |
Collapse
|
11
|
Chandereng T, Gitter A. Lag penalized weighted correlation for time series clustering. BMC Bioinformatics 2020; 21:21. [PMID: 31948388 PMCID: PMC6966853 DOI: 10.1186/s12859-019-3324-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/16/2019] [Accepted: 12/16/2019] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: The similarity or distance measure used for clustering can generate intuitive and interpretable clusters when it is tailored to the unique characteristics of the data. In time series datasets generated with high-throughput biological assays, measurements such as gene expression levels or protein phosphorylation intensities are collected sequentially over time, and the similarity score should capture this special temporal structure.
RESULTS: We propose a clustering similarity measure called Lag Penalized Weighted Correlation (LPWC) to group pairs of time series that exhibit closely related behaviors over time, even if the timing is not perfectly synchronized. LPWC aligns time series profiles to identify common temporal patterns. It down-weights aligned profiles based on the length of the temporal lags that are introduced. We demonstrate the advantages of LPWC versus existing time series and general clustering algorithms. In a simulated dataset based on the biologically-motivated impulse model, LPWC is the only method to recover the true clusters for almost all simulated genes. LPWC also identifies clusters with distinct temporal patterns in our yeast osmotic stress response and axolotl limb regeneration case studies.
CONCLUSIONS: LPWC achieves both of its time series clustering goals. It groups time series with correlated changes over time, even if those patterns occur earlier or later in some of the time series. In addition, it refrains from introducing large shifts in time when searching for temporal patterns by applying a lag penalty. The LPWC R package is available at https://github.com/gitter-lab/LPWC and CRAN under an MIT license.
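The align-then-penalize idea can be sketched in a few lines. The version below is a simplified illustration, not the LPWC package: it uses an ordinary Pearson correlation on evenly spaced time points rather than a weighted correlation, and the exponential penalty form, the function names, and the `max_lag` and `penalty` parameters are assumptions chosen for the example.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lag_penalized_corr(x, y, max_lag=2, penalty=0.5):
    """Return (best_score, best_lag) over integer lags in [-max_lag, max_lag].

    A positive lag compares x[t] with y[t + lag], i.e. y is delayed.
    The score is the correlation of the overlapping parts, multiplied by
    exp(-penalty * lag**2) so that large shifts are down-weighted.
    """
    n = len(x)
    best_score, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:n + lag]
        score = math.exp(-penalty * lag ** 2) * pearson(xs, ys)
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag


# Made-up toy profiles: y is x delayed by one time point.
x = [0, 0, 1, 3, 1, 0, 0, 0]
y = [0, 0, 0, 1, 3, 1, 0, 0]
score, lag = lag_penalized_corr(x, y)
```

With these toy profiles the unshifted correlation is modest, while a shift of one time point aligns the two pulses exactly. The penalty keeps the score below 1 because a lag was needed to achieve that alignment.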
Affiliation(s)
- Thevaa Chandereng
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI USA
  - Morgridge Institute of Research, Madison, WI USA
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
- Anthony Gitter
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI USA
  - Morgridge Institute of Research, Madison, WI USA
|
12
|
Cole E, Gillespie S, Vulliamy P, Brohi K. Multiple organ dysfunction after trauma. Br J Surg 2019; 107:402-412. [PMID: 31691956 PMCID: PMC7078999 DOI: 10.1002/bjs.11361] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 04/02/2019] [Revised: 07/02/2019] [Accepted: 08/13/2019] [Indexed: 01/31/2023]
Abstract
Background: The nature of multiple organ dysfunction syndrome (MODS) after traumatic injury is evolving as resuscitation practices advance and more patients survive their injuries to reach critical care. The aim of this study was to characterize contemporary MODS subtypes in trauma critical care at a population level.
Methods: Adult patients admitted to major trauma centre critical care units were enrolled in this 4‐week point‐prevalence study. MODS was defined by a daily total Sequential Organ Failure Assessment (SOFA) score of more than 5. Hierarchical clustering of SOFA scores over time was used to identify MODS subtypes.
Results: Some 440 patients were enrolled, of whom 245 (55·7 per cent) developed MODS. MODS carried a high mortality rate (22·0 per cent versus 0·5 per cent in those without MODS; P < 0·001) and 24·0 per cent of deaths occurred within the first 48 h after injury. Three patterns of MODS were identified, all present on admission. Cluster 1 MODS resolved early with a median time to recovery of 4 days and a mortality rate of 14·4 per cent. Cluster 2 had a delayed recovery (median 13 days) and a mortality rate of 35 per cent. Cluster 3 had a prolonged recovery (median 25 days) and high associated mortality rate of 46 per cent. Multivariable analysis revealed distinct clinical associations for each form of MODS; 24‐hour crystalloid administration was associated strongly with cluster 1 (P = 0·009), traumatic brain injury with cluster 2 (P = 0·002) and admission shock severity with cluster 3 (P = 0·003).
Conclusion: Contemporary MODS has at least three distinct types based on patterns of severity and recovery. Further characterization of MODS subtypes and their underlying pathophysiology may lead to future opportunities for early stratification and targeted interventions.
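Hierarchical clustering of score trajectories can be sketched as follows. The abstract does not state the distance or linkage used, so Euclidean distance with average linkage is an assumption here, as are the function names. The SOFA trajectories are invented toy values, not study data; only the MODS threshold (daily total SOFA above 5) comes from the abstract.

```python
def has_mods(daily_sofa_total):
    """MODS definition from the abstract: daily total SOFA score above 5."""
    return daily_sofa_total > 5


def euclid(a, b):
    """Euclidean distance between two equal-length trajectories."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def agglomerate(profiles, n_clusters):
    """Average-linkage agglomerative clustering.

    Returns a sorted list of clusters, each a sorted list of patient indices.
    """
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(euclid(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Invented toy trajectories: one daily total SOFA score per day, per patient.
sofa = [
    [8, 6, 3, 2, 1, 1],      # resolves early
    [9, 7, 4, 2, 2, 1],
    [8, 8, 7, 6, 4, 2],      # delayed recovery
    [9, 8, 8, 6, 5, 3],
    [10, 10, 9, 9, 8, 8],    # prolonged dysfunction
    [11, 10, 10, 9, 9, 8],
]
groups = agglomerate(sofa, 3)
```

On this toy input the three groups correspond to early, delayed, and prolonged recovery shapes, which mirrors the kind of pattern the study reports without reproducing its analysis.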
Affiliation(s)
- E Cole
  - Centre for Trauma Sciences, Blizard Institute, Queen Mary University of London, 4 Newark Street, London, E1 2AT, UK
- S Gillespie
  - Centre for Trauma Sciences, Blizard Institute, Queen Mary University of London, 4 Newark Street, London, E1 2AT, UK
- P Vulliamy
  - Centre for Trauma Sciences, Blizard Institute, Queen Mary University of London, 4 Newark Street, London, E1 2AT, UK
- K Brohi
  - Centre for Trauma Sciences, Blizard Institute, Queen Mary University of London, 4 Newark Street, London, E1 2AT, UK
|
13
|
Barido-Sottani J, Chapman SD, Kosman E, Mushegian AR. Measuring similarity between gene interaction profiles. BMC Bioinformatics 2019; 20:435. [PMID: 31438841 PMCID: PMC6704681 DOI: 10.1186/s12859-019-3024-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/29/2018] [Accepted: 08/09/2019] [Indexed: 11/14/2022] Open
Abstract
Background: Gene and protein interaction data are often represented as interaction networks, where nodes stand for genes or gene products and each edge stands for a relationship between a pair of gene nodes. Commonly, that relationship within a pair is specified by high similarity between profiles (vectors) of experimentally defined interactions of each of the two genes with all other genes in the genome; only gene pairs that interact with similar sets of genes are linked by an edge in the network. The tight groups of genes/gene products that work together in a cell can be discovered by the analysis of those complex networks.
Results: We show that the choice of the similarity measure between pairs of gene vectors impacts the properties of networks and of gene modules detected within them. We re-analyzed well-studied data on yeast genetic interactions, constructed four genetic networks using four different similarity measures, and detected gene modules in each network using the same algorithm. The four networks induced different numbers of putative functional gene modules, and each similarity measure induced some unique modules. In an example of a putative functional connection suggested by comparing genetic interaction vectors, we predict a link between SUN-domain proteins and protein glycosylation in the endoplasmic reticulum.
Conclusions: The discovery of molecular modules in genetic networks is sensitive to the way of measuring similarity between profiles of gene interactions in a cell. In the absence of a formal way to choose the “best” measure, it is advisable to explore the measures with different mathematical properties, which may identify different sets of connections between genes.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-3024-x) contains supplementary material, which is available to authorized users.
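A small example shows how the choice of similarity measure can change which genes look most alike. The abstract does not name the four measures that were compared, so cosine similarity and Jaccard similarity are used here purely as two illustrative measures with different mathematical properties. The profile values and gene labels are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two interaction-score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def jaccard(u, v):
    """Jaccard similarity between the sets of partners with non-zero scores."""
    su = {i for i, a in enumerate(u) if a != 0}
    sv = {i for i, b in enumerate(v) if b != 0}
    return len(su & sv) / len(su | sv)


# Hypothetical interaction profiles: one score per partner gene.
profiles = {
    "query": [1, 1, 1, 0],
    "a": [5, 1, 1, 0],   # same partners as the query, different strengths
    "b": [1, 1, 1, 1],   # same strengths, one extra partner
}


def nearest(measure):
    """Name of the gene most similar to 'query' under the given measure."""
    q = profiles["query"]
    others = [g for g in profiles if g != "query"]
    return max(others, key=lambda g: measure(q, profiles[g]))
```

Here `nearest(cosine)` and `nearest(jaccard)` pick different genes, because cosine similarity responds to interaction strengths while Jaccard similarity only looks at which partners are shared. An edge drawn in a network would therefore depend on the measure.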
Affiliation(s)
- Joëlle Barido-Sottani
  - Stowers Institute for Medical Research, Kansas City, MO, USA
  - École Polytechnique, Route de Saclay, Palaiseau, France
  - Present Address: Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, Iowa, USA
- Samuel D Chapman
  - Stowers Institute for Medical Research, Kansas City, MO, USA
  - Present Address: Booz Allen Hamilton, McLean, Virginia, USA
- Evsey Kosman
  - Institute for Cereal Crops Improvement, School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Arcady R Mushegian
  - Stowers Institute for Medical Research, Kansas City, MO, USA
  - Department of Microbiology, Molecular Genetics and Immunology, Kansas University Medical Center, Kansas City, Kansas, USA
  - Present Address: Division of Molecular and Cellular Biosciences, National Science Foundation, Alexandria, Virginia, USA
|
14
|
Moody L, Mantha S, Chen H, Pan YX. Computational methods to identify bimodal gene expression and facilitate personalized treatment in cancer patients. J Biomed Inform 2019; 100S:100001. [PMID: 34384574 DOI: 10.1016/j.yjbinx.2018.100001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/22/2018] [Revised: 11/03/2018] [Accepted: 12/06/2018] [Indexed: 10/27/2022]
Abstract
Standard methods for detecting cancer-associated genes rely on comparison of sample means between cancer patients and healthy controls. While such methods have successfully identified several oncogenes and tumor suppressor genes, they neglect to account for heterogeneity within the cancer population. Genetic mutations, translocations, and amplifications are often inconsistent across tumors, and instead they often affect smaller subsets of patients. This concept gives rise to the idea of bimodally expressed genes, or genes that display two modes of expression within one population. Analysis of bimodal gene expression has been explored via a variety of techniques including test statistics and clustering. In this review, we summarize the methodologies used to quantify bimodal gene expression and address the utility of these genes in patient stratification and specialized therapeutics in breast and lung cancer. Finally we discuss the limitations and future directions for bimodal genes in the era of high-throughput sequencing and personalized medicine.
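To make the notion of a bimodality test statistic concrete, the sketch below computes the sample bimodality coefficient (sometimes called Sarle's coefficient), which combines skewness and excess kurtosis; values above 5/9 are conventionally read as a hint of bimodality. The abstract does not list the specific statistics covered by the review, so choosing this one is an assumption, and the expression values are invented.

```python
def bimodality_coefficient(values):
    """Sample bimodality coefficient: (skew^2 + 1) / (kurt + 3(n-1)^2 / ((n-2)(n-3))).

    skew and kurt are the bias-corrected sample skewness and excess kurtosis.
    Requires at least 4 values with non-zero variance.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5          # population skewness
    g2 = m4 / m2 ** 2 - 3.0      # population excess kurtosis
    skew = (n * (n - 1)) ** 0.5 / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


THRESHOLD = 5 / 9  # conventional cutoff; values above it suggest bimodality

# Invented expression values: a gene that is "off" in half the patients and
# "on" in the other half, versus a gene with a single central mode.
two_modes = [0.0] * 10 + [10.0] * 10
one_mode = [1, 2, 2, 3, 3, 3, 4, 4, 5]
```

A statistic like this only flags candidate genes. It does not say which patients fall in which mode, which is where the clustering approaches mentioned in the abstract come in.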
Affiliation(s)
- Laura Moody
  - Division of Nutritional Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
- Suparna Mantha
  - Carle Physician Group, Carle Cancer Center, Carle Foundation Hospital, Urbana, IL 61802, United States
- Hong Chen
  - Division of Nutritional Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Department of Food Science and Human Nutrition, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
- Yuan-Xiang Pan
  - Division of Nutritional Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Department of Food Science and Human Nutrition, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
  - Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States
|
15
|
Behrisch M, Schreck T, Krüger R, Gehlenborg N, Lekschas F, Pfister H. Visual Pattern-Driven Exploration of Big Data. 2018 INTERNATIONAL SYMPOSIUM ON BIG DATA VISUAL AND IMMERSIVE ANALYTICS (BDVA): KONSTANZ, GERMANY, OCTOBER 17-19, 2018; 2018:10.1109/BDVA.2018.8534028. [PMID: 31396383 PMCID: PMC6687327 DOI: 10.1109/bdva.2018.8534028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Pattern extraction algorithms are enabling insights into the ever-growing amount of today's datasets by translating recurring data properties into compact representations. Yet, a practical problem arises: with increasing data volumes and complexity, the number of patterns also increases, leaving the analyst with a vast result space. Current algorithmic and especially visualization approaches often fail to answer central overview questions essential for a comprehensive understanding of pattern distributions and support, their quality, and relevance to the analysis task. To address these challenges, we contribute a visual analytics pipeline targeted at the pattern-driven exploration of result spaces in a semi-automatic fashion. Specifically, we combine image feature analysis and unsupervised learning to partition the pattern space into interpretable, coherent chunks, which should be given priority in a subsequent in-depth analysis. In our analysis scenarios, no ground truth is given. Thus, we employ and evaluate novel quality metrics derived from the distance distributions of our image feature vectors and the derived cluster model to guide the feature selection process. We visualize our results interactively, allowing the user to drill down from overview to detail into the pattern space, and demonstrate our techniques in two case studies on Earth observation and biomedical genomic data.
|
16
|
|
17
|
Li K, Zeng L, Wei H, Hu J, Jiao L, Zhang J, Xiong Y. Identification of gene-specific DNA methylation signature for Colorectal Cancer. Cancer Genet 2018; 228-229:5-11. [PMID: 30553473 DOI: 10.1016/j.cancergen.2018.05.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/20/2018] [Revised: 04/22/2018] [Accepted: 05/08/2018] [Indexed: 12/28/2022]
Abstract
BACKGROUND: Colorectal Cancer (CC), a common disease causing approximately million deaths annually, has been the third most frequent type of malignancy. We aimed to identify a gene-specific DNA methylation signature to function as a prognostic and predictive marker for CC patient survival.
METHODS: Expression profiles of gene-specific DNA methylation and the corresponding clinical information of 201 CC patients were downloaded from The Cancer Genome Atlas (TCGA) dataset, and differentially expressed gene-specific DNA methylation was identified after tumor subtype classification. A risk score model was further built by analyzing the expression data of these gene-specific DNA methylations from the training dataset of CC patients.
RESULTS: In total, 214 gene-specific DNA methylations were found to be expressed significantly differently between subtypes of CC, including 150 up-regulated and 64 down-regulated ones. Up-regulated gene-specific DNA methylation accounted for 70.1% and down-regulated gene-specific DNA methylation accounted for 29.9%. Among these, six gene-specific DNA methylations were obtained, including methy_vimentin and methy_TFPI2, which were found to be significantly correlated with the overall survival status of patients with CC.
CONCLUSIONS: With the six gene-specific DNA methylation signatures, patients in the training set were divided into low-risk and high-risk groups. Moreover, gene-specific DNA methylation target genes were highly associated with protein phosphorylation, which indicated that further research on phosphorylation of target gene-coding protein might provide new insight into the treatment of CC.
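A risk score of this kind is commonly built as a weighted sum of marker values, with patients then split into low-risk and high-risk groups at a cutoff. The abstract does not give the model form, the coefficients, or the cutoff, so the linear form, the median split, the marker names, and every number below are hypothetical and serve only to show the mechanics.

```python
from statistics import median


def risk_scores(samples, coefs):
    """Weighted sum of marker values for each patient (assumed linear model)."""
    return [sum(coefs[name] * sample[name] for name in coefs) for sample in samples]


def split_by_median(scores):
    """Label each score 'high' if it exceeds the median score, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical coefficients and methylation values for two made-up markers.
coefs = {"marker_1": 1.0, "marker_2": -0.5}
patients = [
    {"marker_1": 0.2, "marker_2": 0.8},
    {"marker_1": 0.9, "marker_2": 0.1},
    {"marker_1": 0.5, "marker_2": 0.5},
    {"marker_1": 0.7, "marker_2": 0.2},
]
scores = risk_scores(patients, coefs)
groups = split_by_median(scores)
```

In a real analysis the coefficients would come from a survival model fitted on the training set, and the grouping would be checked against observed survival rather than assumed.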
Affiliation(s)
- Kaixue Li
  - Department of Gastroenterology, The Second People's Hospital of Shenzhen, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Li Zeng
  - Department of Gastroenterology, The Second People's Hospital of Shenzhen, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Hong Wei
  - Department of Gastroenterology, The Second People's Hospital of Shenzhen, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Jingjing Hu
  - Department of Gastroenterology, The Second People's Hospital of Shenzhen, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Lu Jiao
  - Department of Gastroenterology, The Second People's Hospital of Shenzhen, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Juan Zhang
  - Department of Gastroenterology, The Second People's Hospital of Shenzhen, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Ying Xiong
  - Department of Gastroenterology, The Second People's Hospital of Shenzhen, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
|
18
|
Leale G, Baya AE, Milone DH, Granitto PM, Stegmayer G. Inferring Unknown Biological Function by Integration of GO Annotations and Gene Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:168-180. [PMID: 27723603 DOI: 10.1109/tcbb.2016.2615960] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/06/2023]
Abstract
Characterizing genes with semantic information is an important step in describing gene products. Although the complete genomes of many organisms have already been sequenced, the biological functions of their genes are not yet all known. Since studying the functions of those genes experimentally, one by one, would be unfeasible, new computational methods for inferring gene function are needed. We present here a novel computational approach for inferring biological function for a set of genes with previously unknown function, given a set of genes with well-known information. This approach is based on the premise that genes with similar behaviour should be grouped together, which is known as the guilt-by-association principle. Thus, it is possible to take advantage of clustering techniques to obtain groups of unknown genes that are co-clustered with genes that have well-known semantic information (GO annotations). These well-known genes can therefore provide meaningful knowledge for inferring the unknown semantic information. We provide a method to explore the potential function of new genes according to those currently annotated. The results obtained indicate that the proposed approach could be a useful and effective tool when used by biologists to guide the inference of biological functions for recently discovered genes. Our work sets an important landmark in the field of identifying unknown gene functions through clustering, using an external source of biological input. A simple web interface to this proposal can be found at http://fich.unl.edu.ar/sinc/webdemo/gamma-am/.
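The guilt-by-association step can be sketched as a vote among annotated genes in the same cluster. This is a minimal illustration under assumptions, not the method of the paper: the clusters are taken as given, the most frequent GO term among annotated members is transferred to the unannotated ones, and the gene names and cluster memberships are invented.

```python
from collections import Counter


def infer_terms(clusters, annotations, top=1):
    """Suggest GO terms for unannotated genes from their annotated co-cluster members.

    clusters: list of lists of gene names.
    annotations: dict mapping annotated gene names to a set of GO term IDs.
    Returns a dict mapping each unannotated gene to a list of suggested terms.
    """
    inferred = {}
    for members in clusters:
        counts = Counter(
            term for gene in members for term in annotations.get(gene, ())
        )
        suggested = [term for term, _ in counts.most_common(top)]
        for gene in members:
            if gene not in annotations:
                inferred[gene] = suggested
    return inferred


# Invented example: two clusters, three annotated genes, two unknown genes.
clusters = [["g1", "g2", "u1"], ["g3", "u2"]]
annotations = {
    "g1": {"GO:0006950"},                # response to stress
    "g2": {"GO:0006950", "GO:0006412"},  # response to stress, translation
    "g3": {"GO:0006412"},                # translation
}
suggestions = infer_terms(clusters, annotations)
```

The suggestions are hypotheses to guide follow-up work, in the same spirit as the abstract's framing: a shared cluster makes a shared function plausible, not certain.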
|
19
|
Gruben BS, Mäkelä MR, Kowalczyk JE, Zhou M, Benoit-Gelber I, De Vries RP. Expression-based clustering of CAZyme-encoding genes of Aspergillus niger. BMC Genomics 2017; 18:900. [PMID: 29169319 PMCID: PMC5701360 DOI: 10.1186/s12864-017-4164-x] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 03/28/2017] [Accepted: 10/05/2017] [Indexed: 11/29/2022] Open
Abstract
Background: The Aspergillus niger genome contains a large repertoire of genes encoding carbohydrate active enzymes (CAZymes) that are targeted to plant polysaccharide degradation, enabling A. niger to grow on a wide range of plant biomass substrates. Which genes need to be activated in certain environmental conditions depends on the composition of the available substrate. Previous studies have demonstrated the involvement of a number of transcriptional regulators in plant biomass degradation and have identified sets of target genes for each regulator. In this study, a broad transcriptional analysis was performed of the A. niger genes encoding (putative) plant polysaccharide degrading enzymes. Microarray data on the initial response of A. niger to plant biomass-related carbon sources were analyzed for the wild-type strain N402, grown on a large range of carbon sources, and for the regulatory mutant strains ΔxlnR, ΔaraR, ΔamyR, ΔrhaR and ΔgalX, grown on their specific inducing compounds.
Results: The cluster analysis of the expression data revealed several groups of co-regulated genes, which go beyond the traditionally described co-regulated gene sets. Additional putative target genes of the selected regulators were identified, based on their expression profile. Notably, in several cases the expression profile calls into question the function assignment of uncharacterized genes that was based on homology searches, highlighting the need for more extensive biochemical studies into the substrate specificity of enzymes encoded by these non-characterized genes. The data also revealed sets of genes that were upregulated in the regulatory mutants, suggesting interaction between the regulatory systems and therefore an even more complex overall regulatory network than has been reported so far.
Conclusions: Expression profiling on a large number of substrates provides better insight into the complex regulatory systems that drive the conversion of plant biomass by fungi. In addition, the data provide additional evidence in favor of and against the similarity-based functions assigned to uncharacterized genes.
Electronic supplementary material: The online version of this article (10.1186/s12864-017-4164-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Birgit S Gruben
  - Fungal Physiology, Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
  - Microbiology, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Miia R Mäkelä
  - Fungal Physiology, Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
  - Fungal Molecular Physiology, Utrecht University, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
  - Department of Food and Environmental Sciences, Division of Microbiology and Biotechnology, Viikki Biocenter 1, University of Helsinki, Helsinki, Finland
- Joanna E Kowalczyk
  - Fungal Physiology, Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
  - Fungal Molecular Physiology, Utrecht University, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
- Miaomiao Zhou
  - Fungal Physiology, Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
  - Current affiliation: ATGM, Avans University of Applied Sciences, Lovensdijkstraat 61-63, 4818 AJ Breda, The Netherlands
- Isabelle Benoit-Gelber
  - Fungal Physiology, Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
  - Microbiology, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
  - Fungal Molecular Physiology, Utrecht University, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
  - Current affiliation: Center for Structural and Functional Genomics, Concordia University, 7141 Sherbrooke St. W, Montreal, QC, Canada
- Ronald P De Vries
  - Fungal Physiology, Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
  - Microbiology, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
  - Fungal Molecular Physiology, Utrecht University, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
|
20
|
Paul AK, Shill PC. Incorporating gene ontology into fuzzy relational clustering of microarray gene expression data. Biosystems 2017; 163:1-10. [PMID: 29113811 DOI: 10.1016/j.biosystems.2017.09.017] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 04/13/2017] [Revised: 09/26/2017] [Accepted: 09/27/2017] [Indexed: 12/28/2022]
Abstract
The products of gene expression work together in the cells of every living organism to carry out different biological processes. Many proteins play different roles in the functioning of the cell depending on the environment of the organism. In this paper, we propose a semi-supervised clustering algorithm based on gene ontology (GO) annotations, called GO fuzzy relational clustering (GO-FRC), in which one gene may be assigned to multiple clusters, which is the most biologically relevant behavior of genes. In the clustering process, GO-FRC uses the biological knowledge available in the form of a gene ontology as prior knowledge, along with the gene expression data. The prior knowledge helps to improve the coherence of the groups with respect to the knowledge field. The proposed GO-FRC has been tested on two yeast (Saccharomyces cerevisiae) expression profile datasets (the Eisen and Dream5 yeast datasets) and compared with other state-of-the-art clustering algorithms. Experimental results suggest that GO-FRC is able to produce more biologically relevant clusters using only a small amount of GO annotations.
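The idea that one gene can belong to several clusters at once can be shown with the standard fuzzy c-means membership formula, which turns a gene's distances to each cluster into membership degrees that sum to 1. This is a generic fuzzy clustering ingredient used here as an assumption; it is not the GO-FRC update rule, and the function name, the fuzzifier `m`, and the distances are illustrative.

```python
def fuzzy_memberships(distances, m=2.0):
    """Membership of one gene in each cluster, from its distance to each cluster.

    Uses u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with fuzzifier m > 1.
    If the gene sits exactly on one or more clusters (distance 0), membership
    is shared equally among those clusters.
    """
    zeros = [d == 0 for d in distances]
    if any(zeros):
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_k / d_j) ** power for d_j in distances)
        for d_k in distances
    ]


# Illustrative distances of one gene to two clusters.
near_first = fuzzy_memberships([1.0, 3.0])
in_between = fuzzy_memberships([2.0, 2.0])
```

A gene three times closer to the first cluster gets most, but not all, of its membership there, while a gene equidistant from both is split evenly. A hard clustering would force both cases into a single cluster.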
Affiliation(s)
- Animesh Kumar Paul
  - Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
- Pintu Chandra Shill
  - Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh
|
21
|
Data-analysis strategies for image-based cell profiling. Nat Methods 2017; 14:849-863. [PMID: 28858338 PMCID: PMC6871000 DOI: 10.1038/nmeth.4397] [Citation(s) in RCA: 404] [Impact Index Per Article: 57.7] [Received: 05/19/2016] [Accepted: 07/28/2017] [Indexed: 12/16/2022]
Abstract
Image-based cell profiling is a high-throughput strategy for the quantification of phenotypic differences among a variety of cell populations. It paves the way to studying biological systems on a large scale by using chemical and genetic perturbations. The general workflow for this technology involves image acquisition with high-throughput microscopy systems and subsequent image processing and analysis. Here, we introduce the steps required to create high-quality image-based (i.e., morphological) profiles from a collection of microscopy images. We recommend techniques that have proven useful in each stage of the data analysis process, on the basis of the experience of 20 laboratories worldwide that are refining their image-based cell-profiling methodologies in pursuit of biological discovery. The recommended techniques cover alternatives that may suit various biological goals, experimental designs, and laboratories' preferences.
|
22
|
Jaeger D, Winkler A, Mussgnug JH, Kalinowski J, Goesmann A, Kruse O. Time-resolved transcriptome analysis and lipid pathway reconstruction of the oleaginous green microalga Monoraphidium neglectum reveal a model for triacylglycerol and lipid hyperaccumulation. BIOTECHNOLOGY FOR BIOFUELS 2017; 10:197. [PMID: 28814974 PMCID: PMC5556983 DOI: 10.1186/s13068-017-0882-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 06/07/2017] [Accepted: 08/03/2017] [Indexed: 05/03/2023]
Abstract
BACKGROUND: Oleaginous microalgae are promising production hosts for the sustainable generation of lipid-based bioproducts and as bioenergy carriers such as biodiesel. Transcriptomics of the lipid accumulation phase, triggered efficiently by nitrogen starvation, is a valuable approach for the identification of gene targets for metabolic engineering.
RESULTS: An explorative analysis of the detailed transcriptional response to different stages of nitrogen availability was performed in the oleaginous green alga Monoraphidium neglectum. Transcript data were correlated with metabolic data for cellular contents of starch and of different lipid fractions. A pronounced transcriptional down-regulation of photosynthesis became apparent in response to nitrogen starvation, whereas glucose catabolism was found to be up-regulated. An in-depth reconstruction and analysis of the pathways for glycerolipid, central carbon, and starch metabolism revealed that distinct transcriptional changes were generally found only for specific steps within a metabolic pathway. In addition to pathway analyses, the transcript data were also used to refine the current genome annotation. The transcriptome data were integrated into a database and complemented with data for other microalgae which were also subjected to nitrogen starvation. It is available at https://tdbmn.cebitec.uni-bielefeld.de.
CONCLUSIONS: Based on the transcriptional responses to different stages of nitrogen availability, a model for triacylglycerol and lipid hyperaccumulation is proposed, which involves transcriptional induction of thioesterases, differential regulation of lipases, and a re-routing of the central carbon metabolism. Over-expression of distinct thioesterases was identified to be a potential strategy to increase the oleaginous phenotype of M. neglectum, and furthermore specific lipases were identified as potential targets for future metabolic engineering approaches.
Affiliation(s)
- Daniel Jaeger
  - Algae Biotechnology and Bioenergy, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615 Bielefeld, Germany
- Anika Winkler
  - Microbial Genomics and Biotechnology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615 Bielefeld, Germany
- Jan H. Mussgnug
  - Algae Biotechnology and Bioenergy, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615 Bielefeld, Germany
- Jörn Kalinowski
  - Microbial Genomics and Biotechnology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615 Bielefeld, Germany
- Alexander Goesmann
  - Bioinformatics and Systems Biology, Justus-Liebig-Universität, 35392 Gießen, Germany
- Olaf Kruse
  - Algae Biotechnology and Bioenergy, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615 Bielefeld, Germany
  - Algae Biotechnology and Bioenergy, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Universitaetsstrasse 27, 33615 Bielefeld, Germany
|
23
|
Madeira D, Araújo JE, Vitorino R, Capelo JL, Vinagre C, Diniz MS. Ocean warming alters cellular metabolism and induces mortality in fish early life stages: A proteomic approach. ENVIRONMENTAL RESEARCH 2016; 148:164-176. [PMID: 27062348 DOI: 10.1016/j.envres.2016.03.030] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/14/2015] [Revised: 03/21/2016] [Accepted: 03/22/2016] [Indexed: 06/05/2023]
Abstract
Climate change has pervasive effects on marine ecosystems, altering biodiversity patterns, abundance and distribution of species, biological interactions, phenology, and organisms' physiology, performance and fitness. Fish early life stages have narrow thermal windows and are thus more vulnerable to further changes in water temperature. The aim of this study was to address the sensitivity and underlying molecular changes of larvae of a key fisheries species, the sea bream Sparus aurata, towards ocean warming. Larvae were exposed to three temperatures: 18°C (control), 24°C (warm) and 30°C (heat wave) for seven days. At the end of the assay, i) survival curves were plotted for each temperature treatment and ii) entire larvae were collected for proteomic analysis via 2D gel electrophoresis, image analysis and mass spectrometry. Survival decreased with increasing temperature, with no larvae surviving at 30°C. Therefore, proteomic analysis was only carried out for 18°C and 24°C. Larvae up-regulated protein folding and degradation, cytoskeletal re-organization, transcriptional regulation and the growth hormone while mostly down-regulating cargo transporting and porphyrin metabolism upon exposure to heat stress. No changes were detected in proteins related to energetic metabolism suggesting that larval fish may not have the energetic plasticity needed to sustain cellular protection in the long-term. These results indicate that despite proteome modulation, S. aurata larvae do not seem able to fully acclimate to higher temperatures as shown by the low survival rates. Consequently, elevated temperatures seem to have bottleneck effects during fish early life stages, and future ocean warming can potentially compromise recruitment's success of key fisheries species.
Affiliation(s)
- D Madeira
  - UCIBIO, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- J E Araújo
  - UCIBIO, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- R Vitorino
  - Department of Medical Sciences, Institute of Biomedicine - iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal; Department of Physiology and Cardiothoracic Surgery, Faculty of Medicine, University of Porto, Porto, Portugal
- J L Capelo
  - UCIBIO, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- C Vinagre
  - MARE - Marine and Environmental Sciences Centre, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
- M S Diniz
  - UCIBIO, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
24
Buschmann D, Haberberger A, Kirchner B, Spornraft M, Riedmaier I, Schelling G, Pfaffl MW. Toward reliable biomarker signatures in the age of liquid biopsies - how to standardize the small RNA-Seq workflow. Nucleic Acids Res 2016; 44:5995-6018. [PMID: 27317696 PMCID: PMC5291277 DOI: 10.1093/nar/gkw545] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 04/12/2016] [Accepted: 06/03/2016] [Indexed: 12/21/2022]
Abstract
Small RNA-Seq has emerged as a powerful tool in transcriptomics, gene expression profiling and biomarker discovery. Sequencing cell-free nucleic acids, particularly microRNA (miRNA), from liquid biopsies additionally provides exciting possibilities for molecular diagnostics, and might help establish disease-specific biomarker signatures. The complexity of the small RNA-Seq workflow, however, bears challenges and biases that researchers need to be aware of in order to generate high-quality data. Rigorous standardization and extensive validation are required to guarantee reliability, reproducibility and comparability of research findings. Hypotheses based on flawed experimental conditions can be inconsistent and even misleading. Comparable to the well-established MIQE guidelines for qPCR experiments, this work aims at establishing guidelines for experimental design and pre-analytical sample processing, standardization of library preparation and sequencing reactions, as well as facilitating data analysis. We highlight bottlenecks in small RNA-Seq experiments, point out the importance of stringent quality control and validation, and provide a primer for differential expression analysis and biomarker discovery. Following our recommendations will encourage better sequencing practice, increase experimental transparency and lead to more reproducible small RNA-Seq results. This will ultimately enhance the validity of biomarker signatures, and allow reliable and robust clinical predictions.
Affiliation(s)
- Dominik Buschmann
  - Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany; Institute of Human Genetics, University Hospital, Ludwig-Maximilians-University Munich, Goethestraße 29, 80336 München, Germany
- Anna Haberberger
  - Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
- Benedikt Kirchner
  - Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
- Melanie Spornraft
  - Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
- Irmgard Riedmaier
  - Eurofins Medigenomix Forensik GmbH, Anzinger Straße 7a, 85560 Ebersberg, Germany; Department of Anesthesiology, University Hospital, Ludwig-Maximilians-University Munich, Marchioninistraße 15, 81377 München, Germany
- Gustav Schelling
  - Department of Physiology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
- Michael W Pfaffl
  - Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
25
Lawlor N, Fabbri A, Guan P, George J, Karuturi RKM. multiClust: An R-package for Identifying Biologically Relevant Clusters in Cancer Transcriptome Profiles. Cancer Inform 2016; 15:103-14. [PMID: 27330269 PMCID: PMC4907340 DOI: 10.4137/cin.s38000] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/27/2015] [Revised: 03/28/2016] [Accepted: 04/03/2016] [Indexed: 12/26/2022]
Abstract
Clustering is carried out to identify patterns in transcriptomics profiles to determine clinically relevant subgroups of patients. Feature (gene) selection is a critical and an integral part of the process. Currently, there are many feature selection and clustering methods to identify the relevant genes and perform clustering of samples. However, choosing an appropriate methodology is difficult. In addition, extensive feature selection methods have not been supported by the available packages. Hence, we developed an integrative R-package called multiClust that allows researchers to experiment with the choice of combination of methods for gene selection and clustering with ease. Using multiClust, we identified the best performing clustering methodology in the context of clinical outcome. Our observations demonstrate that simple methods such as variance-based ranking perform well on the majority of data sets, provided that the appropriate number of genes is selected. However, different gene ranking and selection methods remain relevant as no methodology works for all studies.
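The variance-based ranking mentioned in this abstract can be illustrated with a minimal sketch. This is not the multiClust R API: the function name `top_variance_genes` and the toy expression values are hypothetical, and the sketch only shows the general idea of keeping the genes whose expression varies most across samples.

```python
from statistics import pvariance


def top_variance_genes(expr, n_genes):
    """Return the n_genes gene names with the largest variance across samples."""
    # Sort gene names by the population variance of their expression values.
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:n_genes]


# Hypothetical toy matrix: gene name -> expression values in four samples.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0],  # nearly constant, low variance
    "geneB": [1.0, 9.0, 2.0, 8.0],  # strongly varying
    "geneC": [3.0, 4.0, 3.5, 4.5],  # moderately varying
}
selected = top_variance_genes(expr, 2)
print(selected)
```

As the abstract notes, the number of genes retained matters; in this sketch it is simply the `n_genes` argument chosen by the caller.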
Affiliation(s)
- Nathan Lawlor
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Alec Fabbri
  - Department of Biomedical Engineering, University of Connecticut, Storrs, CT, USA
- Peiyong Guan
  - Genome Institute of Singapore, A*STAR (Agency for Science, Technology and Research), Singapore
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Joshy George
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
26
Ning P, Wang J, Zhou Y, Gao L, Wang J, Gong C. Adaptional evolution of trichome in Caragana korshinskii to natural drought stress on the Loess Plateau, China. Ecol Evol 2016; 6:3786-3795. [PMID: 28725356 PMCID: PMC5513310 DOI: 10.1002/ece3.2157] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 01/22/2016] [Revised: 03/25/2016] [Accepted: 03/27/2016] [Indexed: 01/04/2023]
Abstract
Caragana korshinskii is commonly employed to improve drought ecosystems on the Loess Plateau, although the molecular mechanism at work is poorly understood, particularly in terms of the plant's ability to tolerate drought stress. Water is the most severe limiting factor for plant growth on the Loess Plateau. The trichome is known to reduce water loss by decreasing the rate of transpiration, so in this study we focused on trichome‐related gene expression during the ecological adaptation of C. korshinskii to low precipitation. To explore the responses of trichomes to drought, we selected two experimental sites, from wet to dry, along the Loess Plateau latitude gradient for observation. Scanning electron microscopy showed that trichomes grew denser and larger under reduced precipitation; de novo transcriptomes and quantitative PCR were then used to explore and verify gene expression patterns of C. korshinskii trichomes. Results showed that GIS2, TTG1, and GL2, key positive regulators of trichome development, were upregulated, while CPC, a negative regulator, was downregulated. Taken together, our data indicate that downstream genes of the gibberellin and cytokinin signaling pathways, alongside several cytoskeleton‐related genes, contribute to modulating trichome development, which enhances transpiration resistance and increases resistance to drought stress in C. korshinskii.
Affiliation(s)
- Pengbo Ning
  - College of Life Science Northwest A&F University Yangling Shaanxi 712100 China; School of Life Science and Technology Xidian University Xi'an Shaanxi 710071 China
- Junhui Wang
  - College of Life Science Northwest A&F University Yangling Shaanxi 712100 China
- Yulu Zhou
  - College of Life Science Northwest A&F University Yangling Shaanxi 712100 China
- Lifang Gao
  - College of Life Science Northwest A&F University Yangling Shaanxi 712100 China
- Jun Wang
  - College of Life Science Northwest A&F University Yangling Shaanxi 712100 China
- Chunmei Gong
  - College of Life Science Northwest A&F University Yangling Shaanxi 712100 China
27
Sanchita, Sharma A. Computational gene expression profiling under salt stress reveals patterns of co-expression. Genomics Data 2016; 7:214-21. [PMID: 26981411 PMCID: PMC4778677 DOI: 10.1016/j.gdata.2016.01.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/20/2015] [Revised: 01/11/2016] [Accepted: 01/14/2016] [Indexed: 10/28/2022]
Abstract
Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition where excess salt in soil causes inhibition of plant growth. To understand the response of plants to the stress conditions, identification of the responsible genes is required. Clustering is a data mining technique used to group the genes with similar expression. The genes of a cluster show similar expression and function. We applied clustering algorithms on gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. The clusters, which were common in multiple algorithms were taken further for analysis. Principal component analysis (PCA) further validated the findings of other cluster algorithms by visualizing their clusters in three-dimensional space. Functional annotation results revealed that most of the genes were involved in stress related responses. Our findings suggest that these algorithms may be helpful in the prediction of the function of co-expressed genes.
Affiliation(s)
- Sanchita
  - Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, Post Office CIMAP, Lucknow 226015, India
- Ashok Sharma
  - Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, Post Office CIMAP, Lucknow 226015, India
28
Ghosh A, De RK. Identification of certain cancer-mediating genes using Gaussian fuzzy cluster validity index. J Biosci 2015; 40:741-54. [PMID: 26564976 DOI: 10.1007/s12038-015-9557-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
Abstract
In this article, we have used an index called the Gaussian fuzzy index (GFI), recently developed by the authors and based on fuzzy set theory, to validate the clusters obtained by a clustering algorithm applied to cancer gene expression data. GFI is then used to identify genes whose mRNA expression patterns have altered significantly from the normal state to the carcinogenic state. The effectiveness of the methodology has been demonstrated on three cancer gene expression datasets covering human lung cancer, colon cancer, and leukemia. The performance of GFI is compared with 19 existing cluster validity indices. The results are validated biologically and statistically using biochemical pathways, p-value statistics of GO attributes, the t-test, and the z-score. The results indicate that GFI is capable of identifying high-quality enriched clusters of genes and is thereby able to select more cancer-mediating genes.
Affiliation(s)
- Anupam Ghosh
  - Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
29
Frost HR, Li Z, Moore JH. Spectral gene set enrichment (SGSE). BMC Bioinformatics 2015; 16:70. [PMID: 25879888 PMCID: PMC4365810 DOI: 10.1186/s12859-015-0490-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/19/2014] [Accepted: 02/04/2015] [Indexed: 01/29/2023]
Abstract
Background Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. Results We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Conclusions Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and samples PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise. 
Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0490-7) contains supplementary material, which is available to authorized users.
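The weighted Z-method named in this abstract combines several p-values into one by converting each to a standard normal score and taking a weighted sum. The sketch below shows that generic combination step only, assuming one-sided p-values strictly between 0 and 1. The weights in the example are hypothetical placeholders; computing the actual SGSE weights (PC variance scaled by Tracy-Widom p-values) is not attempted here.

```python
from math import sqrt
from statistics import NormalDist


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z (Stouffer) method."""
    norm = NormalDist()
    # Convert each p-value to a standard normal score (small p -> large z).
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    # The weighted sum is rescaled so it is again standard normal under the null.
    z = sum(w * z_i for w, z_i in zip(weights, z_scores))
    z /= sqrt(sum(w * w for w in weights))
    return 1.0 - norm.cdf(z)


# Hypothetical PC-level p-values for one gene set, with placeholder weights.
pc_p_values = [0.01, 0.20, 0.60]
pc_weights = [0.5, 0.3, 0.2]
combined = weighted_z_combine(pc_p_values, pc_weights)
print(round(combined, 4))
```

Down-weighting a component shrinks its influence on the combined score, which is how a weighting scheme can discount components that are more likely to represent noise.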
Affiliation(s)
- H Robert Frost
  - Institute of Quantitative Biomedical Sciences, Geisel School of Medicine, Lebanon, NH, 03756, USA; Section of Biostatistics and Epidemiology, Department of Community and Family Medicine, Geisel School of Medicine, Lebanon, NH, 03756, USA; Department of Genetics, Dartmouth College, Hanover, NH, 03755, USA
- Zhigang Li
  - Institute of Quantitative Biomedical Sciences, Geisel School of Medicine, Lebanon, NH, 03756, USA; Section of Biostatistics and Epidemiology, Department of Community and Family Medicine, Geisel School of Medicine, Lebanon, NH, 03756, USA
- Jason H Moore
  - Institute of Quantitative Biomedical Sciences, Geisel School of Medicine, Lebanon, NH, 03756, USA; Section of Biostatistics and Epidemiology, Department of Community and Family Medicine, Geisel School of Medicine, Lebanon, NH, 03756, USA; Department of Genetics, Dartmouth College, Hanover, NH, 03755, USA
30
Espitia CM, Saldarriaga OA, Travi BL, Osorio EY, Hernandez A, Band M, Patel MJ, Medina AA, Cappello M, Pekosz A, Melby PC. Transcriptional profiling of the spleen in progressive visceral leishmaniasis reveals mixed expression of type 1 and type 2 cytokine-responsive genes. BMC Immunol 2014; 15:38. [PMID: 25424735 PMCID: PMC4253007 DOI: 10.1186/s12865-014-0038-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/09/2014] [Accepted: 09/15/2014] [Indexed: 01/08/2023]
Abstract
BACKGROUND The Syrian golden hamster (Mesocricetus auratus) has been used as a model to study infections caused by a number of human pathogens. Studies of immunopathogenesis in hamster infection models are challenging because of the limited availability of reagents needed to define cellular and molecular determinants. RESULTS We sequenced a hamster cDNA library and developed a first-generation custom cDNA microarray that included 5131 unique cDNAs enriched for immune response genes. We used this microarray to interrogate the hamster spleen response to Leishmania donovani, an intracellular protozoan that causes visceral leishmaniasis. The hamster model of visceral leishmaniasis is of particular interest because it recapitulates clinical and immunopathological features of human disease, including cachexia, massive splenomegaly, pancytopenia, immunosuppression, and ultimately death. A transcript was called differentially expressed if it showed at least a 2-fold change in expression between uninfected and infected groups at a False Discovery Rate of <5%. Following a relatively silent early phase of infection (at 7 and 14 days post-infection only 8 and 24 genes, respectively, were differentially expressed), there was dramatic upregulation of inflammatory and immune-related genes in the spleen (708 differentially expressed genes were evident at 28 days post-infection). The differentially expressed transcripts included genes involved in inflammation, immunity, and immune cell trafficking. Of particular interest, there was concomitant upregulation of the IFN-γ and interleukin (IL)-4 signaling pathways, with increased expression of a battery of IFN-γ- and IL-4-responsive genes. The latter included genes characteristic of alternatively activated macrophages. CONCLUSIONS Transcriptional profiling was accomplished in the Syrian golden hamster, for which a fully annotated genome is not available. In the hamster model of visceral leishmaniasis, a robust and functional IFN-γ response did not restrain parasite load and progression of disease. This supports the accumulating evidence that macrophages are ineffectively activated to kill the parasite. The concomitant expression of IL-4/IL-13 and their downstream target genes, some of which were characteristic of alternative macrophage activation, is likely to contribute to this. Further dissection of mechanisms that lead to polarization of macrophages toward a permissive state is needed to fully understand the pathogenesis of visceral leishmaniasis.
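The selection rule described in this abstract (at least a 2-fold change and a False Discovery Rate below 5%) can be sketched as a simple filter. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, and the gene names, fold changes, and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, fold_changes, p_values, min_fold=2.0, max_fdr=0.05):
    """Keep genes with >= min_fold change in either direction and adjusted p < max_fdr."""
    q_values = benjamini_hochberg(p_values)
    return [
        gene
        for gene, fc, q in zip(genes, fold_changes, q_values)
        if (fc >= min_fold or fc <= 1.0 / min_fold) and q < max_fdr
    ]


# Hypothetical infected/uninfected expression ratios and raw p-values.
genes = ["g1", "g2", "g3", "g4"]
fold_changes = [3.0, 0.4, 1.5, 2.5]
p_values = [0.001, 0.004, 0.002, 0.5]
hits = differentially_expressed(genes, fold_changes, p_values)
print(hits)
```

In this toy input, `g3` fails the fold-change threshold and `g4` fails the FDR threshold, so only the first two genes pass both criteria.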
31
Ahmed HA, Mahanta P, Bhattacharyya DK, Kalita JK. Shifting-and-Scaling Correlation Based Biclustering Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:1239-1252. [PMID: 26357059 DOI: 10.1109/tcbb.2014.2323054] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
The existence of various types of correlations among the expressions of a group of biologically significant genes poses challenges in developing effective methods of gene expression data analysis. The initial focus of computational biologists was to work with only absolute and shifting correlations. However, researchers have found that the ability to handle shifting-and-scaling correlation enables them to extract more biologically relevant and interesting patterns from gene microarray data. In this paper, we introduce an effective shifting-and-scaling correlation measure named Shifting and Scaling Similarity (SSSim), which can detect highly correlated gene pairs in any gene expression data. We also introduce a technique named Intensive Correlation Search (ICS) biclustering algorithm, which uses SSSim to extract biologically significant biclusters from a gene expression data set. The technique performs satisfactorily with a number of benchmarked gene expression data sets when evaluated in terms of functional categories in Gene Ontology database.
32
Fa R, Nandi AK. Noise Resistant Generalized Parametric Validity Index of Clustering for Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:741-752. [PMID: 26356344 DOI: 10.1109/tcbb.2014.2312006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Validity indices have been investigated for decades. However, since there is no study of noise-resistance performance of these indices in the literature, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index which employs two tunable parameters α and β to control the proportions of objects being considered to calculate the dissimilarities. The greatest advantage of the proposed GPV index is its noise-resistance ability, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we evaluate the GPV index for assessing five clustering algorithms in two gene expression data simulation models with different noise levels and compare the ability of determining the number of clusters with eight existing indices. We also test the GPV in three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise-resistance ability and provides fairly accurate judgements.
33
Ichikawa K, Morishita S. A Simple but Powerful Heuristic Method for Accelerating k-Means Clustering of Large-Scale Data in Life Science. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:681-692. [PMID: 26356339 DOI: 10.1109/tcbb.2014.2306200] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
K-means clustering has been widely used to gain insight into biological systems from large-scale life science data. To quantify the similarities among biological data sets, Pearson correlation distance and standardized Euclidean distance are used most frequently; however, optimization methods have been largely unexplored. These two distance measurements are equivalent in the sense that they yield the same k-means clustering result for identical sets of k initial centroids. Thus, an efficient algorithm used for one is applicable to the other. Several optimization methods are available for the Euclidean distance and can be used for processing the standardized Euclidean distance; however, they are not customized for this context. We instead approached the problem by studying the properties of the Pearson correlation distance, and we invented a simple but powerful heuristic method for markedly pruning unnecessary computation while retaining the final solution. Tests using real biological data sets with 50-60K vectors of dimensions 10-2001 (~400 MB in size) demonstrated marked reduction in computation time for k = 10-500 in comparison with other state-of-the-art pruning methods such as Elkan's and Hamerly's algorithms. The BoostKCP software is available at http://mlab.cb.k.u-tokyo.ac.jp/~ichikawa/boostKCP/.
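The link between the two distance measures named in this abstract can be checked numerically. For two vectors of length n, each standardized to zero mean and unit population standard deviation, the squared Euclidean distance equals 2n(1 - r), where r is the Pearson correlation, so ordering by one distance is the same as ordering by the other. The sketch below verifies only that identity on hypothetical vectors; it is not the BoostKCP pruning heuristic.

```python
from statistics import fmean, pstdev


def standardize(v):
    """Shift to zero mean and scale to unit population standard deviation."""
    mu, sd = fmean(v), pstdev(v)
    return [(value - mu) / sd for value in v]


def pearson(x, y):
    """Pearson correlation as the mean product of standardized values."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical expression profiles of two genes across four samples.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]

lhs = squared_euclidean(standardize(x), standardize(y))
rhs = 2 * len(x) * (1 - pearson(x, y))
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

Because the relation is monotone, a nearest-centroid assignment gives the same answer under either measure once vectors and centroids are standardized.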
34
Villiers F, Bastien O, Kwak JM. R. S. WebTool, a web server for random sampling-based significance evaluation of pairwise distances. Nucleic Acids Res 2014; 42:W198-204. [PMID: 24878919 PMCID: PMC4086074 DOI: 10.1093/nar/gku427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Pairwise comparison of data vectors represents a large part of computational biology, especially with the continuous increase in genome-wide approaches yielding more information from more biological samples simultaneously. Gene clustering for function prediction as well as analyses of signalling pathways and the time-dependent dynamics of a system are common biological approaches that often rely on large dataset comparison. Different metrics can be used to evaluate the similarity between entities to be compared, such as correlation coefficients and distances. While the latter offers a more flexible way of measuring potential biological relationships between datasets, the significance of any given distance is highly dependent on the dataset and cannot be easily determined. Monte Carlo methods are robust approaches for evaluating the significance of distance values by multiple random permutations of the dataset followed by distance calculation. We have developed R. S. WebTool (http://rswebtool.kwaklab.org), a user-friendly online server for random sampling-based evaluation of distance significances that features an array of visualization and analysis tools to help non-bioinformaticist users extract significant relationships from random noise in distance-based dataset analyses.
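The Monte Carlo idea described in this abstract, judging a distance against distances obtained after random permutation, can be sketched as follows. This is a generic permutation test under stated assumptions, not the procedure implemented by R. S. WebTool: the function name, the choice of Euclidean distance, the shuffling of a single vector, and the add-one correction are all illustrative choices, and the data are hypothetical.

```python
import random
from math import dist


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Estimate how often a shuffled copy of y is at least as close to x as y itself."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = dist(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if dist(x, shuffled) <= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical profiles: y tracks x closely, so shuffling y should rarely do as well.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1]
p = permutation_p_value(x, y)
print(p)
```

A small estimate suggests the observed distance is unlikely to arise from a random arrangement of the same values, which is the sense in which the abstract speaks of the significance of a distance.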
Affiliation(s)
- Florent Villiers
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20740, USA
- Olivier Bastien
  - Laboratoire de Physiologie Cellulaire et Végétale, iRTSV, CEA-Grenoble, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France
- June M Kwak
  - Center for Plant Aging Research, Institute for Basic Science, Department of New Biology, DGIST, Daegu 711-873, Republic of Korea
35
Song X, Li L, Srimani PK, Yu PS, Wang JZ. Measure the Semantic Similarity of GO Terms Using Aggregate Information Content. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:468-476. [PMID: 26356015 DOI: 10.1109/tcbb.2013.176] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 06/05/2023]
Abstract
The rapid development of gene ontology (GO) and huge amount of biomedical data annotated by GO terms necessitate computation of semantic similarity of GO terms and, in turn, measurement of functional similarity of genes based on their annotations. In this paper we propose a novel and efficient method to measure the semantic similarity of GO terms. The proposed method addresses the limitations in existing GO term similarity measurement techniques; it computes the semantic content of a GO term by considering the information content of all of its ancestor terms in the graph. The aggregate information content (AIC) of all ancestor terms of a GO term implicitly reflects the GO term's location in the GO graph and also represents how human beings use this GO term and all its ancestor terms to annotate genes. We show that semantic similarity of GO terms obtained by our method closely matches the human perception. Extensive experimental studies show that this novel method also outperforms all existing methods in terms of the correlation with gene expression data. We have developed web services for measuring semantic similarity of GO terms and functional similarity of genes using the proposed AIC method and other popular methods. These web services are available at http://bioinformatics.clemson.edu/G-SESAME.
36
Discovering up-regulated VEGF-C expression in swine umbilical vein endothelial cells by classical swine fever virus Shimen. Vet Res 2014; 45:48. [PMID: 24758593 PMCID: PMC4018968 DOI: 10.1186/1297-9716-45-48] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 01/31/2014] [Accepted: 04/01/2014] [Indexed: 11/10/2022]
Abstract
Infection of domestic swine with the highly virulent Shimen strain of classical swine fever virus causes hemorrhagic lymphadenitis and diffuse hemorrhaging in infected swine. We analyzed patterns of gene expression for CSFV Shimen in swine umbilical vein endothelial cells (SUVECs). Transcription of the vascular endothelial growth factor (VEGF) C gene (VEGF-C) and translation of the corresponding protein were significantly up-regulated in SUVECs. Our findings suggest that VEGF-C is involved in mechanisms of acute infection caused by virulent strains of CSFV.
37
38
Zhao X, Zhong S, Zuo X, Lin M, Qin J, Luan Y, Zhang N, Liang Y, Rao S. Pathway-based analysis of the hidden genetic heterogeneities in cancers. Genomics, Proteomics & Bioinformatics 2014; 12:31-8. [PMID: 24462714 PMCID: PMC4411334 DOI: 10.1016/j.gpb.2013.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/10/2013] [Revised: 12/06/2013] [Accepted: 12/09/2013] [Indexed: 12/18/2022]
Abstract
Many cancers that apparently show similar phenotypes are actually distinct at the molecular level, leading to very different responses to the same treatment. It has recently been demonstrated that pathway-based approaches are robust and reliable for genetic analysis of cancers. Nevertheless, it remains unclear whether such function-based approaches are useful in deciphering molecular heterogeneities in cancers. Therefore, we aimed to test this possibility in the present study. First, we used an NCI60 dataset to validate the ability of pathways to correctly partition samples. Next, we applied the proposed method to identify the hidden subtypes in diffuse large B-cell lymphoma (DLBCL). Finally, the clinical significance of the identified subtypes was verified using survival analysis. For the NCI60 dataset, we achieved highly accurate partitions that best fit the clinical cancer phenotypes. Subsequently, for a DLBCL dataset, we identified three hidden subtypes that showed very different 10-year overall survival rates (90%, 46% and 20%) and were significantly (P=0.008) correlated with the clinical survival rate. This study demonstrated that the pathway-based approach is promising for unveiling genetic heterogeneities in complex human diseases.
Affiliation(s)
- Xiaolei Zhao
  - Institute for Medical Systems Biology and Department of Medical Statistics and Epidemiology, School of Public Health, Guangdong Medical College, Dongguan 523808, China
- Xiaoyu Zuo
  - Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-Sen University, Guangzhou 510080, China
- Meihua Lin
  - Institute for Medical Systems Biology and Department of Medical Statistics and Epidemiology, School of Public Health, Guangdong Medical College, Dongguan 523808, China
- Jiheng Qin
  - Institute for Medical Systems Biology and Department of Medical Statistics and Epidemiology, School of Public Health, Guangdong Medical College, Dongguan 523808, China
- Yizhao Luan
  - Institute for Medical Systems Biology and Department of Medical Statistics and Epidemiology, School of Public Health, Guangdong Medical College, Dongguan 523808, China
- Naizun Zhang
  - Maoming People's Hospital, Maoming 525000, China
- Yan Liang
  - Maoming People's Hospital, Maoming 525000, China
- Shaoqi Rao
  - Institute for Medical Systems Biology and Department of Medical Statistics and Epidemiology, School of Public Health, Guangdong Medical College, Dongguan 523808, China
  - Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-Sen University, Guangzhou 510080, China
|
39
|
Jaskowiak PA, Campello RJGB, Costa IG. On the selection of appropriate distances for gene expression data clustering. BMC Bioinformatics 2014; 15 Suppl 2:S2. [PMID: 24564555 PMCID: PMC4072854 DOI: 10.1186/1471-2105-15-s2-s2] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Clustering is crucial for gene expression data analysis. As an unsupervised exploratory procedure, its results can help researchers gain insights and formulate new hypotheses about biological data from microarrays. Given the different settings of microarray experiments, clustering proves to be a versatile exploratory tool. It can help to unveil new cancer subtypes or to identify groups of genes that respond similarly to a specific experimental condition. In order to obtain useful clustering results, however, different parameters of the clustering procedure must be properly tuned. Besides the selection of the clustering method itself, determining which distance to employ between data objects is probably one of the most difficult decisions. RESULTS AND CONCLUSIONS We analyze how different distances and clustering methods interact regarding their ability to cluster gene expression, i.e., microarray data. We study 15 distances along with four common clustering methods from the literature on a total of 52 gene expression microarray datasets. Distances are evaluated in a number of different scenarios, including clustering of cancer tissues and of genes from short time-series expression data, the two main clustering applications in gene expression. Our results support that the selection of an appropriate distance depends on the scenario at hand. Moreover, in each scenario, given the very same clustering method, significant differences in quality may arise from the selection of distinct distance measures. In fact, the selection of an appropriate distance measure can make the difference between meaningful and poor clustering outcomes, even for a suitable clustering method.
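A small sketch can show why the choice of distance matters. It compares Euclidean distance with a Pearson correlation distance on made-up expression profiles; these two measures and the toy values are chosen here only for illustration and are not a statement about which distances the study recommends.

```python
import math


def euclidean(x, y):
    """Straight-line distance; sensitive to the magnitude of expression."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation; sensitive to profile shape, not scale.

    Undefined for constant profiles (zero variance), which this
    sketch does not handle.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def nearest(target, candidates, dist):
    """Name of the candidate profile closest to the target under `dist`."""
    return min(candidates, key=lambda name: dist(target, candidates[name]))


# Made-up expression profiles over four time points.
gene_a = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "gene_b": [10.0, 20.0, 30.0, 40.0],  # same trend as gene_a, larger scale
    "gene_c": [4.0, 3.0, 2.0, 1.0],      # similar magnitude, opposite trend
}

print(nearest(gene_a, candidates, euclidean))         # gene_c
print(nearest(gene_a, candidates, pearson_distance))  # gene_b
```

The same target gene gets a different nearest neighbour under each measure, so any clustering built on these distances would group the genes differently.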
Affiliation(s)
- Pablo A Jaskowiak
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos - SP, Brazil
- Ricardo JGB Campello
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos - SP, Brazil
- Ivan G Costa
  - Center of Informatics, Federal University of Pernambuco, Recife - PE, Brazil
  - IZKF Computational Biology Research Group, Institute for Biomedical Engineering, RWTH Aachen University Medical School, Aachen, Germany
|
40
|
Pooladi M, Rezaei-Tavirani M, Hashemi M, Hesami-Tackallou S, Khaghani-Razi-Abad S, Moradi A, Zali AR, Mousavi M, Firozi-Dalvand L, Rakhshan A, Zamanian Azodi M. Cluster and Principal Component Analysis of Human Glioblastoma Multiforme (GBM) Tumor Proteome. IRANIAN JOURNAL OF CANCER PREVENTION 2014; 7:87-95. [PMID: 25250155 PMCID: PMC4142943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2014] [Accepted: 04/26/2014] [Indexed: 11/21/2022]
Abstract
BACKGROUND Glioblastoma multiforme (GBM), or grade IV astrocytoma, is the most common and lethal malignant brain tumor in adults. Several of the molecular alterations detected in gliomas may have diagnostic and/or prognostic implications. Proteomics has been widely applied in various areas of science, including deciphering the molecular pathogenesis of diseases. METHODS Proteins were extracted from tumor and normal brain tissues, and protein purity was evaluated by the Bradford test and spectrophotometry. Proteins were separated by two-dimensional gel (2DG) electrophoresis, and the spots were then analyzed and compared using statistical methods and dedicated software. Protein clustering analysis was performed on the list of proteins deemed significantly altered in glioblastoma tumors (t-test and one-way ANOVA; P < 0.05). RESULTS On each analytical 2D gel, an average of 876 spots was observed. We report that 172 spots differed in expression level (fold > 2) in glioblastoma. In this study, 188 spots were up-regulated, whereas the remaining 232 spots were down-regulated in glioblastoma tumor relative to normal tissue. The results demonstrate that functional clustering (of up- and down-regulated proteins) and principal component analysis (PCA) have considerable merit in aiding the interpretation of proteomic data. CONCLUSION 2D gel electrophoresis is the core of proteomics and permits the separation of thousands of proteins. High-resolution 2DE can resolve up to 5,000 proteins simultaneously. Using cluster analysis, we can also form groups of related variables, similar to what is done in factor analysis.
Affiliation(s)
- Mehdi Pooladi
  - Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Dept. of Biology, School of Basic Sciences, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Mostafa Rezaei-Tavirani
  - Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mehrdad Hashemi
  - Dept. of Molecular Genetics, Tehran Medical Branch, Islamic Azad University, Tehran, Iran
- Solmaz Khaghani-Razi-Abad
  - Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Dept. of Biology, School of Basic Sciences, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Afshin Moradi
  - Dept. of Pathology, Shohada Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Reza Zali
  - Dept. of Neurosurgery, Shohada Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Masoumeh Mousavi
  - Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Leila Firozi-Dalvand
  - Dept. of Biology, School of Basic Sciences, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Azadeh Rakhshan
  - Dept. of Pathology, Shohada Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mona Zamanian Azodi
  - Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
41
|
Bodaker M, Meshorer E, Mitrani E, Louzoun Y. Genes related to differentiation are correlated with the gene regulatory network structure. Bioinformatics 2013; 30:406-13. [DOI: 10.1093/bioinformatics/btt685] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
|
42
|
Multi-stage filtering for improving confidence level and determining dominant clusters in clustering algorithms of gene expression data. Comput Biol Med 2013; 43:1120-33. [DOI: 10.1016/j.compbiomed.2013.05.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/12/2010] [Revised: 05/14/2013] [Accepted: 05/15/2013] [Indexed: 02/02/2023]
|
43
|
Bhattacharyya M, Das M, Bandyopadhyay S. A New Approach for Combining Knowledge From Multiple Coexpression Networks of MicroRNAs. IEEE Trans Biomed Eng 2013; 60:2167-73. [DOI: 10.1109/tbme.2013.2250285] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/26/2022]
|
44
|
Dutkowski J, Kramer M, Surma MA, Balakrishnan R, Cherry JM, Krogan NJ, Ideker T. A gene ontology inferred from molecular networks. Nat Biotechnol 2013; 31:38-45. [PMID: 23242164 DOI: 10.1038/nbt.2463] [Citation(s) in RCA: 124] [Impact Index Per Article: 11.3] [Received: 06/27/2012] [Indexed: 12/20/2022]
Abstract
Ontologies have proven very useful for capturing knowledge as a hierarchy of terms and their interrelationships. In biology a major challenge has been to construct ontologies of gene function given incomplete biological knowledge and inconsistencies in how this knowledge is manually curated. Here we show that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to infer an ontology whose coverage and power are equivalent to those of the manually curated Gene Ontology (GO). The network-extracted ontology (NeXO) contains 4,123 biological terms and 5,766 term-term relations, capturing 58% of known cellular components. We also explore robust NeXO terms and term relations that were initially not cataloged in GO, a number of which have now been added based on our analysis. Using quantitative genetic interaction profiling and chemogenomics, we find further support for many of the uncharacterized terms identified by NeXO, including multisubunit structures related to protein trafficking or mitochondrial function. This work enables a shift from using ontologies to evaluate data to using data to construct and evaluate ontologies.
Affiliation(s)
- Janusz Dutkowski
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
|
45
|
Kessler T, Hache H, Wierling C. Integrative analysis of cancer-related signaling pathways. Front Physiol 2013; 4:124. [PMID: 23760067 PMCID: PMC3671203 DOI: 10.3389/fphys.2013.00124] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/30/2012] [Accepted: 05/12/2013] [Indexed: 12/11/2022] Open
Abstract
Identification and classification of cancer types and subtypes is a major issue in current cancer research. Whole-genome expression profiling of cancer tissues is often the basis for such subtype classifications of tumors, and different signatures for individual cancer types have been described. However, the search for the best-performing discriminatory gene-expression signatures covering more than one cancer type remains a relevant topic in cancer research, as such a signature would help in understanding the common changes in signaling networks in these disease types. In this work, we explore the idea of a top-down approach for sample stratification based on a module-based network of cancer-relevant signaling pathways. For assembly of this network, we consider several of the most established cancer pathways. We evaluate our sample stratification approach using expression data of human breast and ovarian cancer signatures. We show that our approach performs as well as previously reported methods while offering the advantage of classifying different cancer types. Furthermore, it allows common changes in the network module activity of those cancer samples to be identified.
Affiliation(s)
- Thomas Kessler
  - Systems Biology Group, Department Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Hendrik Hache
  - Systems Biology Group, Department Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Christoph Wierling
  - Systems Biology Group, Department Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
46
|
Bandyopadhyay S, Maulik U, Chakraborty R. Incorporating ϵ-dominance in AMOSA: Application to multiobjective 0/1 knapsack problem and clustering gene expression data. Appl Soft Comput 2013. [DOI: 10.1016/j.asoc.2012.11.050] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
|
47
|
Ghosh A, Dhara BC, De RK. Comparative Analysis of Cluster Validity Indices in Identifying Some Possible Genes Mediating Certain Cancers. Mol Inform 2013; 32:347-54. [PMID: 27481591 DOI: 10.1002/minf.201200142] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/14/2012] [Accepted: 02/25/2012] [Indexed: 11/10/2022]
Abstract
In this article, we compare the performance of 19 cluster validity indices, in identifying some possible genes mediating certain cancers, based on gene expression data. For the purpose of this comparison, we have developed a method. The proposed method involves cluster generation, selection of the best k-value or c-values, cluster identification, identifying the altered gene cluster, scoring an altered gene cluster and determining the best k-value or c-value exploring through biological repositories. The effectiveness of the method has been demonstrated on three gene expression data sets dealing with human lung cancer, colon cancer, and leukemia. Here, we have used three clustering algorithms, i.e., k-means, PAM and fuzzy c-means. We have used biochemical pathways related to these cancers and p-value statistics for validating the study.
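Choosing the best k with a validity index can be sketched as follows. The example uses the mean silhouette width on one-dimensional toy data; the silhouette is picked here only as a familiar index and is an assumption, since the abstract does not say which 19 indices were compared. The data points and candidate labelings are made up.

```python
def silhouette(points, labels):
    """Mean silhouette width for 1-D points under a given labeling."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(points[i] - points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(points[i] - points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return sum(widths) / len(widths)


# Made-up expression values forming two obvious groups.
values = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]

# Hypothetical outputs of a clustering algorithm for k = 2 and k = 3.
candidate_labelings = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
}

scores = {k: silhouette(values, labs) for k, labs in candidate_labelings.items()}
best_k = max(scores, key=scores.get)
print(best_k)  # 2
```

Other indices can rank the same partitions differently, which is the motivation for comparing many of them against external biological evidence.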
Affiliation(s)
- Anupam Ghosh
  - Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
- Rajat K De
  - Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
|
48
|
The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives. Mach Learn 2013. [DOI: 10.1007/s10994-013-5334-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 10/27/2022]
|
49
|
Abu-Jamous B, Fa R, Roberts DJ, Nandi AK. Paradigm of tunable clustering using Binarization of Consensus Partition Matrices (Bi-CoPaM) for gene discovery. PLoS One 2013; 8:e56432. [PMID: 23409186 PMCID: PMC3569426 DOI: 10.1371/journal.pone.0056432] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 08/22/2012] [Accepted: 01/10/2013] [Indexed: 11/19/2022] Open
Abstract
Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies.
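The aggregate-then-binarize idea can be sketched in a few lines. The sketch assumes that cluster labels have already been aligned across runs (the relabeling step is omitted) and uses a single membership threshold, here called `delta`, as the tunable binarization; both are simplifying assumptions, and the binarization techniques in the paper may differ.

```python
def consensus_matrix(partitions, n_clusters):
    """Average hard memberships over runs into a fuzzy membership matrix.

    `partitions` is a list of label lists, one per clustering run, with
    labels assumed to refer to the same clusters in every run.
    """
    n_runs = len(partitions)
    n_genes = len(partitions[0])
    counts = [[0] * n_clusters for _ in range(n_genes)]
    for labels in partitions:
        for gene, cluster in enumerate(labels):
            counts[gene][cluster] += 1
    return [[c / n_runs for c in row] for row in counts]


def binarize(copam, delta):
    """Assign each gene to every cluster whose membership is >= delta.

    A high delta gives tight clusters and leaves ambiguous genes
    unassigned; a low delta gives wide, overlapping clusters.
    """
    return [[k for k, u in enumerate(row) if u >= delta] for row in copam]


# Four hypothetical clustering runs over four genes and two clusters.
runs = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
]

copam = consensus_matrix(runs, n_clusters=2)
print(binarize(copam, 0.9))   # [[0], [], [1], []]          tight clusters
print(binarize(copam, 0.25))  # [[0], [0, 1], [1], [0, 1]]  wide clusters
```

With the strict threshold, the two genes on which the runs disagree belong to no cluster; with the loose threshold, they belong to both, which mirrors the three eventualities described in the abstract.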
Affiliation(s)
- Basel Abu-Jamous
  - Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, United Kingdom
- Rui Fa
  - Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, United Kingdom
- David J. Roberts
  - National Health Service Blood and Transplant, Oxford, United Kingdom
  - The University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
- Asoke K. Nandi
  - Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, United Kingdom
  - Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylä, Finland
|
50
|
Viljoen KS, Blackburn JM. Quality assessment and data handling methods for Affymetrix Gene 1.0 ST arrays with variable RNA integrity. BMC Genomics 2013; 14:14. [PMID: 23324084 PMCID: PMC3557148 DOI: 10.1186/1471-2164-14-14] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 09/13/2012] [Accepted: 01/02/2013] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND RNA and microarray quality assessment form an integral part of gene expression analysis and, although methods such as the RNA integrity number (RIN) algorithm reliably assess RNA integrity, the relevance of RNA integrity in gene expression analysis, as well as analysis methods to accommodate the possible effects of degradation, require further investigation. We investigated the relationship between RNA integrity and array quality on the commonly used Affymetrix Gene 1.0 ST array platform using reliable within-array and between-array quality assessment measures. The possibility of a transcript-specific bias in the apparent effect of RNA degradation on the measured gene expression signal was evaluated after either excluding quality-flagged arrays or compensating for RNA degradation at different steps in the analysis. RESULTS Using probe-level and inter-array quality metrics to assess 34 Gene 1.0 ST array datasets derived from historical, paired tumour and normal primary colorectal cancer samples, 7 arrays (20.6%), with a mean sample RIN of 3.2 (SD = 0.42), were flagged during array quality assessment, while 10 arrays from samples with RINs < 7 passed quality assessment, including one sample with a RIN < 3. We detected a transcript length bias in RNA degradation in only 5.8% of annotated transcript clusters (p-value 0.05, FC ≥ |2|), with longer and shorter than average transcripts under- and overrepresented in quality-flagged samples, respectively. Applying compensatory measures for RNA degradation performed at least as well as excluding quality-flagged arrays, as judged by hierarchical clustering, gene expression analysis and Ingenuity Pathway Analysis; importantly, use of these compensatory measures had the significant benefit of enabling lower quality array data from irreplaceable clinical samples to be retained in downstream analyses.
CONCLUSIONS Here, we demonstrate an effective array-quality assessment strategy, which will allow the user to recognize lower quality arrays that can be included in the analysis once appropriate measures are applied to account for known or unknown sources of variation, such as array quality and batch effects, by implementing ComBat or Surrogate Variable Analysis. This approach to quality control and analysis will be especially useful for clinical samples with variable and low RNA quality, with RIN scores ≥ 2.
Affiliation(s)
- Katie S Viljoen
  - Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Road, Observatory, Cape Town, 7925, South Africa
|