1
Schreck N, Slynko A, Saadati M, Benner A. Statistical plasmode simulations-Potentials, challenges and recommendations. Stat Med 2024; 43:1804-1825. [PMID: 38356231 DOI: 10.1002/sim.10012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 12/18/2023] [Accepted: 01/02/2024] [Indexed: 02/16/2024]
Abstract
Statistical data simulation is essential in the development of statistical models and methods as well as in their performance evaluation. To capture complex data structures, in particular for high-dimensional data, a variety of simulation approaches have been introduced including parametric and the so-called plasmode simulations. While there are concerns about the realism of parametrically simulated data, it is widely claimed that plasmodes come very close to reality with some aspects of the "truth" known. However, there are no explicit guidelines or state-of-the-art on how to perform plasmode data simulations. In the present paper, we first review existing literature and introduce the concept of statistical plasmode simulation. We then discuss advantages and challenges of statistical plasmodes and provide a step-wise procedure for their generation, including key steps to their implementation and reporting. Finally, we illustrate the concept of statistical plasmodes as well as the proposed plasmode generation procedure by means of a public real RNA data set on breast carcinoma patients.
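The idea of keeping real data while knowing part of the "truth" can be illustrated with a minimal sketch. It assumes one common form of plasmode simulation, in which covariate rows are resampled from a real data set and only the outcome is generated from a model with known coefficients. The function name `plasmode_sample`, the toy covariate rows, and the linear outcome model are hypothetical choices for illustration, not the procedure proposed in the paper.

```python
import random

def plasmode_sample(real_rows, true_coefs, n, noise_sd=1.0, seed=0):
    """Resample real covariate rows and attach a synthetic outcome.

    Assumption: the outcome follows a linear model with known
    coefficients, so the "truth" about the effects is available.
    """
    rng = random.Random(seed)
    simulated = []
    for _ in range(n):
        row = rng.choice(real_rows)  # covariates come from the real data
        linear = sum(b * x for b, x in zip(true_coefs, row))
        y = linear + rng.gauss(0.0, noise_sd)  # outcome from the known model
        simulated.append((row, y))
    return simulated

# Hypothetical "real" covariate rows (3 samples, 2 features), invented here.
real_rows = [(1.0, 0.5), (2.0, 1.5), (0.0, 3.0)]
sim = plasmode_sample(real_rows, true_coefs=(2.0, -1.0), n=5, noise_sd=0.0)
```

Because the covariates are drawn from the observed rows, their joint structure is preserved, while the known coefficients give a reference against which a method's estimates could be compared.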
Affiliation(s)
- Nicholas Schreck
  - Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Alla Slynko
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Maral Saadati
  - Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Axel Benner
  - Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
2
Transcriptome Analysis of Intracellular Amastigotes of Clinical Leishmania infantum Lines from Therapeutic Failure Patients after Infection of Human Macrophages. Microorganisms 2022; 10:1304. [PMID: 35889023 PMCID: PMC9324091 DOI: 10.3390/microorganisms10071304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 06/21/2022] [Accepted: 06/23/2022] [Indexed: 11/23/2022] [Open Access]
Abstract
Leishmaniasis is considered to be one of the most neglected tropical diseases affecting humans and animals around the world. Due to the absence of an effective vaccine, current treatment is based on chemotherapy. However, the continuous appearance of drug resistance and therapeutic failure (TF) leads to an early obsolescence of treatments. Identification of the factors that contribute to TF and drug resistance in leishmaniasis will constitute a useful tool for establishing future strategies to control this disease. In this manuscript, we evaluated the transcriptomic changes in the intracellular amastigotes of the Leishmania infantum parasites isolated from patients with leishmaniasis and TF at 96 h post-infection of THP-1 cells. The adaptation of the parasites to their new environment leads to expression alterations in the genes involved mainly in transport through cell membranes, energy and redox metabolism, and detoxification. Specifically, the gene that codes for prostaglandin F2α synthase seems to be relevant to pathogenicity and TF since it appears substantially upregulated in all the L. infantum lines. Overall, our results show that at the late infection timepoint, the transcriptome of the parasites undergoes significant changes that probably improve the survival of the Leishmania lines in the host cells, contributing to the TF phenotype as well as drug therapy evasion.
3
Herrera-Campos AB, Zamudio-Martinez E, Delgado-Bellido D, Fernández-Cortés M, Montuenga LM, Oliver FJ, Garcia-Diaz A. Implications of Hyperoxia over the Tumor Microenvironment: An Overview Highlighting the Importance of the Immune System. Cancers (Basel) 2022; 14:2740. [PMID: 35681719 PMCID: PMC9179641 DOI: 10.3390/cancers14112740] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/28/2022] [Revised: 05/26/2022] [Accepted: 05/30/2022] [Indexed: 02/04/2023] [Open Access]
Abstract
Hyperoxia is used to counteract the effects of hypoxia in the tumor microenvironment (TME), which are described to boost the malignant tumor phenotype and are associated with poor prognosis. The reduction of the tumor hypoxic state through the formation of a non-aberrant vasculature or an increase in the toxicity of the therapeutic agent improves the efficacy of therapies such as chemotherapy. Radiotherapy efficacy is also improved, and apoptotic mechanisms seem to be implicated. Moreover, hyperoxia increases antitumor immunity through diverse pathways, leading to an immunopermissive TME. Although hyperoxia is an approved treatment for preventing and treating hypoxemia, it has harmful side effects. Prolonged exposure to high oxygen levels may cause acute lung injury, characterized by an exacerbated immune response and the destruction of the alveolar-capillary barrier. Furthermore, in this situation, the high concentration of ROS may cause toxicity that leads not only to cell death but also to an increase in chemoattractant and proinflammatory cytokine secretion. This would result in leukocyte recruitment to the lung and, therefore, lung damage. Moreover, unregulated inflammation has various consequences that promote tumor development and metastasis. This process is known as protumor inflammation, in which different cell types and molecules are implicated; for instance, IL-1β has been described as a key cytokine. Although current results show that hyperoxia benefits cancer therapies, further studies need to be conducted, not only to improve tumor regression, but also to prevent its collateral damage.
Affiliation(s)
- Ana Belén Herrera-Campos
  - Instituto de Parasitología y Biomedicina López Neyra, CSIC, 18016 Granada, Spain
- Esteban Zamudio-Martinez
  - Instituto de Parasitología y Biomedicina López Neyra, CSIC, 18016 Granada, Spain
  - Consorcio de Investigación Biomédica en Red de Cáncer (CIBERONC), 28029 Madrid, Spain
- Daniel Delgado-Bellido
  - Instituto de Parasitología y Biomedicina López Neyra, CSIC, 18016 Granada, Spain
  - Consorcio de Investigación Biomédica en Red de Cáncer (CIBERONC), 28029 Madrid, Spain
- Mónica Fernández-Cortés
  - Instituto de Parasitología y Biomedicina López Neyra, CSIC, 18016 Granada, Spain
  - Consorcio de Investigación Biomédica en Red de Cáncer (CIBERONC), 28029 Madrid, Spain
- Luis M. Montuenga
  - Consorcio de Investigación Biomédica en Red de Cáncer (CIBERONC), 28029 Madrid, Spain
  - Program in Solid Tumors, CIMA-University of Navarra, 31008 Pamplona, Spain
  - Navarra Health Research Institute (IDISNA), 31008 Pamplona, Spain
- F. Javier Oliver
  - Instituto de Parasitología y Biomedicina López Neyra, CSIC, 18016 Granada, Spain
  - Consorcio de Investigación Biomédica en Red de Cáncer (CIBERONC), 28029 Madrid, Spain
- Angel Garcia-Diaz
  - Instituto de Parasitología y Biomedicina López Neyra, CSIC, 18016 Granada, Spain
  - Consorcio de Investigación Biomédica en Red de Cáncer (CIBERONC), 28029 Madrid, Spain
4
Perea-Martínez A, García-Hernández R, Manzano JI, Gamarro F. Transcriptomic Analysis in Human Macrophages Infected with Therapeutic Failure Clinical Isolates of Leishmania infantum. ACS Infect Dis 2022; 8:800-810. [PMID: 35352952 PMCID: PMC9003231 DOI: 10.1021/acsinfecdis.1c00513] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/22/2022]
Abstract
Leishmaniasis is one of the neglected tropical diseases with a worldwide distribution, affecting humans and animals. In the absence of an effective vaccine, current treatment relies on chemotherapy; however, existing treatments are frequently compromised by drug resistance and therapeutic failure (TF). The identification of factors that contribute to TF in leishmaniasis will provide the basis for a more efficient future therapeutic strategy for the control of this disease. In this article, we have evaluated the transcriptomic changes in the host cells THP-1 after infection with clinical Leishmania infantum isolates from leishmaniasis patients with TF. Our results show that distinct L. infantum isolates differentially modulate host cell response, inducing phenotypic changes that may account for parasite survival and TF of patients. Analysis of differentially expressed genes (DEGs), with a statistical significance threshold of a fold change ≥ 2 and a false discovery rate value ≤ 0.05, revealed a different number of DEGs according to the Leishmania line. Globally, there was a similar number of genes up- and downregulated in all the infected host THP-1 cells, with the exception of Hi-L2221, which showed a higher number of downregulated DEGs. We observed a total of 58 DEGs commonly modulated in all infected host cells, including upregulated (log2FC ≥ 1) and downregulated (log2FC ≤ -1) genes. Based on the results obtained from the analysis of RNA-seq, volcano plot, and GO enrichment analysis, we identified the most significant transcripts of relevance for their possible contribution to the TF observed in patients with leishmaniasis.
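The thresholds quoted in this abstract can be read as a simple filter: a fold change of at least 2 in either direction is the same as an absolute log2 fold change of at least 1, combined with a false discovery rate of at most 0.05. The following is a minimal sketch of that rule only; the function name `classify_deg` and the example rows are hypothetical, and this is not the authors' analysis pipeline.

```python
def classify_deg(log2fc, fdr, lfc_cut=1.0, fdr_cut=0.05):
    """Label one gene as 'up', 'down', or 'ns' (not significant).

    Cutoffs follow the abstract: |log2FC| >= 1 (fold change >= 2)
    and FDR <= 0.05.
    """
    if fdr > fdr_cut:
        return "ns"
    if log2fc >= lfc_cut:
        return "up"
    if log2fc <= -lfc_cut:
        return "down"
    return "ns"

# Hypothetical (gene, log2FC, FDR) rows, invented for illustration only.
results = [
    ("geneA", 2.3, 0.001),   # large increase, significant
    ("geneB", -1.0, 0.05),   # exactly on both cutoffs, still counted
    ("geneC", 0.4, 0.0001),  # significant but effect too small
    ("geneD", 3.1, 0.20),    # large effect but not significant
]
labels = {gene: classify_deg(lfc, fdr) for gene, lfc, fdr in results}
```

Both cutoffs are inclusive here, matching the "≥" and "≤" signs in the abstract.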
Affiliation(s)
- Ana Perea-Martínez
  - Instituto de Parasitología y Biomedicina “López-Neyra”, IPBLN-CSIC, Parque Tecnológico de Ciencias de la Salud, Avda del Conocimiento 17, 18016 Armilla, Granada, Spain
- Raquel García-Hernández
  - Instituto de Parasitología y Biomedicina “López-Neyra”, IPBLN-CSIC, Parque Tecnológico de Ciencias de la Salud, Avda del Conocimiento 17, 18016 Armilla, Granada, Spain
- José Ignacio Manzano
  - Instituto de Parasitología y Biomedicina “López-Neyra”, IPBLN-CSIC, Parque Tecnológico de Ciencias de la Salud, Avda del Conocimiento 17, 18016 Armilla, Granada, Spain
- Francisco Gamarro
  - Instituto de Parasitología y Biomedicina “López-Neyra”, IPBLN-CSIC, Parque Tecnológico de Ciencias de la Salud, Avda del Conocimiento 17, 18016 Armilla, Granada, Spain
5
García-Hernández R, Manzano JI, Perea-Martínez A, Gamarro F. New Insights on Drug-Resistant Clinical Isolates of Leishmania infantum-Infected Human Macrophages as Determined by Comparative Transcriptome Analyses. OMICS: A Journal of Integrative Biology 2022; 26:165-177. [PMID: 35172107 DOI: 10.1089/omi.2021.0185] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Leishmaniasis is the second most important neglected tropical parasitic disease after malaria. This disease is distributed worldwide and can present in a variety of clinical forms, depending on the parasite species and the host's genetic background. Chemotherapy is the only effective weapon, but its effectiveness is limited by the frequent appearance of drug resistance and therapeutic failure, so new therapeutic strategies are required. To better understand the factors that contribute to therapeutic failure and drug resistance in leishmaniasis, we used RNA-seq to study the transcriptomic changes in host THP-1 cells after infection with clinical Leishmania infantum isolates with different susceptibilities to antileishmanial drugs. Analysis of the differentially expressed genes (DEGs) in infected host cells revealed variations in DEG numbers in the THP-1-infected cells depending on the Leishmania line. A key conclusion of this study is that the modulation of host cells is Leishmania line dependent. Gene ontology enrichment analyses of DEGs indicated that certain biological processes were modulated in the infected host cells, specifically related to cellular metabolism, immune response, defense response, signaling pathways, and cell proliferation and apoptosis. Furthermore, this study provides new potential therapeutic markers and insights into the THP-1 host transcriptomic changes that occur after late infection with drug-resistant L. infantum clinical isolates.
Affiliation(s)
- José Ignacio Manzano
  - Instituto de Parasitología y Biomedicina "López-Neyra" (IPBLN-CSIC), Granada, Spain
- Ana Perea-Martínez
  - Instituto de Parasitología y Biomedicina "López-Neyra" (IPBLN-CSIC), Granada, Spain
- Francisco Gamarro
  - Instituto de Parasitología y Biomedicina "López-Neyra" (IPBLN-CSIC), Granada, Spain
6
Benitez R, Caro M, Andres-Leon E, O'Valle F, Delgado M. Cortistatin regulates fibrosis and myofibroblast activation in experimental hepatotoxic- and cholestatic-induced liver injury. Br J Pharmacol 2021; 179:2275-2296. [PMID: 34821378 DOI: 10.1111/bph.15752] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/17/2021] [Revised: 10/08/2021] [Accepted: 11/08/2021] [Indexed: 11/29/2022] [Open Access]
Abstract
BACKGROUND AND PURPOSE: Liver fibrosis induced by chronic hepatic injury remains a major cause of morbidity and mortality worldwide. Identification of susceptibility/prognosis factors and new therapeutic tools for treating hepatic fibrotic disorders are urgent medical needs. Cortistatin is a neuropeptide with potent anti-inflammatory and anti-fibrotic activities in lung that binds to receptors that are expressed in liver fibroblasts and hepatic stellate cells. We evaluated the capacity of cortistatin to regulate liver fibrosis. EXPERIMENTAL APPROACH: We experimentally induced liver fibrosis in mice by chronic CCl4 exposure and bile duct ligation and evaluated the histopathological signs and fibrotic markers. KEY RESULTS: Hepatic expression of cortistatin inversely correlated with liver fibrosis grade in mice and humans with hepatic disorders. Cortistatin-deficient mice showed exacerbated signs of liver damage and fibrosis and increased mortality rates when challenged with hepatotoxic and cholestatic injury. Compared to wild-type mice, non-parenchymal liver cells isolated from cortistatin-deficient mice showed increased presence of cells with activated myofibroblast phenotypes and a differential genetic signature that is indicative of activated hepatic stellate cells and periportal fibroblasts and of myofibroblasts with active contractile apparatus. Cortistatin treatment reversed these exaggerated fibrogenic phenotypes in vivo and in vitro and protected from progression to severe liver fibrosis in response to hepatic injury. CONCLUSION AND IMPLICATIONS: We identify cortistatin as an endogenous molecular brake of liver fibrosis and its deficiency as a potential poor-prognosis marker for chronic hepatic disorders that course with fibrosis. Cortistatin-based therapies emerge as attractive strategies for ameliorating severe hepatic fibrosis of various etiologies.
Affiliation(s)
- Raquel Benitez
  - Institute of Parasitology and Biomedicine Lopez-Neyra, PT Salud, Granada, Spain
- Marta Caro
  - Institute of Parasitology and Biomedicine Lopez-Neyra, PT Salud, Granada, Spain
- Eduardo Andres-Leon
  - Institute of Parasitology and Biomedicine Lopez-Neyra, PT Salud, Granada, Spain
- Francisco O'Valle
  - Dept. of Pathology, School of Medicine, IBIMER and IBS-Granada, University of Granada, Spain
- Mario Delgado
  - Institute of Parasitology and Biomedicine Lopez-Neyra, PT Salud, Granada, Spain
7
Fernández-Cortés M, Andrés-León E, Oliver FJ. The PARP Inhibitor Olaparib Modulates the Transcriptional Regulatory Networks of Long Non-Coding RNAs during Vasculogenic Mimicry. Cells 2020; 9:2690. [PMID: 33333852 PMCID: PMC7765283 DOI: 10.3390/cells9122690] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/19/2020] [Revised: 12/09/2020] [Accepted: 12/11/2020] [Indexed: 12/11/2022] [Open Access]
Abstract
In highly metastatic tumors, vasculogenic mimicry (VM) involves the acquisition by tumor cells of endothelial-like traits. Poly-(ADP-ribose) polymerase (PARP) inhibitors are currently used against tumors displaying BRCA1/2-dependent deficient homologous recombination, and they may have antimetastatic activity. Long non-coding RNAs (lncRNAs) are emerging as key species-specific regulators of cellular and disease processes. To evaluate the impact of olaparib treatment in the context of non-coding RNA, we have analyzed the expression of lncRNAs after performing unbiased whole-transcriptome profiling of human uveal melanoma cells cultured to form VM. RNA-seq revealed that the non-coding transcriptomic landscape differed between olaparib-treated and non-treated cells: olaparib significantly modulated the expression of 20 lncRNAs, 11 lncRNAs being upregulated, and 9 downregulated. We subjected the data to different bioinformatics tools and analyses in public databases. We found that copy-number variation alterations in some olaparib-modulated lncRNAs had a statistically significant correlation with alterations in some key tumor suppressor genes. Furthermore, the lncRNAs that were modulated by olaparib appeared to be regulated by common transcription factors: ETS1 had high-score binding sites in the promoters of all olaparib-upregulated lncRNAs, while MZF1, RHOXF1 and NR2C2 had high-score binding sites in the promoters of all olaparib-downregulated lncRNAs. Finally, we predicted that olaparib-modulated lncRNAs could further regulate several transcription factors and their subsequent target genes in melanoma, suggesting that olaparib may trigger a major shift in gene expression mediated by the regulation of lncRNAs. Globally, olaparib changed the lncRNA expression landscape during VM, affecting angiogenesis-related genes.
8
Zhou A, Xie S, Feng Y, Sun D, Liu S, Sun Z, Li M, Zhang C, Zou J. Insights Into the Albinism Mechanism for Two Distinct Color Morphs of Northern Snakehead, Channa argus Through Histological and Transcriptome Analyses. Front Genet 2020; 11:830. [PMID: 33193565 PMCID: PMC7530302 DOI: 10.3389/fgene.2020.00830] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/17/2020] [Accepted: 07/09/2020] [Indexed: 12/20/2022] [Open Access]
Abstract
The great northern snakehead (Channa argus) is one of the most important economic and conservation fish species in China. In this study, the melanocytes in the skin of two distinct color morphs of C. argus were investigated and compared using microscopic analysis with hematoxylin and eosin (H&E) and Masson Fontana staining. Our results demonstrated an uneven distribution of melanocytes at extremely low density, most of which were in a state of aging or death. Meanwhile, no obvious pigment layer or melanocyte distribution pattern was found in the albino-type (AT), while melanocytes were evenly distributed and abundant in the bicolor-type (BT). Transcriptome analysis by Illumina HiSeq sequencing yielded a total of 34.93 Gb of clean data, with a Q30 base percentage of 92.66%. The BT and AT northern snakehead transcriptome data included a total of 56,039,701 and 60,410,063 clean reads (n = 3), respectively. In gene expression analyses, the sample correlation coefficients (r) ranged between 0.92 and 1.00; PC1 and PC2 contributed 50.25% and 13.73%, respectively, in the PCA cluster analysis; the total number of DEGs was 1024 (559 up-regulated and 465 down-regulated); and the number of annotated DEGs was 767 (COG 172, KEGG 262, GO 288, SwissProt 548, Pfam 579 and NR 765). Additionally, 46,363 ± 873 and 44,947 ± 392 single nucleotide polymorphisms (SNPs) were compiled via genetic structure analysis, respectively. Ten key pigment-related genes were screened using qRT-PCR, and all of them showed markedly higher expression levels in the skin of BT than in that of AT. This is the first study to analyze the mechanism of albino characteristics of Channa via histology and transcriptomics, and it provides theoretical and practical support for the protection and development of germplasm resources for C. argus.
Affiliation(s)
- Aiguo Zhou
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou, China
- Shaolin Xie
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou, China
- Yongyong Feng
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou, China
- Di Sun
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou, China
- Shulin Liu
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou, China
- Zhuolin Sun
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou, China
- Mingzhi Li
  - Independent Researcher, Guangzhou, China
- Chaonan Zhang
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou, China
- Jixing Zou
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, Guangzhou, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou, China
9
Panina Y, Karagiannis P, Kurtz A, Stacey GN, Fujibuchi W. Human Cell Atlas and cell-type authentication for regenerative medicine. Exp Mol Med 2020; 52:1443-1451. [PMID: 32929224 PMCID: PMC8080834 DOI: 10.1038/s12276-020-0421-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/15/2019] [Revised: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 12/22/2022] [Open Access]
Abstract
In modern biology, the correct identification of cell types is required for the developmental study of tissues and organs and the production of functional cells for cell therapies and disease modeling. For decades, cell types have been defined on the basis of morphological and physiological markers and, more recently, immunological markers and molecular properties. Recent advances in single-cell RNA sequencing have opened new doors for the characterization of cells at the individual and spatiotemporal levels on the basis of their RNA profiles, vastly transforming our understanding of cell types. The objective of this review is to survey the current progress in the field of cell-type identification, starting with the Human Cell Atlas project, which aims to sequence every cell in the human body, to molecular marker databases for individual cell types and other sources that address cell-type identification for regenerative medicine based on cell data guidelines.
Affiliation(s)
- Yulia Panina
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto, 606-8507, Japan
- Peter Karagiannis
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto, 606-8507, Japan
- Andreas Kurtz
  - BIH Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
- Glyn N Stacey
  - International Stem Cell Banking Initiative, 2 High Street, Barley, Herts, SG88HZ, UK
  - National Stem Cell Resource Centre, Institute of Zoology, Chinese Academy of Sciences, 100190, Beijing, China
  - Innovation Academy for Stem Cell and Regeneration, Chinese Academy of Sciences, 100101, Beijing, China
- Wataru Fujibuchi
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto, 606-8507, Japan
10
Machado FB, Moharana KC, Almeida-Silva F, Gazara RK, Pedrosa-Silva F, Coelho FS, Grativol C, Venancio TM. Systematic analysis of 1298 RNA-Seq samples and construction of a comprehensive soybean (Glycine max) expression atlas. The Plant Journal: For Cell and Molecular Biology 2020; 103:1894-1909. [PMID: 32445587 DOI: 10.1111/tpj.14850] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 01/29/2020] [Revised: 04/15/2020] [Accepted: 05/06/2020] [Indexed: 05/23/2023]
Abstract
Soybean (Glycine max [L.] Merr.) is a major crop in animal feed and human nutrition, mainly for its rich protein and oil contents. The remarkable rise in soybean transcriptome studies over the past 5 years generated an enormous amount of RNA-seq data, encompassing various tissues, developmental conditions and genotypes. In this study, we have collected data from 1298 publicly available soybean transcriptome samples, processed the raw sequencing reads and mapped them to the soybean reference genome in a systematic fashion. We found that 94% of the annotated genes (52 737/56 044) had detectable expression in at least one sample. Unsupervised clustering revealed three major groups, comprising samples from aerial, underground and seed/seed-related parts. We found 452 genes with uniform and constant expression levels, supporting their roles as housekeeping genes. On the other hand, 1349 genes showed heavily biased expression patterns towards particular tissues. A transcript-level analysis revealed that 95% (70 963 of 74 490) of the assembled transcripts have intron chains exactly matching those from known transcripts, whereas 3256 assembled transcripts represent potentially novel splicing isoforms. The dataset compiled here constitutes a new resource for the community, which can be downloaded or accessed through a user-friendly web interface at http://venanciogroup.uenf.br/resources/. This comprehensive transcriptome atlas will likely accelerate research on soybean genetics and genomics.
Affiliation(s)
- Fabricio B Machado
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Kanhu C Moharana
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Fabricio Almeida-Silva
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Rajesh K Gazara
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Francisnei Pedrosa-Silva
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Fernanda S Coelho
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Clícia Grativol
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
- Thiago M Venancio
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil
11
Li Q, Noel-MacDonnell JR, Koestler DC, Goode EL, Fridley BL. Subject level clustering using a negative binomial model for small transcriptomic studies. BMC Bioinformatics 2018; 19:474. [PMID: 30541426 PMCID: PMC6292049 DOI: 10.1186/s12859-018-2556-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/31/2017] [Accepted: 12/03/2018] [Indexed: 02/06/2023] [Open Access]
Abstract
Background: Unsupervised clustering represents one of the most widely applied methods in analysis of high-throughput ‘omics data. A variety of unsupervised model-based or parametric clustering methods and non-parametric clustering methods have been proposed for RNA-seq count data, most of which perform well for large samples, e.g. N ≥ 500. A common issue when analyzing limited samples of RNA-seq count data is that the data follow an over-dispersed distribution, and thus a Negative Binomial likelihood model is often used. Thus, we have developed a Negative Binomial model-based (NBMB) clustering approach for application to RNA-seq studies. Results: We have developed a Negative Binomial Model-Based (NBMB) method to cluster samples using a stochastic version of the expectation-maximization algorithm. A simulation study involving various scenarios was completed to compare the performance of NBMB to Gaussian model-based or Gaussian mixture modeling (GMM). NBMB was also applied for the clustering of two RNA-seq studies: a type 2 diabetes study (N = 96) and the TCGA study of ovarian cancer (N = 295). Simulation results showed that NBMB outperforms GMM applied with different transformations in the majority of scenarios with limited sample size. Additionally, we found that NBMB outperformed GMM for small cluster distances regardless of sample size. Increasing the total number of genes with a fixed proportion of differentially expressed genes does not change the outperformance of NBMB, but improves the overall performance of GMM. Analysis of type 2 diabetes and ovarian cancer tumor data with NBMB found good agreement with the reported disease subtypes and the gene expression patterns. This method is available in an R package on CRAN named NB.MClust. Conclusion: Use of Negative Binomial model-based clustering is advisable when clustering over-dispersed RNA-seq count data.
Electronic supplementary material The online version of this article (10.1186/s12859-018-2556-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qian Li
  - Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL, 33612, USA
  - Health Informatics Institute, University of South Florida, Tampa, FL, USA
- Devin C Koestler
  - Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS, USA
- Ellen L Goode
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Brooke L Fridley
  - Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL, 33612, USA

12.
Behrisch M, Schreck T, Krüger R, Gehlenborg N, Lekschas F, Pfister H. Visual Pattern-Driven Exploration of Big Data. 2018 International Symposium on Big Data Visual and Immersive Analytics (BDVA), Konstanz, Germany, October 17-19, 2018; 2018. [PMID: 31396383 DOI: 10.1109/bdva.2018.8534028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Pattern extraction algorithms enable insights into today's ever-growing datasets by translating recurring data properties into compact representations. Yet a practical problem arises: as data volume and complexity increase, so does the number of patterns, leaving the analyst with a vast result space. Current algorithmic and especially visualization approaches often fail to answer central overview questions essential for a comprehensive understanding of pattern distributions and support, their quality, and their relevance to the analysis task. To address these challenges, we contribute a visual analytics pipeline targeted at the pattern-driven exploration of result spaces in a semi-automatic fashion. Specifically, we combine image feature analysis and unsupervised learning to partition the pattern space into interpretable, coherent chunks, which should be given priority in a subsequent in-depth analysis. In our analysis scenarios, no ground truth is given. Thus, we employ and evaluate novel quality metrics derived from the distance distributions of our image feature vectors and the derived cluster model to guide the feature selection process. We visualize our results interactively, allowing the user to drill down from overview to detail in the pattern space, and demonstrate our techniques in two case studies on Earth observation and biomedical genomic data.

13.
Rigaill G, Balzergue S, Brunaud V, Blondet E, Rau A, Rogier O, Caius J, Maugis-Rabusseau C, Soubigou-Taconnat L, Aubourg S, Lurin C, Martin-Magniette ML, Delannoy E. Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis. Brief Bioinform 2018; 19:65-76. [PMID: 27742662 DOI: 10.1093/bib/bbw092] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 02/03/2016] [Indexed: 12/16/2022]
Abstract
Numerous statistical pipelines are now available for the differential analysis of gene expression measured with RNA-sequencing technology. Most of them are based on similar statistical frameworks after normalization, differing primarily in the choice of data distribution, mean and variance estimation strategy and data filtering. We propose an evaluation of the impact of these choices when few biological replicates are available through the use of synthetic data sets. This framework is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes. Hence, it provides an evaluation of the key ingredients of the differential analysis, free of the biases associated with the simulation of data using parametric models. Our results show the relevance of a proper modeling of the mean by using linear or generalized linear modeling. Once the mean is properly modeled, the impact of the other parameters on the performance of the test is much less important. Finally, we propose to use the simple visualization of the raw P-value histogram as a practical evaluation criterion of the performance of differential analysis methods on real data sets.
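As a rough illustration of the raw P-value histogram criterion mentioned in the abstract above, the following minimal Python sketch bins P-values into equal-width intervals. The function name `pvalue_histogram`, the default of 20 bins, and the example values are assumptions made for illustration and are not taken from the cited paper. The usual reading is that P-values from truly non-differential genes should be roughly uniform, so a flat histogram with a peak near zero is the expected shape.

```python
def pvalue_histogram(pvalues, n_bins=20):
    """Count raw P-values in n_bins equal-width bins covering [0, 1].

    Hypothetical helper for illustration only; it does not reproduce
    any specific pipeline from the cited work.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    counts = [0] * n_bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"P-value out of range: {p}")
        # p == 1.0 is placed in the last bin rather than overflowing
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up P-values: a few small ones plus a spread over the unit interval
example_pvalues = [0.001, 0.01, 0.03, 0.2, 0.45, 0.5, 0.7, 0.95, 1.0]
example_counts = pvalue_histogram(example_pvalues, n_bins=4)
```

The counts can then be plotted with any plotting tool; the binning itself is kept separate here so that it can be inspected or tested on its own.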

14.
Collord G, Tarpey P, Kurbatova N, Martincorena I, Moran S, Castro M, Nagy T, Bignell G, Maura F, Young MD, Berna J, Tubio JMC, McMurran CE, Young AMH, Sanders M, Noorani I, Price SJ, Watts C, Leipnitz E, Kirsch M, Schackert G, Pearson D, Devadass A, Ram Z, Collins VP, Allinson K, Jenkinson MD, Zakaria R, Syed K, Hanemann CO, Dunn J, McDermott MW, Kirollos RW, Vassiliou GS, Esteller M, Behjati S, Brazma A, Santarius T, McDermott U. An integrated genomic analysis of anaplastic meningioma identifies prognostic molecular signatures. Sci Rep 2018; 8:13537. [PMID: 30202034 PMCID: PMC6131140 DOI: 10.1038/s41598-018-31659-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 04/30/2018] [Accepted: 08/16/2018] [Indexed: 12/21/2022]
Abstract
Anaplastic meningioma is a rare and aggressive brain tumor characterised by intractable recurrences and dismal outcomes. Here, we present an integrated analysis of the whole genome, transcriptome and methylation profiles of primary and recurrent anaplastic meningioma. A key finding was the delineation of distinct molecular subgroups that were associated with diametrically opposed survival outcomes. Relative to lower grade meningiomas, anaplastic tumors harbored frequent driver mutations in SWI/SNF complex genes, which were confined to the poor prognosis subgroup. Aggressive disease was further characterised by transcriptional evidence of increased PRC2 activity, stemness and epithelial-to-mesenchymal transition. Our analyses discern biologically distinct variants of anaplastic meningioma with prognostic and therapeutic significance.
Affiliation(s)
- Grace Collord
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Department of Paediatrics, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
- Patrick Tarpey
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Natalja Kurbatova
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Inigo Martincorena
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Sebastian Moran
  - Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Catalonia, Spain
- Manuel Castro
  - Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Catalonia, Spain
- Tibor Nagy
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Graham Bignell
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Francesco Maura
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
  - Department of Hematology, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Matthew D Young
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Jorge Berna
  - Mobile Genomes and Disease, Molecular Medicine and Chronic diseases Centre (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, 15706, Spain
- Jose M C Tubio
  - Mobile Genomes and Disease, Molecular Medicine and Chronic diseases Centre (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, 15706, Spain
- Chris E McMurran
  - Department of Neurosurgery, Department of Clinical Neuroscience, Cambridge University Hospitals NHS Foundation Trust, Cambridge, CB2 0QQ, UK
- Adam M H Young
  - Department of Neurosurgery, Department of Clinical Neuroscience, Cambridge University Hospitals NHS Foundation Trust, Cambridge, CB2 0QQ, UK
- Mathijs Sanders
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Erasmus University Medical Center, Department of Hematology, Rotterdam, The Netherlands
- Imran Noorani
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Department of Neurosurgery, Department of Clinical Neuroscience, Cambridge University Hospitals NHS Foundation Trust, Cambridge, CB2 0QQ, UK
- Stephen J Price
  - Department of Neurosurgery, Department of Clinical Neuroscience, Cambridge University Hospitals NHS Foundation Trust, Cambridge, CB2 0QQ, UK
- Colin Watts
  - Department of Neurosurgery, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Elke Leipnitz
  - Klinik und Poliklink für Neurochirurgie, "Carl Gustav Carus" Universitätsklinikum, Technische Universität Dresden, Fetscherstrasse 74, 01307, Dresden, Germany
- Matthias Kirsch
  - Klinik und Poliklink für Neurochirurgie, "Carl Gustav Carus" Universitätsklinikum, Technische Universität Dresden, Fetscherstrasse 74, 01307, Dresden, Germany
- Gabriele Schackert
  - Klinik und Poliklink für Neurochirurgie, "Carl Gustav Carus" Universitätsklinikum, Technische Universität Dresden, Fetscherstrasse 74, 01307, Dresden, Germany
- Danita Pearson
  - Department of Pathology, Cambridge University Hospital, CB2 0QQ, Cambridge, UK
- Abel Devadass
  - Department of Pathology, Cambridge University Hospital, CB2 0QQ, Cambridge, UK
- Zvi Ram
  - Department of Neurosurgery, Tel-Aviv Medical Center, Tel-Aviv, Israel
- V Peter Collins
  - Department of Pathology, Cambridge University Hospital, CB2 0QQ, Cambridge, UK
- Kieren Allinson
  - Department of Pathology, Cambridge University Hospital, CB2 0QQ, Cambridge, UK
- Michael D Jenkinson
  - Department of Neurosurgery, The Walton Centre, Liverpool, L9 7LJ, UK
  - Institute of Translational Medicine, University of Liverpool, Liverpool, L9 7LJ, UK
- Rasheed Zakaria
  - Department of Neurosurgery, The Walton Centre, Liverpool, L9 7LJ, UK
  - Institute of Integrative Biology, University of Liverpool, Liverpool, L9 7LJ, UK
- Khaja Syed
  - Department of Neurosurgery, The Walton Centre, Liverpool, L9 7LJ, UK
  - Institute of Integrative Biology, University of Liverpool, Liverpool, L9 7LJ, UK
- C Oliver Hanemann
  - Institute of Translational and Stratified Medicine, Plymouth University Peninsula Schools of Medicine and Dentistry, Plymouth University, Plymouth, Devon, PL4 8AA, UK
- Jemma Dunn
  - Institute of Translational and Stratified Medicine, Plymouth University Peninsula Schools of Medicine and Dentistry, Plymouth University, Plymouth, Devon, PL4 8AA, UK
- Michael W McDermott
  - Department of Neurosurgery, UCSF Medical Center, San Francisco, CA, 94143-0112, USA
- Ramez W Kirollos
  - Department of Neurosurgery, Department of Clinical Neuroscience, Cambridge University Hospitals NHS Foundation Trust, Cambridge, CB2 0QQ, UK
- George S Vassiliou
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Department of Haematology, Cambridge University Hospitals NHS Trust, Cambridge, CB2 0QQ, UK
- Manel Esteller
  - Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Catalonia, Spain
  - Physiological Sciences Department, School of Medicine and Health Sciences, University of Barcelona (UB), Catalonia, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
- Sam Behjati
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Department of Paediatrics, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
- Alvis Brazma
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Thomas Santarius
  - Department of Neurosurgery, Department of Clinical Neuroscience, Cambridge University Hospitals NHS Foundation Trust, Cambridge, CB2 0QQ, UK
- Ultan McDermott
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Institute of Translational Medicine, University of Liverpool, Liverpool, L9 7LJ, UK
  - AstraZeneca, CRUK Cambridge Institute, Robinson Way, Cambridge, CB2 0RE, UK

15.
Endara MJ, Coley PD, Wiggins NL, Forrister DL, Younkin GC, Nicholls JA, Pennington RT, Dexter KG, Kidner CA, Stone GN, Kursar TA. Chemocoding as an identification tool where morphological- and DNA-based methods fall short: Inga as a case study. The New Phytologist 2018; 218:847-858. [PMID: 29436716 DOI: 10.1111/nph.15020] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/04/2017] [Accepted: 01/04/2018] [Indexed: 05/12/2023]
Abstract
The need for species identification and taxonomic discovery has led to the development of innovative technologies for large-scale plant identification. DNA barcoding has been useful, but fails to distinguish among many species in species-rich plant genera, particularly in tropical regions. Here, we show that chemical fingerprinting, or 'chemocoding', has great potential for plant identification in challenging tropical biomes. Using untargeted metabolomics in combination with multivariate analysis, we constructed species-level fingerprints, which we define as chemocoding. We evaluated the utility of chemocoding with species that were defined morphologically and subject to next-generation DNA sequencing in the diverse and recently radiated neotropical genus Inga (Leguminosae), both at single study sites and across broad geographic scales. Our results show that chemocoding is a robust method for distinguishing morphologically similar species at a single site and for identifying widespread species across continental-scale ranges. Given that species are the fundamental unit of analysis for conservation and biodiversity research, the development of accurate identification methods is essential. We suggest that chemocoding will be a valuable additional source of data for a quick identification of plants, especially for groups where other methods fall short.
Affiliation(s)
- María-José Endara
  - Department of Biology, University of Utah, Salt Lake City, UT, 84112-0840, USA
  - Centro de Investigación de la Biodiversidad y Cambio Climático (BioCamb) e Ingeniería en Biodiversidad y Recursos Genéticos, Facultad de Ciencias de Medio Ambiente, Universidad Tecnológica Indoamérica, Quito, EC170103, Ecuador
- Phyllis D Coley
  - Department of Biology, University of Utah, Salt Lake City, UT, 84112-0840, USA
  - Smithsonian Tropical Research Institute, Box 0843-03092, Balboa, Ancón, Republic of Panamá
- Natasha L Wiggins
  - School of Biological Sciences, University of Tasmania, Sandy Bay, TAS, 7001, Australia
- Dale L Forrister
  - Department of Biology, University of Utah, Salt Lake City, UT, 84112-0840, USA
- Gordon C Younkin
  - Department of Biology, University of Utah, Salt Lake City, UT, 84112-0840, USA
- James A Nicholls
  - Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JY, UK
- Kyle G Dexter
  - Royal Botanic Garden Edinburgh, Edinburgh, EH3 5LR, UK
  - School of GeoSciences, University of Edinburgh, Edinburgh, EH9 3FF, UK
- Catherine A Kidner
  - Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JY, UK
  - Royal Botanic Garden Edinburgh, Edinburgh, EH3 5LR, UK
- Graham N Stone
  - Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JY, UK
- Thomas A Kursar
  - Department of Biology, University of Utah, Salt Lake City, UT, 84112-0840, USA
  - Smithsonian Tropical Research Institute, Box 0843-03092, Balboa, Ancón, Republic of Panamá

16.
Zhao S, Sun J, Shimizu K, Kadota K. Silhouette Scores for Arbitrary Defined Groups in Gene Expression Data and Insights into Differential Expression Results. Biol Proced Online 2018; 20:5. [PMID: 29507534 PMCID: PMC5831220 DOI: 10.1186/s12575-018-0067-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/14/2017] [Accepted: 01/12/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Hierarchical sample clustering (HSC) is widely performed to examine associations within expression data obtained from microarrays and RNA sequencing (RNA-seq). Researchers have investigated HSC results using several possible grouping criteria (e.g., sex, age, and disease type). However, the evaluation of arbitrarily defined groups still relies on subjective visual inspection. RESULTS: To objectively evaluate the degree of separation between groups of interest in the HSC dendrogram, we propose to use Silhouette scores. The Silhouette method was originally developed as a graphical aid for the validation of data clusters. It provides a measure of how well a sample is classified when it is assigned to a cluster, according to both the tightness of the clusters and the separation between them. The score ranges from 1.0 to -1.0, and a larger value of the average silhouette (AS) over all analyzed samples indicates a higher degree of cluster separation. The basic idea is to replace the term cluster by group when calculating the scores. We investigated the validity of this score using simulated and real data designed for differential expression (DE) analysis. We found that larger (or smaller) AS values agreed well with both higher (or lower) degrees of separation between different groups and higher (or lower) percentages of differentially expressed genes (PDEG). We also found that the AS values were generally independent of the number of replicates (Nrep). Although the PDEG values depended on Nrep, we confirmed that both AS and PDEG values were close to zero when the samples were intermingled between the groups in the HSC dendrogram. CONCLUSION: The Silhouette method is useful for exploring data with predefined group labels. It can help provide both an objective evaluation of HSC dendrograms and insights into the DE results with regard to the compared groups.
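To make the idea of a silhouette score over predefined groups concrete, here is a minimal Python sketch that computes the average silhouette with group labels in place of cluster labels. It assumes Euclidean distance between samples, which may differ from the distance used in the cited paper; the function name `average_silhouette` and the toy data are hypothetical.

```python
import math


def average_silhouette(samples, groups):
    """Average silhouette over all samples, using predefined group labels.

    samples: list of equal-length numeric vectors (one per sample).
    groups:  list of group labels, one per sample.
    Assumption: Euclidean distance; other distances are equally possible.
    """
    if len(samples) != len(groups):
        raise ValueError("samples and groups must have the same length")
    labels = sorted(set(groups))
    if len(labels) < 2:
        raise ValueError("at least two groups are required")

    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    n = len(samples)
    scores = []
    for i in range(n):
        # Mean distance from sample i to the other members of each group
        mean_dist = {}
        for g in labels:
            members = [j for j in range(n) if groups[j] == g and j != i]
            if members:
                total = sum(dist(samples[i], samples[j]) for j in members)
                mean_dist[g] = total / len(members)
        own = groups[i]
        if own not in mean_dist:
            # Sample is alone in its group: silhouette is set to 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist[own]                                     # within-group tightness
        b = min(d for g, d in mean_dist.items() if g != own)   # nearest other group
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Toy expression profiles (made up): two tight, well-separated pairs
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
separated = average_silhouette(profiles, ["A", "A", "B", "B"])
intermingled = average_silhouette(profiles, ["A", "B", "A", "B"])
```

With labels that match the two pairs the score is close to 1, while labels that cut across the pairs give a much lower value, which is the behaviour the abstract describes for well-separated versus intermingled groups.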
Affiliation(s)
- Shitao Zhao
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
- Jianqiang Sun
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
- Kentaro Shimizu
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
- Koji Kadota
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan

17.
Nascimento M, Silva FFE, Sáfadi T, Nascimento ACC, Ferreira TEM, Barroso LMA, Ferreira Azevedo C, Guimarães SEF, Serão NVL. Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data. PLoS One 2017; 12:e0181195. [PMID: 28715507 PMCID: PMC5513449 DOI: 10.1371/journal.pone.0181195] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 01/31/2017] [Accepted: 06/27/2017] [Indexed: 11/19/2022]
Abstract
Gene expression time series (GETS) analysis aims to characterize sets of genes according to their longitudinal patterns of expression. Due to the large number of genes evaluated in GETS analysis, a useful strategy to summarize biological functional processes and regulatory mechanisms is to cluster genes that present similar expression patterns over time. Traditional cluster methods usually ignore the challenges in GETS, such as the lack of data normality and the small number of temporal observations. Independent Component Analysis (ICA) is a statistical procedure that uses a transformation to convert raw time series data into sets of values of independent variables, which can be used for cluster analysis to identify sets of genes with similar temporal expression patterns. ICA allows clustering small series of distribution-free data while accounting for the dependence between subsequent time-points. Using simulated and real temporal RNA-seq data sets (four libraries of two pig breeds at 21, 40, 70 and 90 days of gestation), we present a methodology (ICAclust) that jointly considers independent component analysis (ICA) and a hierarchical method for clustering GETS. We compare ICAclust results with those obtained for K-means clustering. ICAclust presented, on average, an absolute gain of 5.15% over the best K-means scenario. Considering the worst scenario for K-means, the gain was 84.85% when compared with the best ICAclust result. For the real data set, genes were grouped into six distinct clusters with 89, 51, 153, 67, 40, and 58 genes, respectively. In general, the six clusters presented very distinct expression patterns. Overall, the proposed two-step clustering method (ICAclust) performed well compared to K-means, a traditional method used for cluster analysis of temporal gene expression data. In ICAclust, genes with similar expression patterns over time were clustered together.
Affiliation(s)
- Moysés Nascimento
  - Department of Statistics, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Thelma Sáfadi
  - Department of Exact Sciences, Federal University of Lavras, Lavras, Minas Gerais, Brazil

18.
Abstract
Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review.
Affiliation(s)
- Lawrence B Holder
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA
- M Muksitul Haque
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA
  - Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, USA
- Michael K Skinner
  - Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, USA