1
|
Cakiroglu E, Senturk S. Genomics and Functional Genomics of Malignant Pleural Mesothelioma. Int J Mol Sci 2020; 21:ijms21176342. [PMID: 32882916 PMCID: PMC7504302 DOI: 10.3390/ijms21176342] [Received: 07/22/2020] [Revised: 08/20/2020] [Accepted: 08/20/2020]
Abstract
Malignant pleural mesothelioma (MPM) is a rare, aggressive cancer of the mesothelial cells lining the pleural surface of the chest wall and lung. The etiology of MPM is strongly associated with prior exposure to asbestos fibers, and the median survival of diagnosed patients is approximately one year. Despite the latest advancements in surgical techniques and systemic therapies, currently available treatment modalities for MPM fail to provide long-term survival. The increasing incidence of MPM highlights the need for effective treatments. Targeted therapies offer personalized treatments in many cancers. However, targeted therapy in MPM is not recommended by clinical guidelines, mainly because of poor target definition. A better understanding of the molecular and cellular mechanisms and the predictors of poor clinical outcomes of MPM is required to identify novel targets and develop precise and effective treatments. Recent advances in the genomics and functional genomics fields have provided groundbreaking insights into the genomic and molecular profiles of MPM and enabled the functional characterization of the genetic alterations. This review provides a comprehensive overview of the relevant literature and highlights the potential of state-of-the-art genomics and functional genomics research to facilitate the development of novel diagnostics and therapeutic modalities in MPM.
Affiliation(s)
- Ece Cakiroglu
- Izmir Biomedicine and Genome Center, Izmir 35340, Turkey
- Department of Genome Sciences and Molecular Biotechnology, Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir 35340, Turkey
- Serif Senturk
- Izmir Biomedicine and Genome Center, Izmir 35340, Turkey
- Department of Genome Sciences and Molecular Biotechnology, Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir 35340, Turkey
2
Lekschas F, Gehlenborg N. SATORI: a system for ontology-guided visual exploration of biomedical data repositories. Bioinformatics 2018; 34:1200-1207. [PMID: 29186292 PMCID: PMC6031061 DOI: 10.1093/bioinformatics/btx739] [Received: 05/31/2017] [Accepted: 11/22/2017]
Abstract
Motivation: The ever-increasing number of biomedical datasets provides tremendous opportunities for re-use, but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating datasets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find datasets of interest.
Results: We developed SATORI, an integrative search and visual exploration interface for biomedical data repositories. The design is informed by a requirements analysis conducted through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface.
Availability and implementation: SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fritz Lekschas
- Harvard John A. Paulson School of Engineering and Applied Sciences, Cambridge, MA 02138, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
3
Blomstedt P, Dutta R, Seth S, Brazma A, Kaski S. Modelling-based experiment retrieval: a case study with gene expression clustering. Bioinformatics 2016; 32:1388-94. [PMID: 26740526 DOI: 10.1093/bioinformatics/btv762] [Received: 05/15/2015] [Accepted: 12/28/2015]
Abstract
Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches from annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, requiring each experiment to be expressed as a single profile, typically of case versus control. A more general, recently suggested alternative is to retrieve experiments whose models are good for modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well.
Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework, where each experiment is modelled separately and the retrieval is done by finding related models. For retrieval of gene expression experiments, we use a probabilistic model called the product partition model, which induces a clustering of genes that show similar expression patterns across a number of samples. The suggested metric for retrieval using clusterings is the normalized information distance. Finally, empirical results suggest that inference for the full probabilistic model can be approximated with good performance using computationally faster heuristic clustering approaches (e.g. k-means). The method is highly scalable and straightforward to apply to construct a general-purpose gene expression experiment retrieval method.
Availability and implementation: The method can be implemented using standard clustering algorithms and normalized information distance, available in many statistical software packages.
Contact: paul.blomstedt@aalto.fi or samuel.kaski@aalto.fi
Supplementary information: Supplementary data are available at Bioinformatics online.
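The retrieval criterion named in this abstract, the normalized information distance (NID) between two gene clusterings, can be sketched in plain Python. This assumes the common mutual-information form NID(U, V) = 1 - I(U; V) / max(H(U), H(V)); the exact variant used in the paper is not stated here. The cluster labels could come from any clustering method such as k-means, and `rank_by_nid` and the toy data are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (nats) of a clustering given as one label per gene."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (nats) between two clusterings of the same genes."""
    if len(a) != len(b):
        raise ValueError("clusterings must label the same genes")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (n_xy / n) * log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in Counter(zip(a, b)).items()
    )


def nid(a, b):
    """Normalized information distance: 0 for identical partitions, at most 1."""
    h_max = max(entropy(a), entropy(b))
    if h_max == 0.0:  # both clusterings put every gene in a single cluster
        return 0.0
    return 1.0 - mutual_information(a, b) / h_max


def rank_by_nid(query, database):
    """Hypothetical helper: order stored experiments by NID to the query."""
    return sorted(database, key=lambda name: nid(query, database[name]))


if __name__ == "__main__":
    query = [0, 0, 1, 1, 2, 2]
    database = {
        "relabelled": [2, 2, 0, 0, 1, 1],  # same partition, different label names
        "unrelated": [0, 1, 2, 0, 1, 2],
    }
    print(rank_by_nid(query, database))  # ['relabelled', 'unrelated']
```

Because NID depends only on the partition and not on label names, a relabelled copy of the query clustering scores as a perfect match, which is what makes it usable for comparing clusterings produced independently for each experiment.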
Affiliation(s)
- Paul Blomstedt
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Ritabrata Dutta
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Sohan Seth
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, UK
- Samuel Kaski
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
4
Uziela K, Honkela A. Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability. PLoS One 2015; 10:e0126545. [PMID: 25966034 PMCID: PMC4429080 DOI: 10.1371/journal.pone.0126545] [Received: 09/12/2014] [Accepted: 04/03/2015]
Abstract
Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present, the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method we call PREBS is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions, and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from TCGA LAML data set we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined based on PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package “prebs.”
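The core idea behind PREBS, counting RNA-seq reads that overlap microarray probe regions and then summarising the counts per probe set, can be illustrated with a toy sketch. It assumes a single chromosome with integer coordinates, fixed-length reads, and half-open intervals, and it replaces the standard microarray summarisation algorithms mentioned in the abstract with a simple median of log2 counts. The function names and data are hypothetical; this is not the Bioconductor "prebs" implementation.

```python
from bisect import bisect_left
from collections import defaultdict
from math import log2
from statistics import median


def count_probe_overlaps(read_starts, read_len, probes):
    """Count reads [s, s + read_len) overlapping each probe interval [start, end).

    A read overlaps a probe iff s < end and s + read_len > start,
    i.e. start - read_len + 1 <= s < end for integer coordinates.
    """
    starts = sorted(read_starts)
    counts = {}
    for probe_id, (start, end) in probes.items():
        lo = bisect_left(starts, start - read_len + 1)  # first read that can reach the probe
        hi = bisect_left(starts, end)  # first read starting at or after the probe end
        counts[probe_id] = hi - lo
    return counts


def summarise_probe_sets(counts, probe_to_set):
    """Stand-in summarisation: median of log2(count + 1) over each probe set."""
    grouped = defaultdict(list)
    for probe_id, count in counts.items():
        grouped[probe_to_set[probe_id]].append(log2(count + 1))
    return {probe_set: median(values) for probe_set, values in grouped.items()}


if __name__ == "__main__":
    reads = [0, 5, 10, 20]  # hypothetical read start positions
    probes = {"p1": (3, 8), "p2": (12, 14), "p3": (30, 40)}
    counts = count_probe_overlaps(reads, read_len=5, probes=probes)
    print(counts)  # {'p1': 2, 'p2': 1, 'p3': 0}
    print(summarise_probe_sets(counts, {"p1": "setA", "p2": "setA", "p3": "setB"}))
```

Sorting the read starts once and using two binary searches per probe keeps the counting step at O((R + P) log R) for R reads and P probes, which matters when a sample has millions of reads.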
Affiliation(s)
- Karolis Uziela
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, 17121 Solna, Sweden
- Antti Honkela
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
5
Faisal A, Peltonen J, Georgii E, Rung J, Kaski S. Toward computational cumulative biology by combining models of biological datasets. PLoS One 2014; 9:e113053. [PMID: 25427176 PMCID: PMC4245117 DOI: 10.1371/journal.pone.0113053] [Received: 07/04/2014] [Accepted: 10/17/2014]
Abstract
A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
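One simple way to picture "decomposing a new dataset into contributions from earlier models" is a mixture whose component models are frozen while only the mixing weights are fitted, here by expectation-maximization (EM). This is a swapped-in illustration, not the combination model of the paper: it assumes each earlier model can report a likelihood p_m(x_i) for every sample of the new dataset, and the function name and numbers are hypothetical. The fitted weights play the role of each earlier dataset's contribution.

```python
def fit_contributions(likelihoods, iterations=200):
    """Estimate mixing weights w_m for fixed earlier models by EM.

    likelihoods[i][m] is p_m(x_i), the likelihood of sample i of the new
    dataset under earlier model m. Only the weights are updated; the
    earlier models themselves are never refitted.
    """
    n_samples = len(likelihoods)
    n_models = len(likelihoods[0])
    weights = [1.0 / n_models] * n_models  # start from a uniform combination
    for _ in range(iterations):
        totals = [0.0] * n_models
        for row in likelihoods:
            # E-step: responsibility of each earlier model for this sample
            denom = sum(w * p for w, p in zip(weights, row))
            for m in range(n_models):
                totals[m] += weights[m] * row[m] / denom
        # M-step: each new weight is that model's average responsibility
        weights = [t / n_samples for t in totals]
    return weights


if __name__ == "__main__":
    # Hypothetical data: 8 samples fit model A better, 2 fit model B better.
    data = [[0.9, 0.1]] * 8 + [[0.1, 0.9]] * 2
    print([round(w, 3) for w in fit_contributions(data)])  # [0.875, 0.125]
```

Since the earlier models stay fixed, each EM iteration costs only O(samples x models), which is in the spirit of the abstract's requirement that the combination be rapidly computable without building one unified model of all data.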
Affiliation(s)
- Ali Faisal
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland
- Jaakko Peltonen
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland
- Elisabeth Georgii
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland
- Johan Rung
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Samuel Kaski
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
6
Tun AW, Chaiyarit S, Kaewsutthi S, Katanyoo W, Chuenkongkaew W, Kuwano M, Tomonaga T, Peerapittayamongkol C, Thongboonkerd V, Lertrit P. Profiling the mitochondrial proteome of Leber's Hereditary Optic Neuropathy (LHON) in Thailand: down-regulation of bioenergetics and mitochondrial protein quality control pathways in fibroblasts with the 11778G>A mutation. PLoS One 2014; 9:e106779. [PMID: 25215595 PMCID: PMC4162555 DOI: 10.1371/journal.pone.0106779] [Received: 02/28/2014] [Accepted: 08/08/2014]
Abstract
Leber's Hereditary Optic Neuropathy (LHON) is one of the commonest mitochondrial diseases. It causes total blindness and predominantly affects young males. For the disease to develop, an individual must carry one of the primary mtDNA mutations 11778G>A, 14484T>C or 3460G>A. However, these mutations are not sufficient to cause disease, and they do not explain the characteristic features of LHON such as the higher prevalence in males, incomplete penetrance, and relatively late age of onset. To explore the roles of nuclear-encoded mitochondrial proteins in the development of LHON, we applied a proteomic approach to samples from affected and unaffected individuals from 3 pedigrees and from 5 unrelated controls. Two-dimensional electrophoresis followed by MS/MS analysis of the mitochondrial lysate identified 17 proteins that were differentially expressed between LHON cases and unrelated controls, and 24 proteins that were differentially expressed between unaffected relatives and unrelated controls. The proteomic data were successfully validated by western blot analysis of 3 selected proteins. All of the proteins identified in the study were mitochondrial proteins, and most of them were down-regulated in 11778G>A mutant fibroblasts. These proteins included subunits of OXPHOS enzyme complexes, proteins involved in intermediary metabolic processes, nucleoid-related proteins, chaperones, cristae remodelling proteins and an antioxidant enzyme. The protein profiles of both the affected and unaffected 11778G>A carriers shared many features that differed from those of the unrelated control group, revealing similar proteomic responses to the 11778G>A mutation in both affected and unaffected individuals. The differentially expressed proteins fell into two broad groups: a cluster of bioenergetic pathway proteins and a cluster involved in the protein quality control system.
Defects in these systems are likely to impede the function of retinal ganglion cells, and may lead to the development of LHON in synergy with the primary mtDNA mutation.
Affiliation(s)
- Aung Win Tun
- Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Sakdithep Chaiyarit
- Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Supannee Kaewsutthi
- Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Wanphen Katanyoo
- Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Wanicha Chuenkongkaew
- Department of Ophthalmology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Masayoshi Kuwano
- Laboratory of Proteome Research, National Institute of Biomedical Innovation, Osaka, Japan
- Takeshi Tomonaga
- Laboratory of Proteome Research, National Institute of Biomedical Innovation, Osaka, Japan
- Visith Thongboonkerd
- Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Center for Research in Complex Systems Science, Mahidol University, Bangkok, Thailand
- Patcharee Lertrit
- Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
7
Seth S, Välimäki N, Kaski S, Honkela A. Exploration and retrieval of whole-metagenome sequencing samples. Bioinformatics 2014; 30:2471-9. [PMID: 24845653 PMCID: PMC4230234 DOI: 10.1093/bioinformatics/btu340]
Abstract
Motivation: In recent years, the field of whole-metagenome shotgun sequencing has grown significantly, owing to high-throughput sequencing technologies that allow genomic samples to be sequenced more cheaply, faster and with better coverage than before. This technical advancement has started a trend of sequencing multiple samples in different conditions or environments to explore the similarities and dissimilarities of the microbial communities. Examples include the Human Microbiome Project and various studies of the human intestinal tract. With ever larger databases of such measurements available, finding samples similar to a given query sample is becoming a central operation.
Results: In this article, we develop a content-based exploration and retrieval method for whole-metagenome sequencing samples. We apply a distributed string mining (DSM) framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples. We evaluate the performance of the proposed approach on two human gut metagenome datasets as well as Human Microbiome Project metagenomic samples. We observe significant enrichment for diseased gut samples in the results of queries with another diseased sample, and high accuracy in discriminating between different body sites even though the method is unsupervised.
Availability and implementation: A software implementation of the DSM framework is available at https://github.com/HIITMetagenomics/dsm-framework.
Supplementary information: Supplementary data are available at Bioinformatics online.
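The retrieval step described above, comparing samples by their k-mer content and ranking a database by dissimilarity to a query, can be sketched as follows. This is a simplified stand-in: it counts all k-mers instead of mining only the informative ones with the DSM framework, and it uses cosine dissimilarity as an assumed measure. `retrieve`, the choice k = 3 and the toy reads are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(reads, k):
    """Count every overlapping k-mer across a sample's reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine_dissimilarity(p, q):
    """1 - cosine similarity of two k-mer count profiles (1.0 if either is empty)."""
    dot = sum(c * q[kmer] for kmer, c in p.items())  # Counter returns 0 for absent k-mers
    norm = sqrt(sum(c * c for c in p.values())) * sqrt(sum(c * c for c in q.values()))
    return 1.0 - dot / norm if norm else 1.0


def retrieve(query_reads, samples, k=3):
    """Rank database samples from most to least similar to the query."""
    query = kmer_profile(query_reads, k)
    scores = {
        name: cosine_dissimilarity(query, kmer_profile(reads, k))
        for name, reads in samples.items()
    }
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    database = {"different": ["TTTTTTT"], "similar": ["ACGTACG"]}
    print(retrieve(["ACGTACGT"], database))  # ['similar', 'different']
```

The ranking is unsupervised, as in the abstract: no sample labels are used, so any enrichment of, say, diseased samples among the top hits comes purely from shared sequence content.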
Affiliation(s)
- Sohan Seth
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland, Genome-Scale Biology Program and Department of Medical Genetics, University of Helsinki, Helsinki, Finland, and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
- Niko Välimäki
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland, Genome-Scale Biology Program and Department of Medical Genetics, University of Helsinki, Helsinki, Finland, and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
- Samuel Kaski
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland, Genome-Scale Biology Program and Department of Medical Genetics, University of Helsinki, Helsinki, Finland, and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
- Antti Honkela
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland, Genome-Scale Biology Program and Department of Medical Genetics, University of Helsinki, Helsinki, Finland, and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
8
Abstract
Transcriptomics meta-analysis aims at re-using existing data to derive novel biological hypotheses, and is motivated by the public availability of a large number of independent studies. Current methods are based on breaking down studies into multiple comparisons between phenotypes (e.g. disease vs. healthy), based on the studies' experimental designs, followed by computing the overlap between the resulting differential expression signatures. While useful, in this methodology each study yields multiple independent phenotype comparisons, and connections are established not between studies, but rather between subsets of the studies corresponding to phenotype comparisons. We propose a rank-based statistical meta-analysis framework that establishes global connections between transcriptomics studies without breaking down studies into sets of phenotype comparisons. By using a rank product method, our framework extracts global features from each study, corresponding to genes that are consistently among the most expressed or differentially expressed genes in that study. Those features are then statistically modelled via a term-frequency inverse-document frequency (TF-IDF) model, which is then used for connecting studies. Our framework is fast and parameter-free; when applied to large collections of Homo sapiens and Streptococcus pneumoniae transcriptomics studies, it performs better than similarity-based approaches in retrieving related studies, using a Medical Subject Headings gold standard. Finally, we highlight via case studies how the framework can be used to derive novel biological hypotheses regarding related studies and the genes that drive those connections. Our proposed statistical framework shows that it is possible to perform a meta-analysis of transcriptomics studies with arbitrary experimental designs by deriving global expression features rather than decomposing studies into multiple phenotype comparisons.
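The two-stage pipeline in this abstract, a rank product to pick each study's consistently top-ranked genes followed by TF-IDF weighting to connect studies, can be sketched in plain Python. Assumptions, all labelled hypothetical: ranks are taken on expression levels within each sample (rank 1 = highest), the rank product is the geometric mean rank, each study contributes its `top_n` genes with binary term frequency, IDF is log(N / df), and studies are compared by cosine similarity. The published framework's exact definitions may differ.

```python
from collections import Counter
from math import log, prod, sqrt


def rank_product(samples):
    """Geometric mean rank of each gene across one study's samples (1 = highest)."""
    genes = list(samples[0])
    ranks = {gene: [] for gene in genes}
    for sample in samples:
        ordered = sorted(genes, key=lambda gene: sample[gene], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            ranks[gene].append(rank)
    return {gene: prod(r) ** (1.0 / len(r)) for gene, r in ranks.items()}


def study_features(samples, top_n):
    """Genes consistently among the most expressed: the smallest rank products."""
    rp = rank_product(samples)
    return set(sorted(rp, key=rp.get)[:top_n])


def tfidf_vectors(studies, top_n):
    """Binary term frequency times IDF = log(N / df), one sparse vector per study."""
    docs = {name: study_features(samples, top_n) for name, samples in studies.items()}
    df = Counter(gene for genes in docs.values() for gene in genes)
    n = len(docs)
    return {name: {g: log(n / df[g]) for g in genes} for name, genes in docs.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical studies, each a list of samples mapping gene -> expression.
    studies = {
        "A": [{"g1": 10, "g2": 9, "g3": 1, "g4": 0}, {"g1": 9, "g2": 10, "g3": 0, "g4": 1}],
        "B": [{"g1": 8, "g2": 7, "g3": 2, "g4": 1}],
        "C": [{"g1": 0, "g2": 1, "g3": 9, "g4": 8}],
    }
    vectors = tfidf_vectors(studies, top_n=2)
    print(cosine(vectors["A"], vectors["B"]), cosine(vectors["A"], vectors["C"]))
```

The IDF factor down-weights genes that are top-ranked in many studies (for example housekeeping genes), so links between studies are driven by the more distinctive shared features; a gene that is top-ranked in every study gets weight log(1) = 0.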
9
Faisal A, Gillberg J, Leen G, Peltonen J. Transfer learning using a nonparametric sparse topic model. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.12.038]
10
Georgii E, Salojärvi J, Brosché M, Kangasjärvi J, Kaski S. Targeted retrieval of gene expression measurements using regulatory models. Bioinformatics 2012; 28:2349-56. [DOI: 10.1093/bioinformatics/bts361]