1
Tran A, Wang A, Mickaill J, Strbenac D, Larance M, Vernon ST, Grieve SM, Figtree GA, Patrick E, Yang JYH. Construction and optimization of multi-platform precision pathways for precision medicine. Sci Rep 2024; 14:4248. [PMID: 38378802 PMCID: PMC10879206 DOI: 10.1038/s41598-024-54517-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 02/13/2024] [Indexed: 02/22/2024] Open
Abstract
In the enduring challenge against disease, advancements in medical technology have empowered clinicians with novel diagnostic platforms. Whilst in some cases, a single test may provide a confident diagnosis, often additional tests are required. However, to strike a balance between diagnostic accuracy and cost-effectiveness, one must rigorously construct the clinical pathways. Here, we developed a framework to build multi-platform precision pathways in an automated, unbiased way, recommending the key steps a clinician would take to reach a diagnosis. We achieve this by developing a confidence score, used to simulate a clinical scenario, where at each stage, either a confident diagnosis is made, or another test is performed. Our framework provides a range of tools to interpret, visualize and compare the pathways, improving communication and enabling their evaluation on accuracy and cost, specific to different contexts. This framework will guide the development of novel diagnostic pathways for different diseases, accelerating the implementation of precision medicine into clinical practice.
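The staged decision described in this abstract (at each stage either a confident diagnosis is made or another test is performed) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `run_pathway`, the threshold of 0.8, and the definition of confidence as `max(p, 1 - p)` are all assumptions made for illustration, and the abstract does not specify how the published confidence score is computed.

```python
from typing import Callable, Dict, List, Tuple

# Each stage is (test name, cost, function returning the predicted
# probability of disease in [0, 1]). All three are hypothetical inputs.
Stage = Tuple[str, float, Callable[[], float]]


def run_pathway(stages: List[Stage], threshold: float = 0.8) -> Dict[str, object]:
    """Walk through the tests in order and stop at the first confident call.

    Confidence is taken here as max(p, 1 - p), an illustrative stand-in
    for the paper's confidence score.
    """
    if not stages:
        raise ValueError("at least one stage is required")

    total_cost = 0.0
    tests_used: List[str] = []
    prob = 0.5
    for name, cost, predict in stages:
        total_cost += cost
        tests_used.append(name)
        prob = predict()
        if max(prob, 1.0 - prob) >= threshold:
            break  # confident enough, no further test is ordered

    return {
        "diagnosis": "disease" if prob >= 0.5 else "no disease",
        "confidence": max(prob, 1.0 - prob),
        "tests_used": tests_used,
        "total_cost": total_cost,
    }


# Hypothetical three-platform pathway with made-up costs and probabilities.
example = run_pathway(
    [
        ("clinical", 10.0, lambda: 0.60),
        ("imaging", 200.0, lambda: 0.95),
        ("omics", 1000.0, lambda: 0.99),
    ]
)
```

In this made-up example the clinical stage is not confident enough, so imaging is ordered and the pathway stops there. Accumulating cost alongside the diagnosis is what allows different pathways to be compared on both accuracy and cost.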
Affiliation(s)
- Andy Tran
  - School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW, Australia
- Andy Wang
  - Westmead Medical Institute, Westmead, NSW, Australia
- Jamie Mickaill
  - School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW, Australia
  - School of Computer Science, The University of Sydney, Camperdown, NSW, Australia
- Dario Strbenac
  - School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW, Australia
- Mark Larance
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, Australia
- Stephen T Vernon
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, Australia
  - Kolling Institute of Medical Research, St Leonards, NSW, Australia
- Stuart M Grieve
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, Australia
  - Department of Radiology, Royal Prince Alfred Hospital, Camperdown, Australia
- Gemma A Figtree
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, Australia
  - Kolling Institute of Medical Research, St Leonards, NSW, Australia
- Ellis Patrick
  - School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW, Australia.
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW, Australia.
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW, Australia.
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China.

2
Kim D, Tran A, Kim HJ, Lin Y, Yang JYH, Yang P. Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data. NPJ Syst Biol Appl 2023; 9:51. [PMID: 37857632 PMCID: PMC10587078 DOI: 10.1038/s41540-023-00312-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/02/2023] [Indexed: 10/21/2023] Open
Abstract
Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview on the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for single-cell matched multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.
Affiliation(s)
- Daniel Kim
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
  - Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Andy Tran
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
- Hani Jieun Kim
  - Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Yingxin Lin
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
  - Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.
- Pengyi Yang
  - School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
  - Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia.
  - Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
  - Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.

3
Zhang Y, Hu A, Lin Y, Cao Y, Muller S, Wong G, Yang JYH. simKAP: simulation framework for the kidney allocation process with decision making model. Sci Rep 2023; 13:16367. [PMID: 37773250 PMCID: PMC10541869 DOI: 10.1038/s41598-023-41162-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 08/23/2023] [Indexed: 10/01/2023] Open
Abstract
Organ shortage is a major barrier in transplantation and rules guarding organ allocation decisions should be robust, transparent, ethical and fair. Whilst numerous allocation strategies have been proposed, it is often unrealistic to evaluate all of them in real-life settings. Hence, the capability of conducting simulations prior to deployment is important. Here, we developed a kidney allocation simulation framework (simKAP) that aims to evaluate the allocation process and the complex clinical decision-making process of organ acceptance in kidney transplantation. Our findings have shown that incorporation of both the clinical decision-making and a dynamic wait-listing process resulted in the best agreement between the actual and simulated data in almost all scenarios. Additionally, several hypothetical risk-based allocation strategies were generated, and we found that these strategies improved recipients' long-term post-transplant patient survival and reduced wait time for transplantation. The importance of simKAP lies in its ability for policymakers in any transplant community to evaluate any proposed allocation algorithm using in-silico simulation.
Affiliation(s)
- Yunwei Zhang
  - School of Mathematics and Statistics, The University of Sydney, F07- Carslaw Building, Sydney, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
- Anne Hu
  - School of Mathematics and Statistics, The University of Sydney, F07- Carslaw Building, Sydney, NSW, Australia
  - Sydney Law School, The University of Sydney, Sydney, NSW, Australia
- Yingxin Lin
  - School of Mathematics and Statistics, The University of Sydney, F07- Carslaw Building, Sydney, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
- Yue Cao
  - School of Mathematics and Statistics, The University of Sydney, F07- Carslaw Building, Sydney, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
- Samuel Muller
  - School of Mathematics and Statistics, The University of Sydney, F07- Carslaw Building, Sydney, NSW, Australia
  - School of Mathematical and Physical Sciences, Macquarie University, Sydney, NSW, Australia
- Germaine Wong
  - Sydney School of Public Health, The University of Sydney, Sydney, NSW, Australia
  - Centre for Kidney Research, Kids Research Institute, The Children's Hospital at Westmead, Sydney, NSW, Australia
  - Centre for Transplant and Renal Research, Westmead Hospital, Sydney, NSW, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, The University of Sydney, F07- Carslaw Building, Sydney, NSW, Australia.
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia.

4
Zhang Y, Deng D, Muller S, Wong G, Yang JYH. A Multi-Step Precision Pathway for Predicting Allograft Survival in Heterogeneous Cohorts of Kidney Transplant Recipients. Transpl Int 2023; 36:11338. [PMID: 37767525 PMCID: PMC10520244 DOI: 10.3389/ti.2023.11338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Accepted: 08/29/2023] [Indexed: 09/29/2023]
Abstract
Accurate prediction of allograft survival after kidney transplantation allows early identification of at-risk recipients for adverse outcomes and initiation of preventive interventions to optimize post-transplant care. Many prediction algorithms do not model cohort heterogeneity and may lead to inaccurate assessment of longer-term graft outcomes among minority groups. Using data from a national Australian kidney transplant cohort (2008-2017) as the derivation set, we developed P-Cube, a multi-step precision prediction pathway model for predicting overall graft survival in three ethnic subgroups: European Australians, Asian Australians and Aboriginal and Torres Strait Islander Peoples. The concordance indices for the European Australians, Asian Australians, and Aboriginal and Torres Strait Islander Peoples subpopulations were 0.99 (0.98-0.99), 0.93 (0.92-0.94) and 0.92 (0.91-0.93), respectively. Similar findings were observed when validating P-Cube using an external dataset [Scientific Registry of Transplant Recipient Registry (2006-2020)]. Six sub-categories of recipients with distinct risk factor profiles were identified. Some factors such as blood group compatibility were considered important across the entire transplant population. Other factors such as human leukocyte antigen (HLA)-DR mismatches were unique to older recipients. The P-Cube model identifies allograft survival specific risk factors within a heterogeneous population and offers personalized survival predictions in a diverse cohort.
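For readers unfamiliar with the concordance index reported above, the following Python sketch shows how such a value can be computed for right-censored survival data. It assumes the commonly used Harrell's C estimator; the abstract does not state which estimator was used, and the function name `concordance_index` and the toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    and did so strictly before subject j's last follow-up time. The pair is
    concordant when the model gave i the higher risk score; ties in risk
    count as half. Tied times are skipped, which is a simplification.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: graft survival times, event indicators (1 = graft loss,
# 0 = censored) and made-up model risk scores.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
c_perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.5, 0.1])
c_mixed = concordance_index(toy_times, toy_events, [0.9, 0.1, 0.5, 0.7])
```

A value of 1.0 means every comparable pair is ordered correctly and 0.5 corresponds to random ranking, so the reported values of 0.92 to 0.99 indicate very strong discrimination.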
Affiliation(s)
- Yunwei Zhang
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Danny Deng
  - Centre for Kidney Research, Kids Research Institute, The Children’s Hospital at Westmead, Sydney, NSW, Australia
  - Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Samuel Muller
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
  - School of Mathematical and Physical Sciences, Macquarie University, Sydney, NSW, Australia
- Germaine Wong
  - Centre for Kidney Research, Kids Research Institute, The Children’s Hospital at Westmead, Sydney, NSW, Australia
  - Sydney School of Public Health, The University of Sydney, Sydney, NSW, Australia
  - Centre for Transplant and Renal Research, Westmead Hospital, Sydney, NSW, Australia
- Jean Yee Hwa Yang
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Hong Kong, Hong Kong SAR, China

5
Yu L, Liu C, Yang JYH, Yang P. Ensemble deep learning of embeddings for clustering multimodal single-cell omics data. Bioinformatics 2023; 39:btad382. [PMID: 37314966 PMCID: PMC10287920 DOI: 10.1093/bioinformatics/btad382] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/13/2023] [Revised: 04/16/2023] [Accepted: 06/12/2023] [Indexed: 06/16/2023] Open
Abstract
MOTIVATION: Recent advances in multimodal single-cell omics technologies enable multiple modalities of molecular attributes, such as gene expression, chromatin accessibility, and protein abundance, to be profiled simultaneously at a global level in individual cells. While the increasing availability of multiple data modalities is expected to provide a more accurate clustering and characterization of cells, the development of computational methods that are capable of extracting information embedded across data modalities is still in its infancy. RESULTS: We propose SnapCCESS for clustering cells by integrating data modalities in multimodal single-cell omics data using an unsupervised ensemble deep learning framework. By creating snapshots of embeddings of multimodality using variational autoencoders, SnapCCESS can be coupled with various clustering algorithms for generating consensus clustering of cells. We applied SnapCCESS with several clustering algorithms to various datasets generated from popular multimodal single-cell omics technologies. Our results demonstrate that SnapCCESS is effective and more efficient than conventional ensemble deep learning-based clustering methods and outperforms other state-of-the-art multimodal embedding generation methods in integrating data modalities for clustering cells. The improved clustering of cells from SnapCCESS will pave the way for more accurate characterization of cell identity and types, an essential step for various downstream analyses of multimodal single-cell omics data. AVAILABILITY AND IMPLEMENTATION: SnapCCESS is implemented as a Python package and is freely available from https://github.com/PYangLab/SnapCCESS under the open-source license of GPL-3. The data used in this study are publicly available (see section 'Data availability').
Affiliation(s)
- Lijia Yu
  - Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
  - Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Chunlei Liu
  - Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
  - Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
  - Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
- Pengyi Yang
  - Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
  - Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
  - Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China

6
Cao Y, Lin Y, Patrick E, Yang P, Yang JYH. scFeatures: Multi-view representations of single-cell and spatial data for disease outcome prediction. Bioinformatics 2022; 38:4745-4753. [PMID: 36040148 PMCID: PMC9563679 DOI: 10.1093/bioinformatics/btac590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 07/21/2022] [Accepted: 08/28/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: With the recent surge of large-cohort scale single cell research, it is of critical importance that analytical methods can fully utilize the comprehensive characterization of cellular systems that single cell technologies produce to provide insights into samples from individuals. Currently, there is little consensus on the best ways to compress information from the complex data structures of these technologies to summary statistics that represent each sample (e.g. individuals). RESULTS: Here, we present scFeatures, an approach that creates interpretable cellular and molecular representations of single-cell and spatial data at the sample level. We demonstrate that summarising a broad collection of features at the sample level is both important for understanding underlying disease mechanisms in different experimental studies and for accurately classifying disease status of individuals. AVAILABILITY: scFeatures is publicly available as an R package at https://github.com/SydneyBioX/scFeatures. All data used in this study are publicly available with accession ID reported in the Methods section. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yue Cao
  - Charles Perkins Centre, The University of Sydney, Sydney, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
- Yingxin Lin
  - Charles Perkins Centre, The University of Sydney, Sydney, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
- Ellis Patrick
  - Charles Perkins Centre, The University of Sydney, Sydney, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, Westmead, NSW, Australia
- Pengyi Yang
  - Charles Perkins Centre, The University of Sydney, Sydney, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, Westmead, NSW, Australia
- Jean Yee Hwa Yang
  - Charles Perkins Centre, The University of Sydney, Sydney, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China

7
Coorey G, Figtree GA, Fletcher DF, Snelson VJ, Vernon ST, Winlaw D, Grieve SM, McEwan A, Yang JYH, Qian P, O'Brien K, Orchard J, Kim J, Patel S, Redfern J. The health digital twin to tackle cardiovascular disease-a review of an emerging interdisciplinary field. NPJ Digit Med 2022; 5:126. [PMID: 36028526 PMCID: PMC9418270 DOI: 10.1038/s41746-022-00640-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 11/22/2021] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
Potential benefits of precision medicine in cardiovascular disease (CVD) include more accurate phenotyping of individual patients with the same condition or presentation, using multiple clinical, imaging, molecular and other variables to guide diagnosis and treatment. An approach to realising this potential is the digital twin concept, whereby a virtual representation of a patient is constructed and receives real-time updates of a range of data variables in order to predict disease and optimise treatment selection for the real-life patient. We explored the term digital twin, its defining concepts, the challenges as an emerging field, and potentially important applications in CVD. A mapping review was undertaken using a systematic search of peer-reviewed literature. Industry-based participants and patent applications were identified through web-based sources. Searches of Compendex, EMBASE, Medline, ProQuest and Scopus databases yielded 88 papers related to cardiovascular conditions (28%, n = 25), non-cardiovascular conditions (41%, n = 36), and general aspects of the health digital twin (31%, n = 27). Fifteen companies with a commercial interest in health digital twin or simulation modelling had products focused on CVD. The patent search identified 18 applications from 11 applicants, of which 73% were companies and 27% were universities. Three applicants had cardiac-related inventions. For CVD, digital twin research within industry and academia is recent, interdisciplinary, and established globally. Overall, the applications were numerical simulation models, although precursor models exist for the real-time cyber-physical system characteristic of a true digital twin. Implementation challenges include ethical constraints and clinical barriers to the adoption of decision tools derived from artificial intelligence systems.
Affiliation(s)
- Genevieve Coorey
  - University of Sydney, Faculty of Medicine and Health, Sydney, NSW, Australia
  - The George Institute for Global Health, Sydney, NSW, Australia
- Gemma A Figtree
  - University of Sydney, Faculty of Medicine and Health, Sydney, NSW, Australia
  - Kolling Institute of Medical Research, Royal North Shore Hospital, Sydney, NSW, Australia
- David F Fletcher
  - University of Sydney, School of Chemical and Biomolecular Engineering, Sydney, NSW, Australia
- Victoria J Snelson
  - University of Sydney, Faculty of Medicine and Health, Sydney, NSW, Australia
  - University of Sydney, Charles Perkins Centre, Sydney, NSW, Australia
- Stephen Thomas Vernon
  - Kolling Institute of Medical Research, Royal North Shore Hospital, Sydney, NSW, Australia
  - Department of Cardiology, Royal North Shore Hospital, Sydney, NSW, Australia
- David Winlaw
  - Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Stuart M Grieve
  - University of Sydney, Faculty of Medicine and Health, Sydney, NSW, Australia
  - University of Sydney, Charles Perkins Centre, Sydney, NSW, Australia
- Alistair McEwan
  - The University of Sydney, School of Biomedical Engineering, Sydney, NSW, Australia
- Jean Yee Hwa Yang
  - University of Sydney, Charles Perkins Centre, Sydney, NSW, Australia
- Pierre Qian
  - University of Sydney, Faculty of Medicine and Health, Sydney, NSW, Australia
  - Westmead Applied Research Centre, Westmead Hospital, Sydney, NSW, Australia
- Kieran O'Brien
  - Siemens Healthcare Pty Ltd; and Centre for Advanced Imaging, University of Queensland, Brisbane, QLD, Australia
- Jessica Orchard
  - University of Sydney, Charles Perkins Centre, Sydney, NSW, Australia
- Jinman Kim
  - University of Sydney, School of Computer Science, Sydney, NSW, Australia
- Sanjay Patel
  - University of Sydney, Faculty of Medicine and Health, Sydney, NSW, Australia
  - Royal Prince Alfred Hospital, Sydney, NSW, Australia
  - Heart Research Institute, Sydney, NSW, Australia
- Julie Redfern
  - University of Sydney, Faculty of Medicine and Health, Sydney, NSW, Australia

8
Abstract
Single-cell RNA-seq (scRNA-seq) data simulation is critical for evaluating computational methods for analysing scRNA-seq data especially when ground truth is experimentally unattainable. The reliability of evaluation depends on the ability of simulation methods to capture properties of experimental data. However, while many scRNA-seq data simulation methods have been proposed, a systematic evaluation of these methods is lacking. We develop a comprehensive evaluation framework, SimBench, including a kernel density estimation measure to benchmark 12 simulation methods through 35 scRNA-seq experimental datasets. We evaluate the simulation methods on a panel of data properties, ability to maintain biological signals, scalability and applicability. Our benchmark uncovers performance differences among the methods and highlights the varying difficulties in simulating data characteristics. Furthermore, we identify several limitations including maintaining heterogeneity of distribution. These results, together with the framework and datasets made publicly available as R packages, will guide simulation methods selection and their future development.
Affiliation(s)
- Yue Cao
  - Charles Perkins Centre, The University of Sydney, Sydney, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
- Pengyi Yang
  - Charles Perkins Centre, The University of Sydney, Sydney, Australia.
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia.
  - Computational Systems Biology Group, Children's Medical Research Institute, Westmead, NSW, Australia.
- Jean Yee Hwa Yang
  - Charles Perkins Centre, The University of Sydney, Sydney, Australia.
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia.

9
Zhang Y, Wong G, Yang JYH. Trash or Treasure: Rescuing Discard Kidneys. Transplantation 2021; 105:1914-1915. [PMID: 33534532 DOI: 10.1097/tp.0000000000003663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Affiliation(s)
- Yunwei Zhang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Germaine Wong
  - Sydney School of Public Health, University of Sydney, Sydney, NSW, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia

10
Byrne M, Koop D, Strbenac D, Cisternas P, Balogh R, Yang JYH, Davidson PL, Wray G. Transcriptomic analysis of sea star development through metamorphosis to the highly derived pentameral body plan with a focus on neural transcription factors. DNA Res 2021; 27:5825731. [PMID: 32339242 PMCID: PMC7315356 DOI: 10.1093/dnares/dsaa007] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/11/2019] [Accepted: 04/20/2020] [Indexed: 12/13/2022] Open
Abstract
The Echinodermata is characterized by a secondarily evolved pentameral body plan. While the evolutionary origin of this body plan has been the subject of debate, the molecular mechanisms underlying its development are poorly understood. We assembled a de novo developmental transcriptome from the embryo through metamorphosis in the sea star Parvulastra exigua. We use the asteroid model as it represents the basal-type echinoderm body architecture. Global variation in gene expression distinguished the gastrula profile and showed that metamorphic and juvenile stages were more similar to each other than to the pre-metamorphic stages, pointing to the marked changes that occur during metamorphosis. Differential expression and gene ontology (GO) analyses revealed dynamic changes in gene expression throughout development and the transition to pentamery. Many GO terms enriched during late metamorphosis were related to neurogenesis and signalling. Neural transcription factor genes exhibited clusters with distinct expression patterns. A suite of these genes was up-regulated during metamorphosis (e.g. Pax6, Eya, Hey, NeuroD, FoxD, Mbx, and Otp). In situ hybridization showed expression of neural genes in the CNS and sensory structures. Our results provide a foundation to understand the metamorphic transition in echinoderms and the genes involved in development and evolution of pentamery.
Affiliation(s)
- Maria Byrne
  - School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Demian Koop
  - School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Dario Strbenac
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Paula Cisternas
  - School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Regina Balogh
  - School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Gregory Wray
  - Department of Biology, Duke University, Durham, NC 27708, USA
  - Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA

11
Byrne M, Koop D, Strbenac D, Cisternas P, Yang JYH, Davidson PL, Wray G. Transcriptomic analysis of Nodal- and BMP-associated genes during development to the juvenile seastar in Parvulastra exigua (Asterinidae). Mar Genomics 2021; 59:100857. [PMID: 33676872 DOI: 10.1016/j.margen.2021.100857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2020] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 10/22/2022]
Abstract
The molecular mechanisms underlying development of the pentameral body of adult echinoderms are poorly understood but are important to solve with respect to evolution of a unique body plan that contrasts with the bilateral body plan of other deuterostomes. As Nodal and BMP2/4 signalling is involved in axis formation in larvae and development of the echinoderm body plan, we used the developmental transcriptome generated for the asterinid seastar Parvulastra exigua to investigate the temporal expression patterns of Nodal and BMP2/4 genes from the embryo and across metamorphosis to the juvenile. For echinoderms, the Asteroidea represents the basal-type body architecture with a distinct (separated) ray structure. Parvulastra exigua has lecithotrophic development forming the juvenile soon after gastrulation providing ready access to the developing adult stage. We identified 39 genes associated with the Nodal and BMP2/4 network in the P. exigua developmental transcriptome. Clustering analysis of these genes resulted in 6 clusters with similar temporal expression patterns across development. A co-expression analysis revealed genes that have similar expression profiles as Nodal and BMP2/4. These results indicated genes that may have a regulatory relationship in patterning morphogenesis of the juvenile seastar. Developmental RNA-seq analyses of Parvulastra exigua show changes in Nodal and BMP2/4 signalling genes across the metamorphic transition. We provide the foundation for detailed analyses of this cascade in the evolution of the unusual pentameral echinoderm body and its deuterostome affinities.
Affiliation(s)
- Maria Byrne
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Demian Koop
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Dario Strbenac
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Paula Cisternas
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Phillip L Davidson
  - Department of Biology and Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Gregory Wray
  - Department of Biology and Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
12
Kim HJ, Lin Y, Geddes TA, Yang JYH, Yang P. CiteFuse enables multi-modal analysis of CITE-seq data. Bioinformatics 2021; 36:4137-4143. [PMID: 32353146 DOI: 10.1093/bioinformatics/btaa282] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 12/18/2019] [Revised: 04/20/2020] [Accepted: 04/23/2020] [Indexed: 01/30/2023] Open
Abstract
MOTIVATION: Multi-modal profiling of single cells represents one of the latest technological advancements in molecular biology. Among various single-cell multi-modal strategies, cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) allows simultaneous quantification of two distinct species: RNA and cell-surface proteins. Here, we introduce CiteFuse, a streamlined package consisting of a suite of tools for doublet detection, modality integration, clustering, differential RNA and protein expression analysis, antibody-derived tag evaluation, ligand-receptor interaction analysis and interactive web-based visualization of CITE-seq data.
RESULTS: We demonstrate the capacity of CiteFuse to integrate the two data modalities and its relative advantage against data generated from single-modality profiling using both simulations and real-world CITE-seq data. Furthermore, we illustrate a novel doublet detection method based on a combined index of cell hashing and transcriptome data. Finally, we demonstrate CiteFuse for predicting ligand-receptor interactions by using multi-modal CITE-seq data. Collectively, we demonstrate the utility and effectiveness of CiteFuse for the integrative analysis of transcriptome and epitope profiles from CITE-seq data.
AVAILABILITY AND IMPLEMENTATION: CiteFuse is freely available at http://shiny.maths.usyd.edu.au/CiteFuse/ as an online web service and at https://github.com/SydneyBioX/CiteFuse/ as an R package.
CONTACT: pengyi.yang@sydney.edu.au.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hani Jieun Kim
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Sydney 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney 2006, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Sydney 2145, Australia
- Yingxin Lin
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Sydney 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney 2006, Australia
- Thomas A Geddes
  - Charles Perkins Centre, The University of Sydney, Sydney 2006, Australia
  - School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney 2006, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Sydney 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney 2006, Australia
- Pengyi Yang
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Sydney 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney 2006, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Sydney 2145, Australia
13
Coorey CP, Sharma A, Muller S, Yang JYH. Prediction modeling-part 2: using machine learning strategies to improve transplantation outcomes. Kidney Int 2020; 99:817-823. [PMID: 32916179 DOI: 10.1016/j.kint.2020.08.026] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/06/2019] [Revised: 08/01/2020] [Accepted: 08/10/2020] [Indexed: 11/29/2022]
Abstract
Kidney transplant recipients and transplant physicians face important clinical questions where machine learning methods may help improve the decision-making process. This mini-review explores potential applications of machine learning methods to key stages of a kidney transplant recipient's journey, from initial waitlisting and donor selection, to personalization of immunosuppression and prediction of post-transplantation events. Both unsupervised and supervised machine learning methods are presented, including k-means clustering, principal components analysis, k-nearest neighbors, and random forests. The various challenges of these approaches are also discussed.
Affiliation(s)
- Craig Peter Coorey
  - Centre for Kidney Research, Children's Hospital at Westmead, Westmead, New South Wales, Australia
  - Liverpool Hospital, South Western Sydney Clinical School, University of New South Wales and Western Sydney University, Sydney, New South Wales, Australia
- Ankit Sharma
  - Centre for Kidney Research, Children's Hospital at Westmead, Westmead, New South Wales, Australia
  - Department of Renal Medicine, Westmead Hospital, Westmead, New South Wales, Australia
  - School of Public Health, Sydney Medical School, The University of Sydney, Sydney, New South Wales, Australia
- Samuel Muller
  - School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales, Australia
  - Department of Mathematics and Statistics, Macquarie University, New South Wales, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales, Australia
  - Charles Perkins Center, The University of Sydney, Sydney, New South Wales, Australia
14
Kim T, Chen IR, Lin Y, Wang AYY, Yang JYH, Yang P. Impact of similarity metrics on single-cell RNA-seq data clustering. Brief Bioinform 2020; 20:2316-2326. [PMID: 30137247 DOI: 10.1093/bib/bby076] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 06/12/2018] [Revised: 08/01/2018] [Accepted: 08/02/2018] [Indexed: 12/16/2022] Open
Abstract
Advances in high-throughput sequencing on single-cell gene expressions [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling on individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering methods. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, fundamentally they all rely on a similarity metric for categorising individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, currently there is no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics on clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus on their performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson's correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test if the use of correlation-based metrics can benefit the recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) using Pearson's correlation as a similarity measure and found significant performance improvement over Euclidean distance on scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson's correlation as a favourable choice. Further comparison on different scRNA-seq library preparation protocols suggests that they may also affect clustering performance. 
Finally, the benchmarking framework is available at http://www.maths.usyd.edu.au/u/SMS/bioinformatics/software.html.
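The contrast between correlation-based and distance-based metrics described in the abstract above can be illustrated with a toy sketch. This is not the authors' benchmarking framework: the three "cells" are hypothetical values, and the idea that one cell is a scaled copy of another (for example through deeper sequencing) is an assumption made only for illustration, not a claim from the abstract.

```python
import math

def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)

def euclidean_distance(x, y):
    """Return the ordinary Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical toy data: cell_a_deep has the same expression pattern as
# cell_a at ten times the scale; cell_b has a different pattern.
cell_a = [1.0, 5.0, 2.0, 8.0, 1.0]
cell_a_deep = [10.0 * v for v in cell_a]
cell_b = [8.0, 1.0, 5.0, 1.0, 2.0]

for name, dist in [("pearson", pearson_distance), ("euclidean", euclidean_distance)]:
    print(name, "a vs a_deep:", dist(cell_a, cell_a_deep))
    print(name, "a vs b:     ", dist(cell_a, cell_b))
```

In this toy case the Pearson-based distance places `cell_a` closest to its scaled copy, while the Euclidean distance places it closer to `cell_b`. That shows how the choice of metric can change which cells a clustering algorithm would group together.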
Affiliation(s)
- Taiyun Kim
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Irene Rui Chen
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Yingxin Lin
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Andy Yi-Yang Wang
  - Department of Anaesthesia, The University of Sydney Northern Clinical School, The University of Sydney, Sydney, NSW 2006, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Pengyi Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
15
16
Ghazanfar S, Lin Y, Su X, Lin DM, Patrick E, Han ZG, Marioni JC, Yang JYH. Investigating higher-order interactions in single-cell data with scHOT. Nat Methods 2020; 17:799-806. [PMID: 32661426 PMCID: PMC7610653 DOI: 10.1038/s41592-020-0885-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/28/2019] [Accepted: 06/03/2020] [Indexed: 12/12/2022]
Abstract
Single-cell genomics has transformed our ability to examine cell fate choice. Examining cells along a computationally ordered 'pseudotime' offers the potential to unpick subtle changes in variability and covariation among key genes. We describe an approach, scHOT (single-cell higher-order testing), which provides a flexible and statistically robust framework for identifying changes in higher-order interactions among genes. scHOT can be applied for cells along a continuous trajectory or across space and accommodates various higher-order measurements including variability or correlation. We demonstrate the use of scHOT by studying coordinated changes in higher-order interactions during embryonic development of the mouse liver. Additionally, scHOT identifies subtle changes in gene-gene correlations across space using spatially resolved transcriptomics data from the mouse olfactory bulb. scHOT meaningfully adds to first-order differential expression testing and provides a framework for interrogating higher-order interactions using single-cell data.
Affiliation(s)
- Shila Ghazanfar
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Yingxin Lin
  - School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, New South Wales, Australia
- Xianbin Su
  - Key Laboratory of Systems Biomedicine (Ministry of Education) and Collaborative Innovation Center of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- David Ming Lin
  - Department of Biomedical Sciences, Cornell University, Ithaca, NY, USA
- Ellis Patrick
  - School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales, Australia
  - Westmead Institute for Medical Research, Westmead, New South Wales, Australia
- Ze-Guang Han
  - Key Laboratory of Systems Biomedicine (Ministry of Education) and Collaborative Innovation Center of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- John C Marioni
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, New South Wales, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, New South Wales, Australia
17
Lin Y, Cao Y, Kim HJ, Salim A, Speed TP, Lin DM, Yang P, Yang JYH. scClassify: sample size estimation and multiscale classification of cells using single and multiple reference. Mol Syst Biol 2020; 16:e9389. [PMID: 32567229 PMCID: PMC7306901 DOI: 10.15252/msb.20199389] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 12/03/2019] [Revised: 05/22/2020] [Accepted: 05/26/2020] [Indexed: 12/26/2022] Open
Abstract
Automated cell type identification is a key computational challenge in single-cell RNA-sequencing (scRNA-seq) data. To capitalise on the large collection of well-annotated scRNA-seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single-cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state-of-the-art methodology in automated cell type identification from scRNA-seq data.
Affiliation(s)
- Yingxin Lin
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia
- Yue Cao
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia
- Hani Jieun Kim
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, Australia
- Agus Salim
  - Department of Mathematics and Statistics, La Trobe University, Bundoora, VIC, Australia
  - Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Terence P Speed
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- David M Lin
  - Department of Biomedical Sciences, Cornell University, Ithaca, NY, USA
- Pengyi Yang
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
  - Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia
18
Abstract
Background: Single-cell RNA-sequencing (scRNA-seq) is a fast emerging technology allowing global transcriptome profiling on the single cell level. Cell type identification from scRNA-seq data is a critical task in a variety of research such as developmental biology, cell reprogramming, and cancers. Typically, cell type identification relies on human inspection using a combination of prior biological knowledge (e.g. marker genes and morphology) and computational techniques (e.g. PCA and clustering). Due to the incompleteness of our current knowledge and the subjectivity involved in this process, a small amount of cells may be subject to mislabelling.
Results: Here, we propose a semi-supervised learning framework, named scReClassify, for ‘post hoc’ cell type identification from scRNA-seq datasets. Starting from an initial cell type annotation with potentially mislabelled cells, scReClassify first performs dimension reduction using PCA and next applies a semi-supervised learning method to learn and subsequently reclassify cells that are likely mislabelled initially to the most probable cell types. By using both simulated and real-world experimental datasets that profiled various tissues and biological systems, we demonstrate that scReClassify is able to accurately identify and reclassify misclassified cells to their correct cell types.
Conclusions: scReClassify can be used for scRNA-seq data as a post hoc cell type classification tool to fine-tune cell type annotations generated by any cell type classification procedure. It is implemented as an R package and is freely available from https://github.com/SydneyBioX/scReClassify
Affiliation(s)
- Taiyun Kim
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
- Kitty Lo
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
- Thomas A Geddes
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
  - School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, 2006, NSW, Australia
- Hani Jieun Kim
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 2145, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
- Pengyi Yang
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 2145, NSW, Australia
  - Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
19
Kim T, Chen IR, Parker BL, Humphrey SJ, Crossett B, Cordwell SJ, Yang P, Yang JYH. QCMAP: An Interactive Web-Tool for Performance Diagnosis and Prediction of LC-MS Systems. Proteomics 2019; 19:e1900068. [PMID: 31099962 DOI: 10.1002/pmic.201900068] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/18/2019] [Revised: 05/07/2019] [Indexed: 01/04/2023]
Abstract
The increasing role played by liquid chromatography-mass spectrometry (LC-MS)-based proteomics in biological discovery has led to a growing need for quality control (QC) on the LC-MS systems. While numerous quality control tools have been developed to track the performance of LC-MS systems based on a pre-defined set of performance factors (e.g., mass error, retention time), the precise influence and contribution of the performance factors and their generalization property to different biological samples are not as well characterized. Here, a web-based application (QCMAP) is developed for interactive diagnosis and prediction of the performance of LC-MS systems across different biological sample types. Leveraging on a standardized HeLa cell sample run as QC within a multi-user facility, predictive models are trained on a panel of commonly used performance factors to pinpoint the precise conditions to a (un)satisfactory performance in three LC-MS systems. It is demonstrated that the learned model can be applied to predict LC-MS system performance for brain samples generated from an independent study. By compiling these predictive models into our web-application, QCMAP allows users to benchmark the performance of their LC-MS systems using their own samples and identify key factors for instrument optimization. QCMAP is freely available from: http://shiny.maths.usyd.edu.au/QCMAP/.
Affiliation(s)
- Taiyun Kim
  - School of Mathematics and Statistics, University of Sydney, NSW, 2006, Australia
  - Judith and David Coffey Life Lab, Charles Perkins Centre, University of Sydney, NSW, 2006, Australia
- Irene Rui Chen
  - School of Mathematics and Statistics, University of Sydney, NSW, 2006, Australia
  - Judith and David Coffey Life Lab, Charles Perkins Centre, University of Sydney, NSW, 2006, Australia
- Benjamin L Parker
  - School of Life and Environmental Sciences, University of Sydney, NSW, 2006, Australia
- Sean J Humphrey
  - School of Life and Environmental Sciences, University of Sydney, NSW, 2006, Australia
- Ben Crossett
  - Sydney Mass Spectrometry, University of Sydney, NSW, 2006, Australia
- Stuart J Cordwell
  - School of Life and Environmental Sciences, University of Sydney, NSW, 2006, Australia
  - Sydney Mass Spectrometry, University of Sydney, NSW, 2006, Australia
- Pengyi Yang
  - School of Mathematics and Statistics, University of Sydney, NSW, 2006, Australia
  - Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, University of Sydney, Westmead, NSW, 2145, Australia
- Jean Yee Hwa Yang
  - School of Mathematics and Statistics, University of Sydney, NSW, 2006, Australia
  - Judith and David Coffey Life Lab, Charles Perkins Centre, University of Sydney, NSW, 2006, Australia
20
Yang P, Oldfield A, Kim T, Yang A, Yang JYH, Ho JWK. Integrative analysis identifies co-dependent gene expression regulation of BRG1 and CHD7 at distal regulatory sites in embryonic stem cells. Bioinformatics 2018; 33:1916-1920. [PMID: 28203701 DOI: 10.1093/bioinformatics/btx092] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 11/14/2016] [Accepted: 02/08/2017] [Indexed: 12/11/2022] Open
Abstract
Motivation: DNA binding proteins such as chromatin remodellers, transcription factors (TFs), histone modifiers and co-factors often bind cooperatively to activate or repress their target genes in a cell type-specific manner. Nonetheless, the precise role of cooperative binding in defining cell-type identity is still largely uncharacterized.
Results: Here, we collected and analyzed 214 public datasets representing chromatin immunoprecipitation followed by sequencing (ChIP-Seq) of 104 DNA binding proteins in embryonic stem cell (ESC) lines. We classified their binding sites into those proximal to gene promoters and those in distal regions, and developed a web resource called Proximal And Distal (PAD) clustering to identify their co-localization at these respective regions. Using this extensive dataset, we discovered an extensive co-localization of BRG1 and CHD7 at distal but not proximal regions. The comparison of co-localization sites to those bound by either BRG1 or CHD7 alone showed an enrichment of ESC master TFs binding and active chromatin architecture at co-localization sites. Most notably, our analysis reveals the co-dependency of BRG1 and CHD7 at distal regions on regulating expression of their common target genes in ESC. This work sheds light on cooperative binding of TF binding proteins in regulating gene expression in ESC, and demonstrates the utility of integrative analysis of a manually curated compendium of genome-wide protein binding profiles in our online resource PAD.
Availability and Implementation: PAD is freely available at http://pad.victorchang.edu.au/ and its source code is available via an open source GPL 3.0 license at https://github.com/VCCRI/PAD/.
Contact: pengyi.yang@sydney.edu.au or j.ho@victorchang.edu.au.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pengyi Yang
  - Charles Perkins Centre and School of Mathematics and Statistics, University of Sydney, Camperdown, NSW 2006, Australia
  - Systems Biology Group, Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, RTP, NC 27709, USA
- Andrew Oldfield
  - Systems Biology Group, Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, RTP, NC 27709, USA
  - Institute of Human Genetics, CNRS UPR 1142, Montpellier, France
- Jean Yee Hwa Yang
  - Charles Perkins Centre and School of Mathematics and Statistics, University of Sydney, Camperdown, NSW 2006, Australia
- Joshua W K Ho
  - Victor Chang Cardiac Research Institute
  - St. Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW 2010, Australia
21
Chaudhuri R, Krycer JR, Fazakerley DJ, Fisher-Wellman KH, Su Z, Hoehn KL, Yang JYH, Kuncic Z, Vafaee F, James DE. The transcriptional response to oxidative stress is part of, but not sufficient for, insulin resistance in adipocytes. Sci Rep 2018; 8:1774. [PMID: 29379070 PMCID: PMC5789081 DOI: 10.1038/s41598-018-20104-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/07/2017] [Accepted: 01/12/2018] [Indexed: 02/06/2023] Open
Abstract
Insulin resistance is a major risk factor for metabolic diseases such as Type 2 diabetes. Although the underlying mechanisms of insulin resistance remain elusive, oxidative stress is a unifying driver by which numerous extrinsic signals and cellular stresses trigger insulin resistance. Consequently, we sought to understand the cellular response to oxidative stress and its role in insulin resistance. Using cultured 3T3-L1 adipocytes, we established a model of physiologically-derived oxidative stress by inhibiting the cycling of glutathione and thioredoxin, which induced insulin resistance as measured by impaired insulin-stimulated 2-deoxyglucose uptake. Using time-resolved transcriptomics, we found > 2000 genes differentially-expressed over 24 hours, with specific metabolic and signalling pathways enriched at different times. We explored this coordination using a knowledge-based hierarchical-clustering approach to generate a temporal transcriptional cascade and identify key transcription factors responding to oxidative stress. This response shared many similarities with changes observed in distinct insulin resistance models. However, an anti-oxidant reversed insulin resistance phenotypically but not transcriptionally, implying that the transcriptional response to oxidative stress is insufficient for insulin resistance. This suggests that the primary site by which oxidative stress impairs insulin action occurs post-transcriptionally, warranting a multi-level ‘trans-omic’ approach when studying time-resolved responses to cellular perturbations.
Collapse
Affiliation(s)
- Rima Chaudhuri
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- James R Krycer
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Daniel J Fazakerley
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Zhiduan Su
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Kyle L Hoehn
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, 2052, Australia
- Jean Yee Hwa Yang
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Zdenka Kuncic
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; School of Physics and Australian Institute for Nanoscale Science and Technology, The University of Sydney, Sydney, NSW, 2006, Australia
- Fatemeh Vafaee
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, 2052, Australia
- David E James
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia; School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia; Sydney Medical School, The University of Sydney, Sydney, NSW, 2006, Australia
|
22
|
Abstract
Protein post-translational modifications (PTMs) are crucial for signal transduction in cells. In order to understand key cell signaling events, identification of functionally important PTMs, which are more likely to be evolutionarily conserved, is necessary. In recent times, high-throughput mass spectrometry (MS) has made quantitative datasets in diverse species readily available, which has led to a growing need for tools to facilitate cross-species comparison of PTM data. Cross-species comparison of PTM sites is difficult since they often lie in structurally disordered protein domains. Current tools that address this can only map known PTMs between species based on previously annotated orthologous phosphosites and do not enable cross-species mapping of newly identified modification sites. Here, we describe an automated web-based tool, PhosphOrtholog, that accurately maps annotated and novel orthologous PTM sites from high-throughput MS-based experimental data obtained from different species without relying on existing PTM databases. Identification of conserved PTMs across species from large-scale experimental data increases our knowledgebase of evolutionarily conserved and functional PTM sites that influence most biological processes. In this Chapter, we illustrate with examples how to use PhosphOrtholog to map novel PTM sites from cross-species MS-based phosphoproteomics data.
Affiliation(s)
- Rima Chaudhuri
- Charles Perkins Centre, School of Life and Environmental Sciences, University of Sydney, Camperdown, NSW, 2006, Australia.
- Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, 2006, Australia
|
23
|
Ghazanfar S, Yang JYH. Characterizing mutation–expression network relationships in multiple cancers. Comput Biol Chem 2016; 63:73-82. [DOI: 10.1016/j.compbiolchem.2016.02.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/18/2016] [Accepted: 02/01/2016] [Indexed: 10/22/2022]
|
24
|
Yang P, Patrick E, Humphrey SJ, Ghazanfar S, James DE, Jothi R, Yang JYH. KinasePA: Phosphoproteomics data annotation using hypothesis driven kinase perturbation analysis. Proteomics 2016; 16:1868-71. [PMID: 27145998 DOI: 10.1002/pmic.201600068] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 02/07/2016] [Revised: 03/27/2016] [Accepted: 05/02/2016] [Indexed: 12/13/2022]
Abstract
Mass spectrometry (MS)-based quantitative phosphoproteomics has become a key approach for proteome-wide profiling of phosphorylation in tissues and cells. Traditional experimental design often compares a single treatment with a control, whereas increasingly more experiments are designed to compare multiple treatments with respect to a control. To this end, the development of bioinformatic tools that can integrate multiple treatments and visualise kinases and substrates under combinatorial perturbations is vital for dissecting concordant and/or independent effects of each treatment. Here, we propose a hypothesis driven kinase perturbation analysis (KinasePA) to annotate and visualise kinases and their substrates that are perturbed by various combinatorial effects of treatments in phosphoproteomics experiments. We demonstrate the utility of KinasePA through its application to two large-scale phosphoproteomics datasets and show its effectiveness in dissecting kinases and substrates within signalling pathways driven by unique combinations of cellular stimuli and inhibitors. We implemented and incorporated KinasePA as part of the "directPA" R package available from the comprehensive R archive network (CRAN). Furthermore, KinasePA also has an interactive web interface that can be readily applied to annotate user provided phosphoproteomics data (http://kinasepa.pengyiyang.org).
Affiliation(s)
- Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia; Charles Perkins Centre, School of Molecular Biosciences, University of Sydney, Sydney, NSW, Australia; Systems Biology Section, Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
- Ellis Patrick
- Brigham and Women's Hospital, Harvard Medical School, Broad Institute, Boston, MA, USA
- Sean J Humphrey
- Charles Perkins Centre, School of Molecular Biosciences, University of Sydney, Sydney, NSW, Australia; Department of Proteomics and Signal Transduction, Max Planck Institute for Biochemistry, Martinsried, Germany
- Shila Ghazanfar
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
- David E James
- Charles Perkins Centre, School of Molecular Biosciences, University of Sydney, Sydney, NSW, Australia
- Raja Jothi
- Systems Biology Section, Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
- Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
|
25
|
Byrne M, Koop D, Cisternas P, Strbenac D, Yang JYH, Wray GA. Transcriptomic analysis of Nodal- and BMP-associated genes during juvenile development of the sea urchin Heliocidaris erythrogramma. Mar Genomics 2015; 24 Pt 1:41-5. [DOI: 10.1016/j.margen.2015.05.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/05/2015] [Revised: 05/30/2015] [Accepted: 05/30/2015] [Indexed: 10/23/2022]
|
26
|
Chaudhuri R, Khoo PS, Tonks K, Junutula JR, Kolumam G, Modrusan Z, Samocha-Bonet D, Meoli CC, Hocking S, Fazakerley DJ, Stöckli J, Hoehn KL, Greenfield JR, Yang JYH, James DE. Cross-species gene expression analysis identifies a novel set of genes implicated in human insulin sensitivity. NPJ Syst Biol Appl 2015; 1:15010. [PMID: 28725461 PMCID: PMC5516867 DOI: 10.1038/npjsba.2015.10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/08/2014] [Revised: 07/24/2015] [Accepted: 08/24/2015] [Indexed: 12/24/2022] Open
Abstract
OBJECTIVE Insulin resistance (IR) is one of the earliest predictors of type 2 diabetes. However, diagnosis of IR is limited. High fat fed mouse models provide key insights into IR. We hypothesized that early features of IR are associated with persistent changes in gene expression (GE) and endeavored to (a) develop novel methods for improving the signal-to-noise ratio in analysis of human GE using mouse models; (b) identify a GE motif that accurately diagnoses IR in humans; and (c) identify novel biology associated with IR in humans. METHODS We integrated human muscle GE data with longitudinal mouse GE data and developed an unbiased three-level cross-species analysis platform (single gene, gene set, and networks) to generate a gene expression motif (GEM) indicative of IR. A logistic regression classification model validated GEM in three independent human data sets (n=115). RESULTS This GEM of 93 genes substantially improved diagnosis of IR compared with routine clinical measures across multiple independent data sets. Individuals misclassified by GEM possessed other metabolic features, raising the possibility that they represent a separate metabolic subclass. The GEM was enriched in pathways previously implicated in insulin action and revealed novel associations between β-catenin and Jak1 and IR. Functional analyses using small molecule inhibitors showed an important role for these proteins in insulin action. CONCLUSIONS This study shows that systems approaches for identifying molecular signatures provide a powerful way to stratify individuals into discrete metabolic groups. Moreover, we speculate that the β-catenin pathway may represent a novel biomarker for IR in humans that warrants future investigation.
Affiliation(s)
- Rima Chaudhuri
- Charles Perkins Centre, School of Molecular Bioscience, The University of Sydney, Sydney, NSW, Australia; Diabetes and Obesity Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Poh Sim Khoo
- Diabetes and Obesity Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Katherine Tonks
- Diabetes and Obesity Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia; Department of Endocrinology and Diabetes Centre, St Vincent's Hospital, Sydney, NSW, Australia
- Zora Modrusan
- Genentech Incorporated, South San Francisco, CA, USA
- Dorit Samocha-Bonet
- Diabetes and Obesity Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia; Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
- Christopher C Meoli
- Diabetes and Obesity Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Samantha Hocking
- Department of Endocrinology, Royal North Shore Hospital, Sydney, NSW, Australia; School of Medicine, The University of Sydney, Sydney, NSW, Australia
- Daniel J Fazakerley
- Charles Perkins Centre, School of Molecular Bioscience, The University of Sydney, Sydney, NSW, Australia; Diabetes and Obesity Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Jacqueline Stöckli
- Charles Perkins Centre, School of Molecular Bioscience, The University of Sydney, Sydney, NSW, Australia; Diabetes and Obesity Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Kyle L Hoehn
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Jerry R Greenfield
- Diabetes and Obesity Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia; Department of Endocrinology and Diabetes Centre, St Vincent's Hospital, Sydney, NSW, Australia; Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
- Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
- David E James
- Charles Perkins Centre, School of Molecular Bioscience, The University of Sydney, Sydney, NSW, Australia; Diabetes and Obesity Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia; School of Medicine, The University of Sydney, Sydney, NSW, Australia
|
27
|
Chaudhuri R, Sadrieh A, Hoffman NJ, Parker BL, Humphrey SJ, Stöckli J, Hill AP, James DE, Yang JYH. PhosphOrtholog: a web-based tool for cross-species mapping of orthologous protein post-translational modifications. BMC Genomics 2015; 16:617. [PMID: 26283093 PMCID: PMC4539857 DOI: 10.1186/s12864-015-1820-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/23/2015] [Accepted: 08/05/2015] [Indexed: 01/27/2023] Open
Abstract
Background Most biological processes are influenced by protein post-translational modifications (PTMs). Identifying novel PTM sites in different organisms, including humans and model organisms, has expedited our understanding of key signal transduction mechanisms. However, with increasing availability of deep, quantitative datasets in diverse species, there is a growing need for tools to facilitate cross-species comparison of PTM data. This is particularly important because functionally important modification sites are more likely to be evolutionarily conserved; yet cross-species comparison of PTMs is difficult since they often lie in structurally disordered protein domains. Current tools that address this can only map known PTMs between species based on known orthologous phosphosites, and do not enable the cross-species mapping of newly identified modification sites. Here, we addressed this by developing a web-based software tool, PhosphOrtholog (www.phosphortholog.com) that accurately maps protein modification sites between different species. This facilitates the comparison of datasets derived from multiple species, and should be a valuable tool for the proteomics community. Results Here we describe PhosphOrtholog, a web-based application for mapping known and novel orthologous PTM sites from experimental data obtained from different species. PhosphOrtholog is the only generic and automated tool that enables cross-species comparison of large-scale PTM datasets without relying on existing PTM databases. This is achieved through pairwise sequence alignment of orthologous protein residues. To demonstrate its utility we apply it to two sets of human and rat muscle phosphoproteomes generated following insulin and exercise stimulation, respectively, and one publicly available mouse phosphoproteome following cellular stress revealing high mapping and coverage efficiency. 
Although coverage statistics are dataset dependent, PhosphOrtholog more than doubled the number of cross-species mapped sites in all our example data sets compared with those recovered using existing resources such as PhosphoSitePlus. Conclusions PhosphOrtholog is the first tool that enables mapping of thousands of novel and known protein phosphorylation sites across species, accessible through an easy-to-use web interface. Identification of conserved PTMs across species from large-scale experimental data increases our knowledgebase of functional PTM sites. Moreover, PhosphOrtholog is generic, being applicable to other PTM datasets such as acetylation, ubiquitination and methylation.
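The abstract states that site mapping is achieved through pairwise sequence alignment of orthologous proteins. The sketch below illustrates that general idea only; it is not PhosphOrtholog's implementation. The function names `needleman_wunsch` and `map_site`, the toy sequences, and the flat match/mismatch/gap scores are all assumptions made for this example (a production pipeline would more plausibly use a substitution matrix and affine gap penalties).

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment (hypothetical helper); returns two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def map_site(a, b, pos):
    """Map a 1-based residue position in `a` to the aligned position in `b`.

    Returns None when the site aligns to a gap or to a different residue,
    i.e. when the modifiable residue is not conserved at that position.
    """
    aligned_a, aligned_b = needleman_wunsch(a, b)
    ia = ib = 0
    for ca, cb in zip(aligned_a, aligned_b):
        if ca != "-":
            ia += 1
        if cb != "-":
            ib += 1
        if ca != "-" and ia == pos:
            return ib if cb == ca else None
    return None


# Toy example (made-up sequences): the serine at position 4 of the first
# sequence is looked up in a longer, hypothetical orthologue.
mapped = map_site("MKTSPLRA", "MKAATSPLRA", 4)
```

Because the lookup goes through the alignment rather than a database of annotated sites, a newly observed site can be mapped as long as the orthologous sequence is available, which is the property the abstract emphasises.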
Affiliation(s)
- Rima Chaudhuri
- Charles Perkins Centre, School of Molecular Biosciences, University of Sydney, Camperdown, NSW, 2006, Australia; Diabetes and Obesity Program, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia.
- Arash Sadrieh
- Lowy Packer Building, Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, 2010, Australia.
- Nolan J Hoffman
- Charles Perkins Centre, School of Molecular Biosciences, University of Sydney, Camperdown, NSW, 2006, Australia; Diabetes and Obesity Program, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia.
- Benjamin L Parker
- Charles Perkins Centre, School of Molecular Biosciences, University of Sydney, Camperdown, NSW, 2006, Australia; Diabetes and Obesity Program, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia.
- Sean J Humphrey
- Department of Proteomics and Signal Transduction, Max Planck Institute for Biochemistry, Martinsried, Germany.
- Jacqueline Stöckli
- Charles Perkins Centre, School of Molecular Biosciences, University of Sydney, Camperdown, NSW, 2006, Australia; Diabetes and Obesity Program, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia.
- Adam P Hill
- Lowy Packer Building, Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, 2010, Australia.
- David E James
- Charles Perkins Centre, School of Molecular Biosciences, University of Sydney, Camperdown, NSW, 2006, Australia; Diabetes and Obesity Program, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia.
- Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, 2006, Australia.
|
28
|
Jayawardana K, Schramm SJ, Haydu L, Thompson JF, Scolyer RA, Mann GJ, Müller S, Yang JYH. Determination of prognosis in metastatic melanoma through integration of clinico-pathologic, mutation, mRNA, microRNA, and protein information. Int J Cancer 2014; 136:863-74. [DOI: 10.1002/ijc.29047] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 02/03/2014] [Accepted: 06/06/2014] [Indexed: 01/19/2023]
Affiliation(s)
- Kaushala Jayawardana
- School of Mathematics & Statistics; The University of Sydney; Sydney NSW Australia
- Sarah-Jane Schramm
- Sydney Medical School; The University of Sydney at Westmead Millennium Institute for Medical Research; Westmead NSW Australia
- Melanoma Institute Australia; Sydney NSW Australia
- Lauren Haydu
- Melanoma Institute Australia; Sydney NSW Australia
- John F. Thompson
- Melanoma Institute Australia; Sydney NSW Australia
- Discipline of Surgery; The University of Sydney; Sydney NSW Australia
- Richard A. Scolyer
- Melanoma Institute Australia; Sydney NSW Australia
- Discipline of Pathology; The University of Sydney; Sydney NSW Australia
- Tissue Pathology and Diagnostic Oncology; Royal Prince Alfred Hospital; Camperdown NSW Australia
- Graham J. Mann
- Sydney Medical School; The University of Sydney at Westmead Millennium Institute for Medical Research; Westmead NSW Australia
- Melanoma Institute Australia; Sydney NSW Australia
- Samuel Müller
- School of Mathematics & Statistics; The University of Sydney; Sydney NSW Australia
- Jean Yee Hwa Yang
- School of Mathematics & Statistics; The University of Sydney; Sydney NSW Australia
|
29
|
O'Connor KS, Parnell G, Patrick E, Ahlenstiel G, Suppiah V, van der Poorten D, Read SA, Leung R, Douglas MW, Yang JYH, Stewart GJ, Liddle C, George J, Booth DR. Hepatic metallothionein expression in chronic hepatitis C virus infection is IFNL3 genotype-dependent. Genes Immun 2014; 15:88-94. [PMID: 24335707 DOI: 10.1038/gene.2013.66] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 10/03/2013] [Revised: 11/11/2013] [Accepted: 11/12/2013] [Indexed: 01/14/2023]
Abstract
The IFNL3 genotype predicts the clearance of hepatitis C virus (HCV), spontaneously and with interferon (IFN)-based therapy. The responder genotype is associated with lower expression of interferon stimulated genes (ISGs) in liver biopsies from chronic hepatitis C patients. However, ISGs represent many interacting molecular pathways, and we hypothesised that the IFNL3 genotype may produce a characteristic pattern of ISG expression explaining the effect of genotype on viral clearance. For the first time, we identified an association between a cluster of ISGs, the metallothioneins (MTs), and IFNL3 genotype. Importantly, MTs were significantly upregulated (in contrast to most other ISGs) in HCV-infected liver biopsies of rs8099917 responders. An association between lower fibrosis scores and higher MT levels was demonstrated, underlining the clinical relevance of this association. As expected, overall ISGs were significantly downregulated in biopsies from subjects with the IFNL3 rs8099917 responder genotype (P = 2.38 × 10^-7). Peripheral blood analysis revealed paradoxical and not previously described findings, with upregulation of ISGs seen in the responder genotype (P = 1.00 × 10^-4). The higher MT expression in responders may contribute to their improved viral clearance, and MT-inducing agents may be useful adjuncts to therapy for HCV. Upregulation of immune cell ISGs in responders may also contribute to the IFNL3 genotype effect.
Affiliation(s)
- K S O'Connor
- Institute for Immunology and Allergy Research, Westmead Millennium Institute, University of Sydney, Sydney, New South Wales, Australia
- G Parnell
- Institute for Immunology and Allergy Research, Westmead Millennium Institute, University of Sydney, Sydney, New South Wales, Australia
- E Patrick
- Department of Mathematics, University of Sydney, Sydney, New South Wales, Australia
- G Ahlenstiel
- Storr Liver Unit, Westmead Millennium Institute and Westmead Hospital, University of Sydney, Sydney, New South Wales, Australia
- V Suppiah
- [1] Institute for Immunology and Allergy Research, Westmead Millennium Institute, University of Sydney, Sydney, New South Wales, Australia [2] Storr Liver Unit, Westmead Millennium Institute and Westmead Hospital, University of Sydney, Sydney, New South Wales, Australia
- D van der Poorten
- Storr Liver Unit, Westmead Millennium Institute and Westmead Hospital, University of Sydney, Sydney, New South Wales, Australia
- S A Read
- [1] Storr Liver Unit, Westmead Millennium Institute and Westmead Hospital, University of Sydney, Sydney, New South Wales, Australia [2] Centre for Infectious Diseases and Microbiology, Sydney Emerging infections and Biosecurity Institute, University of Sydney and Westmead Hospital, Sydney, New South Wales, Australia
- R Leung
- [1] Institute for Immunology and Allergy Research, Westmead Millennium Institute, University of Sydney, Sydney, New South Wales, Australia [2] Storr Liver Unit, Westmead Millennium Institute and Westmead Hospital, University of Sydney, Sydney, New South Wales, Australia
- M W Douglas
- [1] Storr Liver Unit, Westmead Millennium Institute and Westmead Hospital, University of Sydney, Sydney, New South Wales, Australia [2] Centre for Infectious Diseases and Microbiology, Sydney Emerging infections and Biosecurity Institute, University of Sydney and Westmead Hospital, Sydney, New South Wales, Australia
- J Y H Yang
- Department of Mathematics, University of Sydney, Sydney, New South Wales, Australia
- G J Stewart
- Institute for Immunology and Allergy Research, Westmead Millennium Institute, University of Sydney, Sydney, New South Wales, Australia
- C Liddle
- Storr Liver Unit, Westmead Millennium Institute and Westmead Hospital, University of Sydney, Sydney, New South Wales, Australia
- J George
- Storr Liver Unit, Westmead Millennium Institute and Westmead Hospital, University of Sydney, Sydney, New South Wales, Australia
- D R Booth
- Institute for Immunology and Allergy Research, Westmead Millennium Institute, University of Sydney, Sydney, New South Wales, Australia
|
30
|
Tonks KT, Ng Y, Miller S, Coster ACF, Samocha-Bonet D, Iseli TJ, Xu A, Patrick E, Yang JYH, Junutula JR, Modrusan Z, Kolumam G, Stöckli J, Chisholm DJ, James DE, Greenfield JR. Impaired Akt phosphorylation in insulin-resistant human muscle is accompanied by selective and heterogeneous downstream defects. Diabetologia 2013; 56:875-85. [PMID: 23344726 DOI: 10.1007/s00125-012-2811-y] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 09/07/2012] [Accepted: 11/29/2012] [Indexed: 01/04/2023]
Abstract
AIMS/HYPOTHESIS Muscle insulin resistance, one of the earliest defects associated with type 2 diabetes, involves changes in the phosphoinositide 3-kinase/Akt network. The relative contribution of obesity vs insulin resistance to perturbations in this pathway is poorly understood. METHODS We used phosphospecific antibodies against targets in the Akt signalling network to study insulin action in muscle from lean, overweight/obese and type 2 diabetic individuals before and during a hyperinsulinaemic-euglycaemic clamp. RESULTS Insulin-stimulated Akt phosphorylation at Thr309 and Ser474 was highly correlated with whole-body insulin sensitivity. In contrast, impaired phosphorylation of Akt substrate of 160 kDa (AS160; also known as TBC1D4) was associated with adiposity, but not insulin sensitivity. Neither insulin sensitivity nor obesity was associated with defective insulin-dependent phosphorylation of forkhead box O (FOXO) transcription factor. In view of the resultant basal hyperinsulinaemia, we predicted that this selective response within the Akt pathway might lead to hyperactivation of those processes that were spared. Indeed, the expression of genes targeted by FOXO was downregulated in insulin-resistant individuals. CONCLUSIONS/INTERPRETATION These results highlight non-linearity in Akt signalling and suggest that: (1) the pathway from Akt to glucose transport is complex; and (2) pathways, particularly FOXO, that are not insulin-resistant, are likely to be hyperactivated in response to hyperinsulinaemia. This facet of Akt signalling may contribute to multiple features of the metabolic syndrome.
Affiliation(s)
- K T Tonks
- Diabetes and Obesity Research Program, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, 2010 NSW, Australia
|
31
|
Peters T, Bulger DW, Loi TH, Yang JYH, Ma D. Two-step cross-entropy feature selection for microarrays--power through complementarity. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1148-1151. [PMID: 21321369 DOI: 10.1109/tcbb.2011.30] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Current feature selection methods for supervised classification of tissue samples from microarray data generally fail to exploit complementary discriminatory power that can be found in sets of features. Using a feature selection method with the computational architecture of the cross-entropy method, including an additional preliminary step ensuring a lower bound on the number of times any feature is considered, we show when testing on a human lymph node data set that there are a significant number of genes that perform well when their complementary power is assessed, but “pass under the radar” of popular feature selection methods that only assess genes individually on a given classification tool. We also show that this phenomenon becomes more apparent as diagnostic specificity of the tissue samples analysed increases.
Affiliation(s)
- Tim Peters
- Department of Statistics, Macquarie University, Sydney, NSW 2109, Australia.
|
32
|
Abstract
MOTIVATION Mass spectrometry (MS)-based proteomics is one of the most commonly used research techniques for identifying and characterizing proteins in biological and medical research. The identification of a protein is the critical first step in elucidating its biological function. Successful protein identification depends on various interrelated factors, including effective analysis of MS data generated in a proteomic experiment. This analysis comprises several stages, often combined in a pipeline or workflow. The first component of the analysis is known as spectra pre-processing. In this component, the raw data generated by the mass spectrometer is processed to eliminate noise and identify the mass-to-charge ratio (m/z) and intensity for the peaks in the spectrum corresponding to the presence of certain peptides or peptide fragments. Since all downstream analyses depend on the pre-processed data, effective pre-processing is critical to protein identification and characterization. There is a critical need for more robust pre-processing algorithms that perform well on tandem mass spectra under a variety of different conditions and can be easily integrated into sophisticated data analysis pipelines for practical wet-lab applications. RESULT We have developed a new pre-processing algorithm. Based on wavelet theory, our method uses a dynamic peak model to identify peaks. It is designed to be easily integrated into a complete proteomic analysis workflow. We compared the method with other available algorithms using a reference library of raw MS and tandem MS spectra with known protein composition information. Our pre-processing algorithm results in the identification of significantly more peptides and proteins in the downstream analysis for a given false discovery rate. AVAILABILITY Software available at: http://www.maths.usyd.edu.au/u/penghao/index.html.
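The abstract describes wavelet-based peak identification but gives no algorithmic detail, so the following is only a generic sketch of the underlying idea, not the authors' dynamic peak model. It convolves a spectrum with a Ricker (Mexican-hat) wavelet and reports local maxima of the response. The function names, the wavelet scale, the threshold, and the synthetic spectrum are all assumptions chosen for illustration.

```python
import math


def ricker(width, a):
    """Ricker (Mexican-hat) wavelet sampled at `width` points with scale `a`."""
    norm = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    centre = (width - 1) / 2.0
    return [norm * (1.0 - ((i - centre) / a) ** 2)
            * math.exp(-((i - centre) ** 2) / (2.0 * a * a))
            for i in range(width)]


def convolve_same(signal, kernel):
    """Convolve with a symmetric kernel, returning an output as long as the input."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += signal[j] * w
        out.append(acc)
    return out


def find_peaks(signal, scale=2.0, threshold=0.5):
    """Return indices where the wavelet response has a local maximum above `threshold`.

    Only positions where the kernel fits entirely inside the signal are
    considered, which avoids artefacts from the truncated kernel at the edges.
    """
    kernel = ricker(int(10 * scale) + 1, scale)
    half = len(kernel) // 2
    resp = convolve_same(signal, kernel)
    return [i for i in range(half, len(resp) - half)
            if resp[i] > threshold and resp[i] > resp[i - 1] and resp[i] >= resp[i + 1]]


# Synthetic spectrum (made up for this example): a flat baseline of 1.0 with
# two Gaussian peaks centred at indices 20 and 60.
spectrum = [1.0
            + 5.0 * math.exp(-((i - 20) ** 2) / 8.0)
            + 3.0 * math.exp(-((i - 60) ** 2) / 8.0)
            for i in range(100)]
peaks = find_peaks(spectrum)
```

The Ricker wavelet sums to approximately zero, so a constant baseline contributes almost nothing to the response; that is one reason wavelet filtering is a common choice for spectra with background signal. A real pre-processing step would also need to handle noise, multiple scales, and centroiding of m/z values, none of which are attempted here.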
Affiliation(s)
- Penghao Wang
- School of Mathematics and Statistics, University of Sydney, Sydney, Australia.
|
33
|
Reijo Pera RA, DeJonge C, Bossert N, Yao M, Hwa Yang JY, Asadi NB, Wong W, Wong C, Firpo MT. Gene expression profiles of human inner cell mass cells and embryonic stem cells. Differentiation 2009; 78:18-23. [DOI: 10.1016/j.diff.2009.03.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 01/16/2009] [Revised: 03/07/2009] [Accepted: 03/12/2009] [Indexed: 10/20/2022]
|
34
|
Yang JYH. Microarrays--planning your experiment. Methods Mol Med 2008; 141:71-85. [PMID: 18453085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The rapid increase in the use of microarray studies has generated many questions on how to plan and design experiments that will effectively utilize this technology. Investigators often require answers to questions relating to microarray platforms, RNA samples, options for replication, allocation of samples to arrays, sample sizes, appropriate downstream analysis, and many others. Careful consideration of these issues is critical to ensure the efficiency and reliability of the actual microarray experiments, and will assist in enhancing interpretability of the experimental results.
Affiliation(s)
- Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, New South Wales, Australia
|
35
|
Lewis CC, Yang JYH, Huang X, Banerjee SK, Blackburn MR, Baluk P, McDonald DM, Blackwell TS, Nagabhushanam V, Peters W, Voehringer D, Erle DJ. Disease-specific gene expression profiling in multiple models of lung disease. Am J Respir Crit Care Med 2007; 177:376-87. [PMID: 18029791 DOI: 10.1164/rccm.200702-333oc] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.9] [Indexed: 01/21/2023] Open
Abstract
RATIONALE Microarray technology is widely employed for studying the molecular mechanisms underlying complex diseases. However, analyses of individual diseases or models of diseases frequently yield extensive lists of differentially expressed genes with uncertain relationships to disease pathogenesis. OBJECTIVES To compare gene expression changes in a heterogeneous set of lung disease models in order to identify common gene expression changes seen in diverse forms of lung pathology, as well as relatively small subsets of genes likely to be involved in specific pathophysiological processes. METHODS We profiled lung gene expression in 12 mouse models of infection, allergy, and lung injury. A linear model was used to estimate transcript expression changes for each model, and hierarchical clustering was used to compare expression patterns between models. Selected expression changes were verified by quantitative polymerase chain reaction. MEASUREMENTS AND MAIN RESULTS A total of 24 transcripts, including many involved in inflammation and immune activation, were differentially expressed in a substantial majority (9 or more) of the models. Expression patterns distinguished three groups of models: (1) bacterial infection (n = 5), with changes in 89 transcripts, including many related to nuclear factor-kappaB signaling, cytokines, chemokines, and their receptors; (2) bleomycin-induced diseases (n = 2), with changes in 53 transcripts, including many related to matrix remodeling and Wnt signaling; and (3) T helper cell type 2 (allergic) inflammation (n = 5), with changes in 26 transcripts, including many encoding epithelial secreted molecules, ion channels, and transporters. CONCLUSIONS This multimodel dataset highlights novel genes likely involved in various pathophysiological processes and will be a valuable resource for the investigation of molecular mechanisms underlying lung disease pathogenesis.
Affiliation(s)
- Christina C Lewis
- Cincinnati Children's Hospital Medical Center/Division of Immunobiology, 3333 Burnet Avenue, MLC 7038, Cincinnati, OH 45229, USA.
|
36
|
Barker CS, Griffin C, Dolganov GM, Hanspers K, Yang JYH, Erle DJ. Increased DNA microarray hybridization specificity using sscDNA targets. BMC Genomics 2005; 6:57. [PMID: 15847692 PMCID: PMC1090574 DOI: 10.1186/1471-2164-6-57] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.8] [Received: 02/17/2005] [Accepted: 04/22/2005] [Indexed: 11/13/2022]
Abstract
Background: The most widely used amplification method for microarray analysis of gene expression uses T7 RNA polymerase-driven in vitro transcription (IVT) to produce complementary RNA (cRNA) that can be hybridized to arrays. However, multiple rounds of amplification are required when assaying very small amounts of starting RNA. Moreover, certain cRNA-DNA mismatches are more stable than the analogous cDNA-DNA mismatches and this might increase non-specific hybridization. We sought to determine whether a recently developed linear isothermal amplification method (ribo-SPIA) that produces single-stranded cDNA (sscDNA) would offer advantages over traditional IVT-based methods for microarray-based analyses of transcript expression. Results: A single round of ribo-SPIA amplification produced sufficient sscDNA for hybridizations when as little as 5 ng of starting total RNA was used. Comparisons of probe set signal intensities obtained from replicate amplifications showed consistently high correlations (r = 0.99). We compared gene expression in two different human RNA samples using ribo-SPIA. Compared with one round of IVT, ribo-SPIA had a larger dynamic range and correlated better with quantitative PCR results even though we used 1000-fold less starting RNA. The improved dynamic range was associated with decreases in hybridization to mismatch control probes. Conclusion: The use of amplified sscDNA may offer substantial advantages over IVT-based amplification methods, especially when very limited amounts of starting RNA are available. The use of sscDNA targets instead of cRNA targets appears to improve hybridization specificity.
Affiliation(s)
- Christopher S Barker
- Gladstone Institute of Cardiovascular Disease, The J. David Gladstone Institutes, San Francisco, California 94158, USA
- San Francisco General Hospital General Clinical Research Center, University of California, San Francisco, San Francisco, California 94143, USA
- Chandi Griffin
- San Francisco General Hospital General Clinical Research Center, University of California, San Francisco, San Francisco, California 94143, USA
- Gregory M Dolganov
- Department of Medicine, University of California, San Francisco, San Francisco, California 94143, USA
- Kristina Hanspers
- Gladstone Institute of Cardiovascular Disease, The J. David Gladstone Institutes, San Francisco, California 94158, USA
- Jean Yee Hwa Yang
- San Francisco General Hospital General Clinical Research Center, University of California, San Francisco, San Francisco, California 94143, USA
- Department of Medicine, University of California, San Francisco, San Francisco, California 94143, USA
- David J Erle
- San Francisco General Hospital General Clinical Research Center, University of California, San Francisco, San Francisco, California 94143, USA
- Department of Medicine, University of California, San Francisco, San Francisco, California 94143, USA
|