1
Song Y, Wang Y, Geng X, Wang X, He H, Qian Y, Dong Y, Fan Z, Chen S, Wen W, Wang H. Novel biomarker genes for the prediction of post-hepatectomy survival of patients with NAFLD-related hepatocellular carcinoma. Cancer Cell Int 2023; 23:269. [PMID: 37950277 PMCID: PMC10638756 DOI: 10.1186/s12935-023-03106-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 10/24/2023] [Indexed: 11/12/2023] Open
Abstract
BACKGROUND The incidence and prevalence of nonalcoholic fatty liver disease-related hepatocellular carcinoma (NAFLD-HCC) are rapidly increasing worldwide. This study aimed to identify biomarker genes for a prognostic prediction model of NAFLD-HCC hepatectomy by integrating text-mining, clinical follow-up information, transcriptomic data and experimental validation. METHODS The tumor and adjacent normal liver samples collected from 13 NAFLD-HCC and 12 HBV-HCC patients were sequenced using RNA-Seq. A novel text-mining strategy, the explainable gene ontology fingerprint approach, was utilized to screen NAFLD-HCC featured gene sets and cell types, and the results were validated through a series of lab experiments. A risk score calculated by the multivariate Cox regression model using discovered key genes was established and evaluated based on 47 patients' follow-up information. RESULTS Differentially expressed genes associated with NAFLD-HCC specific tumor microenvironment were screened, of which FABP4 and VWF were featured by previous reports. A risk prediction model consisting of FABP4, VWF, gender and TNM stage was then established based on 47 samples. The model showed that overall survival in the high-risk score group was lower compared with that in the low-risk score group (p = 0.0095). CONCLUSIONS This study provided the landscape of NAFLD-HCC transcriptome, and elucidated that our model could predict hepatectomy prognosis with high accuracy.
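The risk score mentioned in this abstract can be illustrated as the linear predictor of a Cox model, followed by a split into high-risk and low-risk groups. The sketch below is only an illustration under stated assumptions: the coefficient values, the covariate encodings, the function names and the median cutoff are all hypothetical, since the abstract reports none of them.

```python
from statistics import median

# Hypothetical placeholder coefficients (log hazard ratios).
# These are NOT values from the study; they only make the sketch runnable.
HYPOTHETICAL_COEFFICIENTS = {
    "FABP4": 0.5,
    "VWF": 0.25,
    "gender": 1.0,
    "tnm_stage": 2.0,
}


def risk_score(covariates, coefficients):
    """Cox linear predictor: the sum of coefficient * covariate value."""
    return sum(coefficients[name] * covariates[name] for name in coefficients)


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score.

    The median cutoff is an assumption; the abstract does not state
    how the two groups were defined.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    patient = {"FABP4": 2.0, "VWF": 4.0, "gender": 1, "tnm_stage": 1}
    print(risk_score(patient, HYPOTHETICAL_COEFFICIENTS))
```

In practice the coefficients would come from fitting a multivariate Cox model to follow-up data, and the two groups would then be compared with a survival test such as the log-rank test.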
Affiliation(s)
- Yuting Song
- Model Animal Research Center, Nanjing University, Nanjing, 210008, China
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China
- Ying Wang
- Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, 200438, China
- Xin Geng
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China
- Xianming Wang
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China
- Huisi He
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China
- Youwen Qian
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China
- Yaping Dong
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China
- Zhecai Fan
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China
- Shuzhen Chen
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China
- Wen Wen
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China.
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China.
- Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, 200438, China.
- Hongyang Wang
- Model Animal Research Center, Nanjing University, Nanjing, 210008, China.
- National Center for Liver Cancer, Naval Medical University, Shanghai, 201805, China.
- International Cooperation Laboratory on Signal Transduction, Third Affiliated Hospital of Naval Medical University, Shanghai, 200438, China.
2
Zhang Z, Fang M, Wu R, Zong H, Huang H, Tong Y, Xie Y, Cheng S, Wei Z, Crabbe MJC, Zhang X, Wang Y. Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19. J Med Internet Res 2023; 25:e48115. [PMID: 37632414 PMCID: PMC10551783 DOI: 10.2196/48115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 07/03/2023] [Accepted: 08/25/2023] [Indexed: 08/28/2023] Open
Abstract
BACKGROUND Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. OBJECTIVE We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. METHODS Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. RESULTS The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. 
CONCLUSIONS This study provided a comprehensive analysis of RE with diverse relation types. Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research.
Affiliation(s)
- Zeyu Zhang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Department of Clinical Laboratory Medicine Center, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Meng Fang
- Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, China
- Rebecca Wu
- University of California, Berkeley, Berkeley, CA, United States
- Hui Zong
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Honglian Huang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yuantao Tong
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yujia Xie
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Shiyang Cheng
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Ziyi Wei
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- M James C Crabbe
- Wolfson College, Oxford University, Oxford, United Kingdom
- Institute of Biomedical and Environmental Science & Technology, University of Bedfordshire, Luton, United Kingdom
- School of Life Sciences, Shanxi University, Taiyuan, China
- Xiaoyan Zhang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Ying Wang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Department of Clinical Laboratory Medicine Center, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, China
3
Morris VE, Hashmi SS, Zhu L, Maili L, Urbina C, Blackwell S, Greives MR, Buchanan EP, Mulliken JB, Blanton SH, Zheng WJ, Hecht JT, Letra A. Evidence for craniofacial enhancer variation underlying nonsyndromic cleft lip and palate. Hum Genet 2020; 139:1261-1272. [PMID: 32318854 DOI: 10.1007/s00439-020-02169-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/10/2020] [Accepted: 04/13/2020] [Indexed: 12/14/2022]
Abstract
Nonsyndromic cleft lip with or without cleft palate (NSCLP) is a common birth defect for which only ~20% of the underlying genetic variation has been identified. Variants in noncoding regions have been increasingly suggested to contribute to the missing heritability. In this study, we investigated whether variation in craniofacial enhancers contributes to NSCLP. Candidate enhancers were identified using VISTA Enhancer Browser and previous publications. Prioritization was based on patterning defects in knockout mice, deletion/duplication of craniofacial genes in animal models and results of whole exome/whole genome sequencing studies. This resulted in 20 craniofacial enhancers to be investigated. Custom amplicon-based sequencing probes were designed and used for sequencing 380 NSCLP probands (from multiplex and simplex families of non-Hispanic white (NHW) and Hispanic ethnicities) using Illumina MiSeq. The frequencies of identified variants were compared to ethnically matched European (CEU) and Los Angeles Mexican (MXL) control genomes and used for association analyses. Variants in mm427/MSX1 and hs1582/SPRY1 showed genome-wide significant association with NSCLP (p ≤ 6.4 × 10⁻¹¹). In silico analysis showed that these enhancer variants may disrupt important transcription factor binding sites. Haplotypes involving these enhancers and also mm435/ABCA4 were significantly associated with NSCLP, especially in NHW (p ≤ 6.3 × 10⁻⁷). Importantly, groupwise burden analysis showed several enhancer combinations significantly over-represented in NSCLP individuals, revealing novel NSCLP pathways and supporting a polygenic inheritance model. Our findings support the role of craniofacial enhancer sequence variation in the etiology of NSCLP.
Affiliation(s)
- Vershanna E Morris
- Department of Pediatrics, UTHealth McGovern Medical School, Houston, TX, 77030, USA; Pediatric Research Center, UTHealth McGovern Medical School, Houston, TX, 77030, USA
- S Shahrukh Hashmi
- Department of Pediatrics, UTHealth McGovern Medical School, Houston, TX, 77030, USA; Pediatric Research Center, UTHealth McGovern Medical School, Houston, TX, 77030, USA
- Lisha Zhu
- UTHealth School of Biomedical Informatics, Houston, TX, 77054, USA
- Lorena Maili
- Department of Pediatrics, UTHealth McGovern Medical School, Houston, TX, 77030, USA; Pediatric Research Center, UTHealth McGovern Medical School, Houston, TX, 77030, USA
- Christian Urbina
- Department of Pediatrics, UTHealth McGovern Medical School, Houston, TX, 77030, USA; Pediatric Research Center, UTHealth McGovern Medical School, Houston, TX, 77030, USA
- Matthew R Greives
- Department of Pediatric Surgery, University of Texas Health Science Center McGovern Medical School, Houston, TX, 77030, USA
- Edward P Buchanan
- Department of Plastic Surgery, Texas Children's Hospital, Houston, TX, 77030, USA
- John B Mulliken
- Department of Plastic Surgery, Boston Children's Hospital, Boston, MA, 02115, USA
- Susan H Blanton
- Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, 33136, USA
- W Jim Zheng
- UTHealth School of Biomedical Informatics, Houston, TX, 77054, USA
- Jacqueline T Hecht
- Department of Pediatrics, UTHealth McGovern Medical School, Houston, TX, 77030, USA; Pediatric Research Center, UTHealth McGovern Medical School, Houston, TX, 77030, USA; Shriners' Hospital for Children, Houston, TX, 77030, USA; Center for Craniofacial Research, UTHealth School of Dentistry, Houston, TX, 77054, USA
- Ariadne Letra
- School of Dentistry, Department of Diagnostic and Biomedical Sciences, University of Texas Health Science Center At Houston, 1941 East Road, BBSB 4210, Houston, TX, 77054, USA; Center for Craniofacial Research, UTHealth School of Dentistry, Houston, TX, 77054, USA
4
Chen G, Jia Y, Zhu L, Li P, Zhang L, Tao C, Jim Zheng W. Gene fingerprint model for literature based detection of the associations among complex diseases: a case study of COPD. BMC Med Inform Decis Mak 2019; 19:20. [PMID: 30700303 PMCID: PMC6354331 DOI: 10.1186/s12911-019-0738-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Disease comorbidity is very common and has a significant impact on disease treatment. Revealing the associations among diseases may help to understand the mechanisms of diseases, improve the prevention and treatment of diseases, and support the discovery of new drugs or new uses of existing drugs. METHODS In this paper, we introduced a mathematical model to represent gene-related diseases with a series of associated genes based on the overrepresentation of genes and diseases in PubMed literature. We also illustrated an efficient way to reveal the implicit connections between COPD and other diseases based on this model. RESULTS We applied this approach to analyze the relationships between Chronic Obstructive Pulmonary Disease (COPD) and other diseases under the Lung Diseases branch in the Medical Subject Headings (MeSH) index system and detected 4 novel diseases relevant to COPD. As judged by domain experts, the F score of our approach is up to 77.6%. CONCLUSIONS The results demonstrate the effectiveness of the gene fingerprint model for diseases on the basis of medical literature.
Affiliation(s)
- Guocai Chen
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St Suite 600, Houston, TX 77030 USA
- Yuxi Jia
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St Suite 600, Houston, TX 77030 USA
- Department of Medical Informatics, School of Public Health, Jilin University, Changchun, Jilin, 130021 China
- Lisha Zhu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St Suite 600, Houston, TX 77030 USA
- Ping Li
- Department of Development Pediatrics, The Second Affiliated Hospital of Jilin University, Changchun, Jilin, 130041 China
- Lin Zhang
- Department of Respiratory Medicine, The Second Affiliated Hospital of Jilin University, Changchun, Jilin, 130041 China
- Cui Tao
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St Suite 600, Houston, TX 77030 USA
- W. Jim Zheng
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St Suite 600, Houston, TX 77030 USA
5
Chen G, Tsoi A, Xu H, Zheng WJ. Predict effective drug combination by deep belief network and ontology fingerprints. J Biomed Inform 2018; 85:149-154. [DOI: 10.1016/j.jbi.2018.07.024] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 04/05/2018] [Revised: 07/25/2018] [Accepted: 07/30/2018] [Indexed: 11/17/2022]
6
A Multi-Parameter Analysis of Cellular Coordination of Major Transcriptome Regulation Mechanisms. Sci Rep 2018; 8:5742. [PMID: 29636505 PMCID: PMC5893539 DOI: 10.1038/s41598-018-24039-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/04/2017] [Accepted: 03/21/2018] [Indexed: 01/06/2023] Open
Abstract
To understand cellular coordination of multiple transcriptome regulation mechanisms, we simultaneously measured transcription rate (TR), mRNA abundance (RA) and translation activity (TA). This revealed multiple insights. First, the three parameters displayed systematic statistical differences. Sequentially more genes exhibited extreme (low or high) expression values from TR to RA, and then to TA; that is, cellular coordination of multiple transcriptome regulatory mechanisms leads to sequentially enhanced gene expression selectivity as the genetic information flows from the genome to the proteome. Second, the contribution of the stabilization-by-translation regulatory mechanism to the cellular coordination process was assessed. The data enabled an estimation of mRNA stability, revealing a moderate but significant positive correlation between mRNA stability and translation activity. Third, the proportion of mRNA occupied by untranslated regions (UTRs) exhibited a negative relationship with the level of this correlation, and was thus a major determinant of the mode of regulation of the mRNA. High-UTR-proportion mRNAs tend to defy the stabilization-by-translation regulatory mechanism, staying out of the polysome but remaining stable; mRNAs with a low UTR proportion largely followed this regulation. In summary, we quantitatively delineated the relationship among multiple transcriptome regulation parameters, i.e., cellular coordination of corresponding regulatory mechanisms.
7
Uppu S, Krishna A, Gopalan RP. A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:599-612. [PMID: 28060710 DOI: 10.1109/tcbb.2016.2635125] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 06/06/2023]
Abstract
In this era of genome-wide association studies (GWAS), the quest to understand the genetic architecture of complex diseases is growing faster than ever before. The development of high throughput genotyping and next generation sequencing technologies enables genetic epidemiological analysis of large scale data. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) responsible for disease susceptibility. The interactions between SNPs associated with complex diseases are increasingly being explored in the current literature. These interaction studies are mathematically challenging and computationally complex. These challenges have been addressed by a number of data mining and machine learning approaches. This paper reviews the current methods and the related software packages to detect the SNP interactions that contribute to diseases. The issues that need to be considered when developing these models are addressed in this review. The paper also reviews the achievements in data simulation to evaluate the performance of these models. Further, it discusses the future of SNP interaction analysis.
8
Tsoi LC, Patrick MT, Elder JT. Research Techniques Made Simple: Using Genome-Wide Association Studies to Understand Complex Cutaneous Disorders. J Invest Dermatol 2018; 138:e23-e29. [PMID: 29477192 PMCID: PMC5903459 DOI: 10.1016/j.jid.2018.01.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
Abstract
Complex cutaneous disorders result from the combined effect of many different genes and environmental factors, with individual genetic variants often having only a modest effect on disease risk. The ability to examine large numbers of samples is required for correlating genetic variants with diseases/traits. Technological advances in high-throughput genotyping, along with mapping of the human genome and its associated inter-individual variation, have allowed genetic variants to be analyzed at high density in large case-control cohorts for many diseases, including several major skin diseases. These genome-wide association studies focus on showing differences in the frequencies of variants between case and control groups, rather than co-transmission of a variant and disease through a family, as is done in linkage studies. In this review, we provide overall guidance for genome-wide association study analysis and interpreting the results. Additionally, we discuss challenges and future directions for genome-wide association studies, focusing on translation of findings to provide biological and clinical implications for dermatology.
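The case-versus-control frequency comparison that this abstract describes can be sketched as a single-variant allelic chi-square test on a 2 × 2 table of allele counts. This is a minimal illustration with made-up counts, not the analysis workflow of the review; the function name is hypothetical, and real studies add quality control, covariate adjustment and a genome-wide multiple-testing threshold.

```python
from math import erfc, sqrt


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). Assumes every row and column total is non-zero;
    otherwise the statistic is undefined and a ZeroDivisionError is raised.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts: the alternate allele is more frequent in cases.
    chi2, p = allelic_chi_square(case_alt=30, case_ref=70,
                                 control_alt=10, control_ref=90)
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

The same table is what distinguishes this design from a linkage study: only allele frequencies in the two groups enter the test, with no family transmission information.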
Affiliation(s)
- Lam C Tsoi
- Department of Dermatology, University of Michigan Medical School, Ann Arbor, Michigan, USA; Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, USA; Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA.
- Matthew T Patrick
- Department of Dermatology, University of Michigan Medical School, Ann Arbor, Michigan, USA
- James T Elder
- Department of Dermatology, University of Michigan Medical School, Ann Arbor, Michigan, USA; Ann Arbor Veterans Affairs Hospital, Ann Arbor, Michigan, USA.
9
Chen G, Zhao J, Cohen T, Tao C, Sun J, Xu H, Bernstam EV, Lawson A, Zeng J, Johnson AM, Holla V, Bailey AM, Lara-Guerra H, Litzenburger B, Meric-Bernstam F, Jim Zheng W. Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature. Database: The Journal of Biological Databases and Curation 2015; 2015:bav034. [PMID: 25858285 PMCID: PMC4390608 DOI: 10.1093/database/bav034] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 08/31/2014] [Accepted: 03/17/2015] [Indexed: 11/14/2022]
Abstract
Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org
Affiliation(s)
- Guocai Chen
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Jieyi Zhao
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Trevor Cohen
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Cui Tao
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Jingchun Sun
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Hua Xu
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Elmer V Bernstam
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Andrew Lawson
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Jia Zeng
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Amber M Johnson
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Vijaykumar Holla
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Ann M Bailey
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Humberto Lara-Guerra
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Beate Litzenburger
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- Funda Meric-Bernstam
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
- W Jim Zheng
- Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA
|
10
|
Qin T, Matmati N, Tsoi LC, Mohanty BK, Gao N, Tang J, Lawson AB, Hannun YA, Zheng WJ. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network. Nucleic Acids Res 2014; 42:e138. [PMID: 25063300 PMCID: PMC4191379 DOI: 10.1093/nar/gku678] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/25/2023]
Abstract
To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes' Ontology Fingerprints--a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms' corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general.
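The fingerprint idea described in this abstract can be illustrated with a small sketch. The abstract states only that a fingerprint is a set of overrepresented Gene Ontology terms with enrichment P-values and that genes are compared pairwise; the hypergeometric test, the `alpha` cutoff, and the `shared_score` comparison below are assumptions chosen for illustration, not the authors' published formulas, and all function names are hypothetical.

```python
from math import comb, log10


def hypergeom_sf(k, n_total, n_term, n_gene):
    """P(X >= k): chance that at least k of a gene's n_gene abstracts mention
    a term that occurs in n_term of the n_total abstracts overall."""
    upper = min(n_term, n_gene)
    hits = sum(
        comb(n_term, i) * comb(n_total - n_term, n_gene - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_gene)


def fingerprint(gene_abstracts, term_abstracts, n_total, alpha=0.05):
    """Return {term: P-value} for terms enriched in one gene's abstracts.
    The alpha cutoff is an illustrative assumption."""
    fp = {}
    for term, abstracts in term_abstracts.items():
        k = len(gene_abstracts & abstracts)
        if k == 0:
            continue
        p = hypergeom_sf(k, n_total, len(abstracts), len(gene_abstracts))
        if p < alpha:
            fp[term] = p
    return fp


def shared_score(fp_a, fp_b):
    """Hypothetical pairwise score: sum of -log10 of the weaker (larger)
    P-value over the terms that both fingerprints share."""
    return sum(-log10(max(fp_a[t], fp_b[t])) for t in fp_a.keys() & fp_b.keys())


# Toy corpus of 20 abstracts identified by integers (made-up data).
term_abstracts = {"GO:A": {0, 1, 2, 3, 4}, "GO:B": set(range(0, 20, 2))}
fp = fingerprint({0, 1, 2, 3}, term_abstracts, n_total=20)
```

In this toy corpus only `GO:A` passes the cutoff, because `GO:B` appears in half of all abstracts and so carries little information about any single gene.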
Affiliation(s)
- Tingting Qin
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Nabil Matmati
- The Stony Brook University Cancer Center and the Department of Medicine, Stony Brook, NY 11794, USA
- Lam C Tsoi
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Bidyut K Mohanty
- Department of Biochemistry & Molecular Biology, Medical University of South Carolina, Charleston, SC 29425, USA
- Nan Gao
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Jijun Tang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA; Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
- Andrew B Lawson
- Department of Public Health Science, Medical University of South Carolina, Charleston, SC 29425, USA
- Yusuf A Hannun
- The Stony Brook University Cancer Center and the Department of Medicine, Stony Brook, NY 11794, USA
- W Jim Zheng
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
11
|
Chen YA, Eschrich SA. Computational methods and opportunities for phosphorylation network medicine. Transl Cancer Res 2014; 3:266-278. [PMID: 25530950 PMCID: PMC4271781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Protein phosphorylation, one of the most ubiquitous post-translational modifications (PTM) of proteins, is known to play an essential role in cell signaling and regulation. With the increasing understanding of the complexity and redundancy of cell signaling, there is a growing recognition that targeting the entire network or system could be a necessary and advantageous strategy for treating cancer. Protein kinases, the proteins that add a phosphate group to the substrate proteins during phosphorylation events, have become one of the largest groups of 'druggable' targets in cancer therapeutics in recent years. Kinase inhibitors are being regularly used in clinics for cancer treatment. This therapeutic paradigm shift in cancer research is partly due to the generation and availability of high-dimensional proteomics data. Generation of this data, in turn, is enabled by increased use of mass-spectrometry (MS)-based or other high-throughput proteomics platforms as well as companion public databases and computational tools. This review briefly summarizes the current state and progress on phosphoproteomics identification, quantification, and platform related characteristics. We review existing database resources, computational tools, methods for phosphorylation network inference, and ultimately demonstrate the connection to therapeutics. Finally, many research opportunities exist for bioinformaticians or biostatisticians based on developments and limitations of the current and emerging technologies.
Affiliation(s)
- Yian Ann Chen
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 Magnolia Drive Tampa, FL 33612, USA
- Steven A Eschrich
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 Magnolia Drive Tampa, FL 33612, USA
|
12
|
Xiang Z, Qin T, Qin ZS, He Y. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks. BMC Systems Biology 2013; 7 Suppl 3:S9. [PMID: 24555475 PMCID: PMC3852244 DOI: 10.1186/1752-0509-7-s3-s9] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 12/24/2022]
Abstract
Background The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. Results The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. 
Conclusions The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining system that effectively predicts implicit gene-gene interaction relationships and networks in a genome-wide scope.
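The three-stage pipeline named in this abstract (gene-article matrix, normalized gene-MeSH matrix, gene-gene matrix) can be sketched as follows. The abstract does not say which normalization or which of the six dissimilarity functions was selected, so the row normalization and the cosine dissimilarity used here are assumptions, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def gene_mesh_matrix(gene_articles, article_mesh):
    """Count MeSH terms across each gene's articles, then normalize each
    gene's row to sum to 1 (the normalization is an assumed choice)."""
    matrix = {}
    for gene, articles in gene_articles.items():
        counts = Counter(t for a in articles for t in article_mesh.get(a, ()))
        total = sum(counts.values())
        matrix[gene] = {t: c / total for t, c in counts.items()} if total else {}
    return matrix


def cosine_dissimilarity(u, v):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def gene_gene_matrix(gm):
    """Pairwise dissimilarities keyed by alphabetically ordered gene pairs."""
    genes = sorted(gm)
    return {
        (a, b): cosine_dissimilarity(gm[a], gm[b])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }


# Toy gene-article and article-MeSH annotations (made-up data).
gene_articles = {"g1": ["a1", "a2"], "g2": ["a2", "a3"], "g3": ["a4"]}
article_mesh = {
    "a1": ["Iron"],
    "a2": ["Iron", "Virulence"],
    "a3": ["Virulence"],
    "a4": ["Ribosomes"],
}
dissim = gene_gene_matrix(gene_mesh_matrix(gene_articles, article_mesh))
```

Here `g1` and `g2` come out close because they share MeSH terms through their literature, while `g3` shares none and sits at the maximum dissimilarity of 1.0.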
|
13
|
Qin T, Tsoi LC, Sims KJ, Lu X, Zheng WJ. Signaling network prediction by the Ontology Fingerprint enhanced Bayesian network. BMC Systems Biology 2012; 6 Suppl 3:S3. [PMID: 23282239 PMCID: PMC3524013 DOI: 10.1186/1752-0509-6-s3-s3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022]
Abstract
BACKGROUND Despite large amounts of available genomic and proteomic data, predicting the structure and response of signaling networks is still a significant challenge. While statistical methods such as Bayesian networks have been explored to meet this challenge, employing existing biological knowledge for network prediction is difficult. The objective of this study is to develop a novel approach that integrates prior biological knowledge in the form of the Ontology Fingerprint to infer cell-type-specific signaling networks via data-driven Bayesian network learning, and to further use the trained model to predict cellular responses. RESULTS We applied our novel approach to address the Predictive Signaling Network Modeling challenge of the fourth (2009) Dialogue for Reverse Engineering Assessments and Methods (DREAM4) competition. The challenge results showed that our method accurately captured signal transduction of a network of protein kinases and phosphoproteins in that the predicted protein phosphorylation levels under all experimental conditions were highly correlated (R² = 0.93) with the observed results. Based on the evaluation of the DREAM4 organizer, our team was ranked as one of the top five performers in predicting network structure and protein phosphorylation activity under test conditions. CONCLUSIONS Bayesian networks can be used to simulate the propagation of signals in cellular systems. Incorporating the Ontology Fingerprint as prior biological knowledge allows us to efficiently infer concise signaling network structure and to accurately predict cellular responses.
Affiliation(s)
- Tingting Qin
- Bioinformatics Graduate Program, Medical University of South Carolina, Charleston, SC 29425, USA
- Lam C Tsoi
- Bioinformatics Graduate Program, Medical University of South Carolina, Charleston, SC 29425, USA
- Kellie J Sims
- Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC 29425, USA
- Xinghua Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15232, USA
- W Jim Zheng
- Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC 29425, USA
|
14
|
Tudor CO, Schmidt CJ, Vijay-Shanker K. eGIFT: mining gene information from the literature. BMC Bioinformatics 2010; 11:418. [PMID: 20696046 PMCID: PMC2929241 DOI: 10.1186/1471-2105-11-418] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 12/19/2009] [Accepted: 08/09/2010] [Indexed: 01/13/2023]
Abstract
BACKGROUND With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture of a specific gene from documents that mention its names and synonyms. RESULTS In this paper, we present eGIFT (http://biotm.cis.udel.edu/eGIFT), a web-based tool that associates informative terms, called iTerms, and sentences containing them, with genes. To associate iTerms with a gene, eGIFT ranks iTerms about the gene, based on a score which compares the frequency of occurrence of a term in the gene's literature to its frequency of occurrence in documents about genes in general. To retrieve a gene's documents (Medline abstracts), eGIFT considers all gene names, aliases, and synonyms. Since many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to this gene. An additional filtering process is applied to retain those abstracts that focus on the gene rather than mention it in passing. eGIFT's information for a gene is pre-computed and users of eGIFT can search for genes by using a name or an EntrezGene identifier. iTerms are grouped into different categories to facilitate a quick inspection. eGIFT also links an iTerm to sentences mentioning the term to allow users to see the relation between the iTerm and the gene. We evaluated the precision and recall of eGIFT's iTerms for 40 genes; between 88% and 94% of the iTerms were marked as salient by our evaluators, and 94% of the UniProtKB keywords for these genes were also identified by eGIFT as iTerms. CONCLUSIONS Our evaluations suggest that iTerms capture highly relevant aspects of genes. Furthermore, by showing sentences containing these terms, eGIFT can provide a quick description of a specific gene. eGIFT helps not only life scientists survey the results of high-throughput experiments, but also annotators find articles describing gene aspects and functions.
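The ranking step in this abstract, which compares how often a term occurs in a gene's literature with how often it occurs in documents about genes in general, can be sketched roughly as below. The abstract does not give the scoring formula, so the plain difference of document frequencies used here is an assumption rather than eGIFT's actual score, and `rank_terms` and the toy documents are hypothetical.

```python
def doc_freq(docs):
    """Fraction of documents that contain each term at least once."""
    counts = {}
    for doc in docs:
        for term in set(doc):
            counts[term] = counts.get(term, 0) + 1
    n_docs = len(docs)
    return {term: c / n_docs for term, c in counts.items()}


def rank_terms(gene_docs, background_docs):
    """Rank terms by (frequency in the gene's documents) minus (frequency in
    background documents). This difference is an assumed stand-in score."""
    in_gene = doc_freq(gene_docs)
    in_background = doc_freq(background_docs)
    scores = {t: f - in_background.get(t, 0.0) for t, f in in_gene.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Each document is a list of already-tokenized terms (made-up data).
gene_docs = [["kinase", "cell"], ["kinase", "apoptosis"]]
background_docs = [["cell"], ["cell", "kinase"], ["cell"], ["protein", "cell"]]
ranked = rank_terms(gene_docs, background_docs)
```

A term such as "cell" that is common in the background ends up at the bottom even though it appears in the gene's documents, which is the behaviour the abstract attributes to informative-term scoring.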
Affiliation(s)
- Catalina O Tudor
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, USA.
|
15
|
Abstract
Motivation: The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time, thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype–phenotype relationship that is characterized by significant heterogeneity and gene–gene and gene–environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods. Contact: jason.h.moore@dartmouth.edu
Affiliation(s)
- Jason H Moore
- Department of Genetics, Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH 03756, USA.
|