1
Chu H, Liu T. Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models. Int J Mol Sci 2024; 25:4507. [PMID: 38674091 PMCID: PMC11049818 DOI: 10.3390/ijms25084507] [Received: 03/21/2024] [Revised: 04/15/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024]
Abstract
Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, computational methods have proliferated over recent decades, supporting drug discovery through advanced predictive models. In this study, we proposed a fast and precise classifier for identifying druggable proteins that uses fine-tuned embeddings from the protein language model (PLM) evolutionary scale modeling 2 (ESM-2), achieving 95.11% accuracy on the benchmark dataset. We also carefully compared the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features using the same classifiers; the results suggest that ESM-2 embeddings outperform PSSM features in both accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on a modified generative pre-trained transformer 2 (GPT-2). To our knowledge, this is the first time a large language model (LLM), GPT-2, has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, Pharos, was adopted to further validate the performance of the proposed model.
Affiliation(s)
- Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China;
2
Bajiya N, Choudhury S, Dhall A, Raghava GPS. AntiBP3: A Method for Predicting Antibacterial Peptides against Gram-Positive/Negative/Variable Bacteria. Antibiotics (Basel) 2024; 13:168. [PMID: 38391554 PMCID: PMC10885866 DOI: 10.3390/antibiotics13020168] [Received: 12/26/2023] [Revised: 02/03/2024] [Accepted: 02/06/2024] [Indexed: 02/24/2024]
Abstract
Most existing methods for predicting antibacterial peptides (ABPs) are designed to target either gram-positive or gram-negative bacteria. In this study, we describe a method that predicts ABPs against gram-positive, gram-negative, and gram-variable bacteria. First, we developed an alignment-based approach using BLAST to identify ABPs, which achieved poor sensitivity. Second, we employed a motif-based approach, which obtained high precision but low sensitivity. To address the poor sensitivity, we developed alignment-free methods for predicting ABPs using machine and deep learning techniques. These alignment-free methods use a wide range of peptide features, including different types of composition, binary profiles of terminal residues, and fastText word embeddings. Five-fold cross-validation was used to build machine and deep learning models on the training datasets, and the models were evaluated on an independent dataset that shares no peptides with the training datasets. Our machine learning model built on the amino acid binary profile of terminal residues achieved maximum AUCs of 0.93, 0.98, and 0.94 for gram-positive, gram-negative, and gram-variable bacteria, respectively, on the independent dataset, outperforming existing approaches on that dataset. A user-friendly web server, standalone package, and pip package have been developed to facilitate peptide-based therapeutics.
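The abstract names the amino acid binary profile of terminal residues as the best-performing feature. As a rough illustration only, the Python sketch below one-hot encodes the first and last k residues of a peptide over the 20 standard amino acids; the function names, the default k = 5, and the example peptide are assumptions for this sketch, not details taken from AntiBP3.

```python
# Hypothetical sketch: one-hot "binary profile" of peptide terminal residues.
# Assumptions (not from AntiBP3): k residues per terminus, 20 standard amino acids,
# and peptides shorter than k are rejected instead of padded.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # fixed column order for the one-hot vectors


def one_hot(residue: str) -> list[int]:
    """Return a 20-element 0/1 vector with a single 1 at the residue's column."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1  # raises ValueError for nonstandard residues
    return vec


def terminal_binary_profile(peptide: str, k: int = 5) -> list[int]:
    """Concatenate one-hot vectors for the first k and last k residues."""
    peptide = peptide.upper()
    if len(peptide) < k:
        raise ValueError(f"peptide shorter than k={k}")
    profile: list[int] = []
    for residue in peptide[:k] + peptide[-k:]:  # N-terminus, then C-terminus
        profile.extend(one_hot(residue))
    return profile


features = terminal_binary_profile("GLFDIVKKVVGALGSL", k=5)
print(len(features))  # 2 * 5 * 20 = 200
```

Because the vector length depends only on k, peptides of different lengths map to fixed-size inputs that standard classifiers can consume.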
Affiliation(s)
- Nisha Bajiya
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi 110020, India
- Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi 110020, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi 110020, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi 110020, India
3
Tellechea-Luzardo J, Stiebritz MT, Carbonell P. Transcription factor-based biosensors for screening and dynamic regulation. Front Bioeng Biotechnol 2023; 11:1118702. [PMID: 36814719 PMCID: PMC9939652 DOI: 10.3389/fbioe.2023.1118702] [Received: 12/07/2022] [Accepted: 01/26/2023] [Indexed: 02/09/2023]
Abstract
Advances in synthetic biology and genetic engineering are bringing into the spotlight a wide range of bio-based applications that demand better sensing and control of biological behaviours. Transcription factor (TF)-based biosensors are promising tools that can be used to detect several types of chemical compounds and elicit a response according to the desired application. However, wider use of this type of device is still hindered by several challenges, which can be addressed by expanding the current knowledge base of metabolite-activated transcription factors, developing better methods to identify new transcription factors, and improving the overall workflow for designing novel biosensor circuits. These improvements are particularly important in the bioproduction field, where researchers need better biosensor-based approaches for screening production strains and precise dynamic regulation strategies. In this work, we summarize what is currently known about transcription factor-based biosensors, discuss recent experimental and computational approaches targeted at their modification and improvement, and suggest possible future research directions based on two applications: bioproduction screening and dynamic regulation of genetic circuits.
Affiliation(s)
- Jonathan Tellechea-Luzardo
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain
- Martin T. Stiebritz
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain
- Pablo Carbonell
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain; Institute for Integrative Systems Biology I2SysBio, Universitat de València-CSIC, Paterna, Spain (corresponding author)
4
Hao T, Wissel B, Ni Y, Pajor N, Glauser T, Pestian J, Dexheimer JW. Implementation of Machine Learning Pipelines for Clinical Practice: Development and Validation Study. JMIR Med Inform 2022; 10:e37833. [PMID: 36525289 PMCID: PMC9804095 DOI: 10.2196/37833] [Received: 03/08/2022] [Revised: 09/01/2022] [Accepted: 09/19/2022] [Indexed: 01/03/2023]
Abstract
BACKGROUND: Artificial intelligence (AI) technologies, such as machine learning and natural language processing, have the potential to provide new insights into complex health data. Although powerful, these algorithms rarely move from experimental studies to implementation in direct clinical care.
OBJECTIVE: We aimed to describe the key components for the successful development and integration of two AI technology-based research pipelines into clinical practice.
METHODS: We summarized the approach, results, and key learnings from two systems implemented at a large tertiary care children's hospital: (1) epilepsy surgical candidate identification (epilepsy ID) in an ambulatory neurology clinic; and (2) an automated clinical trial eligibility screener (ACTES) for the real-time identification of patients for research studies in a pediatric emergency department.
RESULTS: The epilepsy ID system performed as well as board-certified neurologists in identifying surgical candidates (sensitivity of 71% and positive predictive value of 77%). The ACTES system decreased coordinator screening time by 12.9%. The success of each project depended largely on collaboration among machine learning experts, research and operational information technology professionals, longitudinal support from clinical providers, and institutional leadership.
CONCLUSIONS: These projects showcase novel interactions between machine learning recommendations and providers during clinical care. Our deployment provides seamless, real-time integration of AI technology to provide decision support and improve patient care.
Affiliation(s)
- Benjamin Wissel
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Yizhao Ni
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Nathan Pajor
- Division of Pulmonary Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Tracy Glauser
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States; Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- John Pestian
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Judith W Dexheimer
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States; Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
5
Liu CM, Ta VD, Le NQK, Tadesse DA, Shi C. Deep Neural Network Framework Based on Word Embedding for Protein Glutarylation Sites Prediction. Life (Basel) 2022; 12:life12081213. [PMID: 36013392 PMCID: PMC9410500 DOI: 10.3390/life12081213] [Received: 06/27/2022] [Revised: 08/03/2022] [Accepted: 08/05/2022] [Indexed: 04/08/2023]
Abstract
In recent years, much research has found that dysregulation of glutarylation is associated with many human diseases, such as diabetes, cancer, and glutaric aciduria type I. Identifying and characterizing glutarylation sites is therefore essential for modification-specific proteomics. This study proposes a novel deep neural network framework based on word embedding techniques for glutarylation site prediction. Multiple deep neural network models are implemented to evaluate prediction performance, and an extensive experimental comparison of word embedding techniques is conducted to select the most efficient method for representing protein sequence data. The results suggest that the proposed deep neural networks not only improve protein sequence representation but also predict glutarylation sites effectively, with higher accuracy and confidence than previous work. Moreover, embedding techniques proved more productive than pre-trained word embedding techniques for representing glutarylation sequences. Compared with the advanced integrated vector support method, our proposed method significantly improved all traditional performance metrics, achieving accuracy, specificity, sensitivity, and correlation coefficient of 0.79, 0.89, 0.59, and 0.51, respectively. It shows potential for detecting new glutarylation sites and uncovering relationships between glutarylation and well-known lysine modifications.
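To make the reported metrics concrete, here is a small Python sketch computing accuracy, sensitivity, specificity, and a correlation coefficient from confusion-matrix counts. It assumes the abstract's "correlation coefficient" is the Matthews correlation coefficient (MCC), which is common in site-prediction work but is not stated in the abstract; the counts in the example are invented.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Confusion-matrix metrics for a binary site predictor.

    Assumes the reported "correlation coefficient" is the Matthews
    correlation coefficient (MCC); returns MCC = 0.0 when it is undefined.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # fraction of true glutarylation sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Made-up counts for illustration only; they are not taken from the study.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
```

On imbalanced site data, accuracy can look high while sensitivity stays low, which is why MCC is often reported alongside it.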
Affiliation(s)
- Chuan-Ming Liu
- Department of Computer Science and Information Engineering, National Taipei University of Technology (Taipei Tech), Taipei City 106, Taiwan
- Correspondence: C.-M.L. and C.S.; Tel.: +886-2-2771-2171 (ext. 4251) (C.-M.L.)
- Van-Dai Ta
- Samsung Display Vietnam (SDV), Yen Phong Industrial Park, Bac Ninh 16000, Vietnam
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei City 106, Taiwan
- Chongyang Shi
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 102488, China
6
Using Machine Learning for Pharmacovigilance: A Systematic Review. Pharmaceutics 2022; 14:pharmaceutics14020266. [PMID: 35213998 PMCID: PMC8924891 DOI: 10.3390/pharmaceutics14020266] [Received: 12/17/2021] [Revised: 01/13/2022] [Accepted: 01/21/2022] [Indexed: 02/04/2023]
Abstract
Pharmacovigilance is the science of continuously monitoring adverse drug reactions to existing medicines. Traditional approaches in this field can be expensive and time-consuming. Applying natural language processing (NLP) to user-generated content is hypothesized to be an effective supplemental source of evidence. In this systematic review, a broad, multidisciplinary literature search was conducted across four databases, initially yielding 5318 publications. Studies were considered relevant if they reported on the application of NLP to understand user-generated text for pharmacovigilance; 16 relevant publications were included. All studies were assessed as having medium reliability and validity. Across all types of drugs, 14 publications reported positive findings for the identification of adverse drug reactions, providing consistent evidence that NLP can be applied effectively and accurately to user-generated text published on the Internet to identify adverse drug reactions for pharmacovigilance. The evidence presented in this review suggests that the analysis of textual data has the potential to complement the traditional system of pharmacovigilance.
7
A Fine-Tuned BERT-Based Transfer Learning Approach for Text Classification. Journal of Healthcare Engineering 2022; 2022:3498123. [PMID: 35013691 PMCID: PMC8742153 DOI: 10.1155/2022/3498123] [Received: 09/14/2021] [Revised: 11/25/2021] [Accepted: 12/03/2021] [Indexed: 01/10/2023]
Abstract
Text classification has been studied extensively in information retrieval and data mining. It is useful in many domains, including medical diagnosis and healthcare, targeted marketing, the entertainment industry, and group filtering. Recent innovations in both data mining and natural language processing (NLP) have drawn the attention of researchers worldwide to developing automated text classification systems, since NLP makes it possible to categorize documents by their textual content. A huge amount of such text is generated on social media sites by their users. Three datasets were used in the experiments: the COVID-19 fake news dataset, the COVID-19 English tweet dataset, and the extremist-non-extremist dataset, which contain news blogs, posts, and tweets related to the coronavirus and hate speech. Because transfer learning approaches had not previously been evaluated on the COVID-19 fake news and extremist-non-extremist datasets, this work applies transfer learning classification models to both datasets to assess their performance. The models are trained and evaluated on accuracy, precision, recall, and F1-score, and heat maps are generated for every model. Finally, future directions are proposed.
8
Bangyal WH, Qasim R, Rehman NU, Ahmad Z, Dar H, Rukhsar L, Aman Z, Ahmad J. Detection of Fake News Text Classification on COVID-19 Using Deep Learning Approaches. Computational and Mathematical Methods in Medicine 2021; 2021:5514220. [PMID: 34819990 PMCID: PMC8608495 DOI: 10.1155/2021/5514220] [Received: 02/27/2021] [Accepted: 10/15/2021] [Indexed: 01/10/2023]
Abstract
A vast amount of data is generated every second on microblogs, content-sharing platforms, and social networking sites. Twitter is a popular microblog where people voice their opinions about daily issues, and analyzing these opinions is the primary concern of sentiment analysis (SA), or opinion mining. Efficiently capturing, gathering, and analyzing sentiments has been challenging for researchers. To address these challenges, in this research we propose a highly accurate approach for SA of fake news on COVID-19. Starting from a dataset of COVID-19 fake news, we first preprocessed the data (missing-value replacement, noise removal, tokenization, and stemming). We then applied a semantic model with term frequency-inverse document frequency (TF-IDF) weighting for data representation. In the measurement and evaluation step, we applied eight machine learning algorithms (naive Bayes, AdaBoost, k-nearest neighbors, random forest, logistic regression, decision tree, neural networks, and support vector machine) and four deep learning models (CNN, LSTM, RNN, and GRU). Based on the results, we built a highly efficient prediction model in Python, trained and evaluated the classification model according to performance measures (confusion matrix, classification rate, true positive rate...), and then tested it on a set of unclassified COVID-19 fake news items to predict the sentiment class of each. The results show high accuracy compared with the other models. Finally, recommendations and future research directions are provided to help researchers select an efficient sentiment analysis model for Twitter data.
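The TF-IDF weighting mentioned in the abstract can be sketched in a few lines of Python. This is the textbook form (term frequency normalized by document length, times the log of inverse document frequency); the study's exact variant is not given, and the toy headlines are invented.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weights per document: (count / doc length) * log(N / document frequency).

    Assumes every document is non-empty. Libraries such as scikit-learn use
    smoothed and normalized variants, so their numbers will differ.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))  # docs containing each term
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy, already-tokenized headlines (invented for illustration).
corpus = [
    "vaccine cures covid overnight".split(),
    "covid vaccine trial results published".split(),
    "garlic cures covid claims spread".split(),
]
for row in tf_idf(corpus):
    print({term: round(w, 3) for term, w in row.items()})
```

Because "covid" occurs in every toy headline, its IDF is log(1) = 0 and it receives zero weight, which illustrates how TF-IDF suppresses terms that do not discriminate between documents.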
Affiliation(s)
- Rukhma Qasim
- Department of Computer Science, University of Gujrat, Pakistan
- Zeeshan Ahmad
- Department of Computer Science, University of Gujrat, Pakistan
- Hafsa Dar
- Department of Software Engineering, University of Gujrat, Pakistan
- Laiqa Rukhsar
- Department of Computer Science, University of Gujrat, Pakistan
- Zahra Aman
- Department of Computer Science, University of Gujrat, Pakistan
- Jamil Ahmad
- Professor of Computer Science, Hazara University, Mansehra, KPK, Pakistan
9
Queirós P, Delogu F, Hickl O, May P, Wilmes P. Mantis: flexible and consensus-driven genome annotation. Gigascience 2021; 10:6291114. [PMID: 34076241 PMCID: PMC8170692 DOI: 10.1093/gigascience/giab042] [Received: 11/04/2020] [Revised: 03/22/2021] [Accepted: 05/14/2021] [Indexed: 12/22/2022]
Abstract
Background: The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. These datasets make it possible to infer the role of previously unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, challenges remain in terms of speed, flexibility, and reproducibility. In the big data era, it is also increasingly important to stop limiting findings to a single reference, instead coalescing knowledge from different data sources and thereby overcoming some of the limitations of relying too heavily on computationally generated data from a single source.
Results: We implemented a protein annotation tool, Mantis, which uses database identifier intersection and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing customization of reference data and execution parameters, and is reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which significantly improved annotation performance compared with sequence-wide annotation. The parallelized implementation of Mantis yields short runtimes while producing high-coverage, high-quality protein function annotations.
Conclusions: Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis.
Affiliation(s)
- Pedro Queirós
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
- Francesco Delogu
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
- Oskar Hickl
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
- Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
- Paul Wilmes
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
10
Queirós P, Novikova P, Wilmes P, May P. Unification of functional annotation descriptions using text mining. Biol Chem 2021; 402:983-990. [PMID: 33984880 DOI: 10.1515/hsz-2021-0125] [Received: 01/27/2021] [Accepted: 05/03/2021] [Indexed: 02/06/2023]
Abstract
A common approach to genome annotation uses homology-based tools to predict the functional role of proteins. The quality of functional annotations depends on the reference data used; as such, choosing appropriate sources is crucial. Unfortunately, no single reference data source can be universally considered the gold standard, so using multiple references could increase annotation quality and coverage. However, this comes with challenges, particularly the introduction of redundant and exclusive annotations. Through text mining, it is possible to identify highly similar functional descriptions, strengthening confidence in the final protein functional annotation and providing a redundancy-free output. Here we present UniFunc, a text mining approach that detects similar functional descriptions with high precision. UniFunc was built as a small module that can be used independently or integrated into protein function annotation pipelines. By removing the need to analyse and compare annotation results individually, UniFunc streamlines the complementary use of multiple reference datasets.
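The abstract does not describe how UniFunc scores similarity, so the Python sketch below substitutes a plain bag-of-words cosine similarity, a generic text mining technique, to illustrate the idea of flagging redundant functional descriptions. The stop-word list and the 0.8 threshold are hypothetical choices for this sketch, not UniFunc's.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would curate this carefully.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "protein", "putative"}


def tokens(description: str) -> Counter:
    """Lower-case alphanumeric tokens, with stop words removed, as a bag of words."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar(d1: str, d2: str, threshold: float = 0.8) -> bool:
    """Treat two descriptions as redundant if their similarity meets a hypothetical threshold."""
    return cosine(tokens(d1), tokens(d2)) >= threshold


print(similar("DNA gyrase subunit A", "DNA gyrase, subunit A"))     # True
print(similar("DNA gyrase subunit A", "50S ribosomal protein L2"))  # False
```

A purely lexical score like this misses synonyms and abbreviations, which is one reason dedicated tools add domain-specific text processing on top of such a baseline.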
Affiliation(s)
- Paul Wilmes
- Systems Ecology, Esch-sur-Alzette, Luxembourg
- Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 4362, Esch-sur-Alzette, Luxembourg
11
Nguyen TTD, Le NQK, Ho QT, Phan DV, Ou YY. TNFPred: identifying tumor necrosis factors using hybrid features based on word embeddings. BMC Med Genomics 2020; 13:155. [PMID: 33087125 PMCID: PMC7579990 DOI: 10.1186/s12920-020-00779-w] [Indexed: 12/23/2022]
Abstract
Background: Cytokines are a class of small proteins that act as chemical messengers and play a significant role in essential cellular processes, including immune regulation, hematopoiesis, and inflammation. Tumor necrosis factors, an important family of cytokines, are associated with the regulation of various biological processes such as cell proliferation and differentiation, apoptosis, lipid metabolism, and coagulation. These cytokines are also implicated in various diseases, such as insulin resistance, autoimmune diseases, and cancer. Given the interdependence between this kind of cytokine and others, distinguishing tumor necrosis factors from other cytokines is a challenge for biological scientists.
Methods: In this research, we employed a word embedding technique to create hybrid features that proved efficient in identifying tumor necrosis factors from cytokine sequences. We segmented each protein sequence into protein words and created a corresponding word embedding for each word. A word embedding-based vector was then created for each sequence and input into machine learning classification models. When extracting feature sets, we not only varied the segmentation sizes of protein sequences but also tested different combinations of the split grams to find the features that gave optimal prediction. Furthermore, our methodology follows a well-defined procedure to build a reliable classification tool.
Results: With our proposed hybrid features, prediction models performed better than with seven prominent sequence-based feature types. Results from 10 independent runs on the surveyed dataset show that, on average, our optimal models achieve an area under the curve of 0.984 and 0.998 in 5-fold cross-validation and on the independent test, respectively.
Conclusions: These results show that biologists can use our model to identify tumor necrosis factors from other cytokines efficiently. Moreover, this study shows that natural language processing techniques can reasonably be applied to help biologists solve bioinformatics problems efficiently.
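The segmentation of sequences into protein words and the construction of a per-sequence embedding vector can be sketched as follows. The abstract does not specify these details, so this Python sketch assumes overlapping n-grams, mean pooling of word vectors, and concatenation across n-gram sizes to form a "hybrid" feature; the tiny embedding tables are invented stand-ins for trained word embeddings.

```python
# Hypothetical sketch of n-gram "protein words" and a hybrid sequence vector.
# Assumptions (not from TNFPred): overlapping n-grams, mean pooling, and
# concatenation across n; the toy embedding tables below are invented.


def protein_words(sequence: str, n: int) -> list[str]:
    """Split a sequence into overlapping n-residue words."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def sequence_vector(sequence: str, table: dict[str, list[float]], n: int) -> list[float]:
    """Average the embeddings of the sequence's known n-grams (zeros if none are known)."""
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    count = 0
    for word in protein_words(sequence, n):
        vec = table.get(word)
        if vec is None:  # skip words missing from the embedding table
            continue
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    return [t / count for t in total] if count else total


def hybrid_vector(sequence: str, tables: dict[int, dict[str, list[float]]]) -> list[float]:
    """Concatenate the per-n sequence vectors, smallest n first."""
    features: list[float] = []
    for n, table in sorted(tables.items()):
        features.extend(sequence_vector(sequence, table, n))
    return features


tables = {
    2: {"MK": [1.0, 0.0], "KV": [0.0, 1.0], "VL": [1.0, 1.0]},
    3: {"MKV": [2.0], "KVL": [4.0]},
}
print(hybrid_vector("MKVL", tables))  # roughly [0.667, 0.667, 3.0]
```

In practice the tables would come from a trained embedding model (for example, one word2vec-style model per n), and the resulting fixed-length vector would be passed to a classifier.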
Affiliation(s)
- Nguyen-Quoc-Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei City, 106, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei City, 106, Taiwan
- Quang-Thai Ho
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 32003, Taiwan
- Dinh-Van Phan
- University of Economics, The University of Danang, Danang, 550000, Vietnam
- Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 32003, Taiwan
12
DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases. Oxidative Medicine and Cellular Longevity 2020; 2020:5904315. [PMID: 32308806 PMCID: PMC7142358 DOI: 10.1155/2020/5904315] [Received: 01/09/2020] [Accepted: 02/21/2020] [Indexed: 12/27/2022]
Abstract
Normal cellular physiology and biochemical processes require undamaged RNA molecules. However, RNAs are frequently subjected to oxidative damage. Overproduction of reactive oxygen species (ROS) leads to RNA oxidation and disturbs redox (oxidation-reduction) homeostasis. When oxidative damage affects RNA carrying protein-coding information, it may result in the synthesis of aberrant proteins and lower translation efficiency. Both of these, as well as imbalanced redox homeostasis, may lead to numerous human diseases. The number of studies on the effects of oxidative RNA damage in mammals is increasing year by year, driven by the understanding that this oxidation fundamentally leads to numerous human diseases. To enable researchers in this field to explore information on RNA oxidation and its effects on human diseases, we developed DES-ROD, an online knowledgebase containing processed information from 298,603 relevant documents, consisting of PubMed abstracts and PubMed Central full-text articles. The system uses concepts and terms from 38 curated thematic dictionaries mapped to the analyzed documents. Researchers can explore enriched concepts as well as enriched pairs of putatively associated concepts, and thus the mutual relationships between any combination of two concepts from the dictionaries used. The dictionaries cover a wide range of biomedical topics, such as human genes and proteins, pathways, Gene Ontology categories, mutations, noncoding RNAs, enzymes, toxins, metabolites, and diseases, which makes it possible to gain insights into different facets of the effects of RNA oxidation and the control of this process. The usefulness of DES-ROD is demonstrated by case studies on known information as well as potentially novel information involving RNA oxidation and diseases. DES-ROD is the first text and data mining-based knowledgebase focused on exploring RNA oxidation and human diseases.
13
Bonetta R, Valentino G. Machine learning techniques for protein function prediction. Proteins 2019; 88:397-413. [PMID: 31603244 DOI: 10.1002/prot.25832] [Received: 03/19/2019] [Revised: 07/05/2019] [Accepted: 09/17/2019] [Indexed: 12/17/2022]
Abstract
Proteins play important roles in living organisms, and their function is directly linked to their structure. Because of the growing gap between the number of proteins being discovered and their functional characterization (in particular as a result of experimental limitations), reliable computational prediction of protein function has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods like support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are presented. In parallel, the review traces the evolution of the features used by these algorithms, from classical physicochemical properties and amino acid composition up to text-derived features from the biomedical literature and learned feature representations using autoencoders, together with feature selection and dimensionality reduction techniques. Success stories in applying these techniques to both general and specific protein function prediction are discussed.
Affiliation(s)
- Rosalin Bonetta
- Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta
- Gianluca Valentino
- Department of Communications and Computer Engineering, University of Malta, Msida, Malta
15
Tucker TC, Durbin EB, McDowell JK, Huang B. Unlocking the potential of population-based cancer registries. Cancer 2019; 125:3729-3737. [PMID: 31381143 PMCID: PMC6851856 DOI: 10.1002/cncr.32355] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 08/08/2018] [Revised: 03/13/2019] [Accepted: 04/16/2019] [Indexed: 12/31/2022]
Abstract
Population-based cancer registries have improved dramatically over the last 2 decades. These central cancer registries provide a critical framework that can elevate the science of cancer research. There have also been important technical and scientific advances that help to unlock the potential of population-based cancer registries. These advances include improvements in probabilistic record linkage, refinements in natural language processing, the ability to perform genomic sequencing on formalin-fixed, paraffin-embedded (FFPE) tissue, and improvements in the ability to identify activity levels of many different signaling molecules in FFPE tissue. This article describes how central cancer registries can provide a population-based sample frame that will lead to studies with strong external validity, how central cancer registries can link with public and private health insurance claims to obtain complete treatment information, how central cancer registries can use informatics techniques to provide population-based rapid case ascertainment, how central cancer registries can serve as a population-based virtual tissue repository, and how population-based cancer registries are essential for guiding the implementation of evidence-based interventions and measuring changes in the cancer burden after the implementation of these interventions.
Affiliation(s)
- Thomas C. Tucker
- Kentucky Cancer Registry, Markey Cancer Center, University of Kentucky, Lexington, Kentucky
- Department of Epidemiology, College of Public Health, University of Kentucky, Lexington, Kentucky
- Eric B. Durbin
- Kentucky Cancer Registry, Markey Cancer Center, University of Kentucky, Lexington, Kentucky
- Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, Kentucky
- Jaclyn K. McDowell
- Kentucky Cancer Registry, Markey Cancer Center, University of Kentucky, Lexington, Kentucky
- Department of Epidemiology, College of Public Health, University of Kentucky, Lexington, Kentucky
- Bin Huang
- Kentucky Cancer Registry, Markey Cancer Center, University of Kentucky, Lexington, Kentucky
- Department of Biostatistics, College of Public Health, University of Kentucky, Lexington, Kentucky
16
Literature-Based Enrichment Insights into Redox Control of Vascular Biology. Oxidative Medicine and Cellular Longevity 2019; 2019:1769437. [PMID: 31223421 PMCID: PMC6542245 DOI: 10.1155/2019/1769437] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/07/2019] [Revised: 04/11/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]
Abstract
Reactive oxygen species (ROS) play one of the most critical roles in cellular physiology and signaling. ROS overproduction leads to cellular oxidative stress. This may lead to an irrecoverable imbalance of redox (oxidation-reduction reaction) function that deregulates redox homeostasis, which itself could lead to several diseases, including neurodegenerative disease, cardiovascular disease, and cancers. In this study, we focus on the redox effects related to vascular systems in mammals. To support research in this domain, we developed an online knowledge base, DES-RedoxVasc, which enables exploration of information contained in the biomedical scientific literature. The DES-RedoxVasc system analyzed 233,399 documents consisting of PubMed abstracts and PubMed Central full-text articles related to different aspects of redox biology in vascular systems. It allows researchers to explore enriched concepts from 28 curated thematic dictionaries, as well as literature-derived potential associations between pairs of such enriched concepts, where the associations themselves are statistically enriched. For example, the system allows exploration of associations among pathways, diseases, mutations, genes/proteins, miRNAs, long ncRNAs, toxins, drugs, biological processes, molecular functions, etc., which provide insights into different aspects of redox effects and the control of processes related to the vascular system. Moreover, we present case studies on existing and possibly novel knowledge regarding redox in vascular biology that demonstrate the usefulness of DES-RedoxVasc. DES-RedoxVasc is the first knowledge base compiled using text mining for the exploration of this topic.
17
Rees EE, Ng V, Gachon P, Mawudeku A, McKenney D, Pedlar J, Yemshanov D, Parmely J, Knox J. Risk assessment strategies for early detection and prediction of infectious disease outbreaks associated with climate change. Canada Communicable Disease Report = Relevé des maladies transmissibles au Canada 2019; 45:119-126. [PMID: 31285702 PMCID: PMC6587687 DOI: 10.14745/ccdr.v45i05a02] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/08/2022]
Abstract
A new generation of surveillance strategies is being developed to help detect emerging infections and to identify the increased risks of infectious disease outbreaks that are expected to occur with climate change. These surveillance strategies include event-based surveillance (EBS) systems and risk modelling. The EBS systems use open-source internet data, such as media reports, official reports, and social media (such as Twitter), to detect evidence of an emerging threat, and can be used in conjunction with conventional surveillance systems to enhance early warning of public health threats. More recently, EBS systems have incorporated artificial intelligence applications such as machine learning and natural language processing to increase the speed, capacity and accuracy of filtering, classifying and analysing health-related internet data. Risk modelling uses statistical and mathematical methods to assess the severity of disease emergence and spread given factors about the host (e.g. number of reported cases), pathogen (e.g. pathogenicity) and environment (e.g. climate suitability for reservoir populations). The types of data in these models are expanding to include health-related information from open-source internet data and information on mobility patterns of humans and goods. This information is helping to identify susceptible populations and predict the pathways from which infections might spread into new areas and new countries. As a powerful addition to traditional surveillance strategies that identify what has already happened, it is anticipated that EBS systems and risk modelling will increasingly be used to inform public health actions to prevent, detect and mitigate climate change-driven increases in infectious diseases.
Affiliation(s)
- EE Rees
- Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, St. Hyacinthe, QC
- V Ng
- Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON
- P Gachon
- Centre pour l’Étude et la Simulation du Climat à l’Échelle Régionale (ESCER), Université du Québec à Montréal (UQAM), Montréal, QC
- A Mawudeku
- Office of Situational Awareness and Operations, Centre for Emergency Preparedness and Response, Public Health Agency of Canada, Ottawa, ON
- D McKenney
- Natural Resources Canada, Canadian Forest Service, Great Lakes Forestry Centre, Sault Ste. Marie, ON
- J Pedlar
- Natural Resources Canada, Canadian Forest Service, Great Lakes Forestry Centre, Sault Ste. Marie, ON
- D Yemshanov
- Natural Resources Canada, Canadian Forest Service, Great Lakes Forestry Centre, Sault Ste. Marie, ON
- J Parmely
- Canadian Wildlife Health Cooperative, University of Guelph, Guelph, ON
- J Knox
- Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, St. Hyacinthe, QC
- Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON
18
Online visibility of software-related web sites: The case of biomedical text mining tools. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.11.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/07/2023]
19
Islam SMA, Heil BJ, Kearney CM, Baker EJ. Protein classification using modified n-grams and skip-grams. Bioinformatics 2019; 34:1481-1487. [PMID: 29309523 DOI: 10.1093/bioinformatics/btx823] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/03/2017] [Accepted: 12/21/2017] [Indexed: 12/24/2022]
Abstract
Motivation: Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce a supervised protein classification method with a novel means of automating the work-intensive feature generation step via a Natural Language Processing (NLP)-dependent model, using a modified combination of n-grams and skip-grams (m-NGSG). Results: A meta-comparison of cross-validation accuracy with twelve training datasets from nine different published studies demonstrates a consistent increase in accuracy of m-NGSG when compared to contemporary classification and feature generation models. We expect this model to accelerate the classification of proteins from primary sequence data and increase the accessibility of protein characteristic prediction to a broader range of scientists. Availability and implementation: m-NGSG is freely available at Bitbucket: https://bitbucket.org/sm_islam/mngsg/src. A web server is available at watson.ecs.baylor.edu/ngsg. Contact: erich_baker@baylor.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Erich J Baker
- Institute of Biomedical Studies
- Department of Computer Science
20
Zhang X, Ye ZH, Liang HW, Ren FH, Li P, Dang YW, Chen G. Down-regulation of miR-146a-5p and its potential targets in hepatocellular carcinoma validated by a TCGA- and GEO-based study. FEBS Open Bio 2017; 7:504-521. [PMID: 28396836 PMCID: PMC5377416 DOI: 10.1002/2211-5463.12198] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 10/27/2016] [Revised: 01/06/2017] [Accepted: 01/20/2017] [Indexed: 12/14/2022]
Abstract
Our previous research has demonstrated that miR‐146a‐5p is down‐regulated in hepatocellular carcinoma (HCC) and might play a tumor‐suppressive role. In this study, we sought to validate the decreased expression with a larger cohort and to explore potential molecular mechanisms. GEO and TCGA databases were used to gather miR‐146a‐5p expression data in HCC, which included 762 HCC and 454 noncancerous liver tissues. A meta‐analysis of the GEO‐based microarrays, TCGA‐based RNA‐seq data, and additional qRT‐PCR data validated the down‐regulation of miR‐146a‐5p in HCC, and no publication bias was observed. Integrated genes were generated by overlapping miR‐146a‐5p‐related genes from predicted and formerly reported HCC‐related genes using natural language processing. The overlaps were comprehensively analyzed to discover the potential gene signatures, regulatory pathways, and networks of miR‐146a‐5p in HCC. A total of 251 miR‐146a‐5p potential target genes were predicted by bioinformatics platforms, and 104 genes were considered both HCC‐ and miR‐146a‐5p‐related overlaps. RAC1 was the most connected hub gene for miR‐146a‐5p, and four highly enriched pathways (VEGF signaling pathway, adherens junction, toll‐like receptor signaling pathway, and neurotrophin signaling pathway) were identified for the overlapping genes. The down‐regulation of miR‐146a‐5p in HCC has been validated with the most complete data possible. The potential gene signatures, regulatory pathways, and networks identified for miR‐146a‐5p in HCC could prove useful for molecular‐targeted diagnostics and therapeutics.
Affiliation(s)
- Xin Zhang
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, China
- Zhi-Hua Ye
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, China
- Hai-Wei Liang
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, China
- Fang-Hui Ren
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, China
- Ping Li
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, China
- Yi-Wu Dang
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, China
- Gang Chen
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, China
21
Lou Y, Tu SW, Nyulas C, Tudorache T, Chalmers RJG, Musen MA. Use of ontology structure and Bayesian models to aid the crowdsourcing of ICD-11 sanctioning rules. J Biomed Inform 2017; 68:20-34. [PMID: 28192233 DOI: 10.1016/j.jbi.2017.02.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/17/2016] [Revised: 02/02/2017] [Accepted: 02/08/2017] [Indexed: 11/18/2022]
Abstract
The International Classification of Diseases (ICD) is the de facto standard international classification for mortality reporting and for many epidemiological, clinical, and financial use cases. The next version of ICD, ICD-11, will be submitted for approval by the World Health Assembly in 2018. Unlike previous versions of ICD, where coders mostly select single codes from pre-enumerated disease and disorder codes, ICD-11 coding will allow extensive use of multiple codes to give more detailed disease descriptions. For example, "severe malignant neoplasms of left breast" may be coded using the combination of a "stem code" (e.g., code for malignant neoplasms of breast) with a variety of "extension codes" (e.g., codes for laterality and severity). The use of multiple codes (a process called post-coordination), while avoiding the pitfall of having to pre-enumerate a vast number of possible disease and qualifier combinations, risks the creation of meaningless expressions that combine stem codes with inappropriate qualifiers. To prevent that from happening, "sanctioning rules" that define legal combinations are necessary. In this work, we developed a crowdsourcing method for obtaining sanctioning rules for the post-coordination of concepts in ICD-11. Our method utilized the hierarchical structures in the domain to improve the accuracy of the sanctioning rules and to lower the crowdsourcing cost. We used Bayesian networks to model crowd workers' skills, the accuracy of their responses, and our confidence in the acquired sanctioning rules. We applied reinforcement learning to develop an agent that constantly adjusted the confidence cutoffs during the crowdsourcing process to maximize the overall quality of sanctioning rules under a fixed budget. Finally, we performed formative evaluations using a skin-disease branch of the draft ICD-11 and demonstrated that the crowdsourced sanctioning rules replicated those defined by an expert dermatologist with high precision and recall.
This work demonstrated that a crowdsourcing approach could offer a reasonably efficient method for generating a first draft of sanctioning rules that subject matter experts could verify and edit, thus relieving them of the tedium and cost of formulating the initial set of rules.
Affiliation(s)
- Yun Lou
- Stanford University, Stanford, CA, USA
22
Abstract
In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performance of an automatic text categorizer and show a large improvement of +225 % in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA, which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances in text mining instruments directly depend on the availability of high-quality annotated contents at every curation step. Database workflows must start explicitly recording all the data they curate and, ideally, also some of the data they do not curate.
Affiliation(s)
- Patrick Ruch
- SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland.
- BiTeM Group, HES-SO/HEG Genève, 7 route de Drize, CH-1227, Carouge, Switzerland.
23
Topaz M, Radhakrishnan K, Blackley S, Lei V, Lai K, Zhou L. Studying Associations Between Heart Failure Self-Management and Rehospitalizations Using Natural Language Processing. West J Nurs Res 2016; 39:147-165. [PMID: 27628125 DOI: 10.1177/0193945916668493] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 01/09/2023]
Abstract
This study developed an innovative natural language processing algorithm to automatically identify heart failure (HF) patients with ineffective self-management status (in the domains of diet, physical activity, medication adherence, and adherence to clinician appointments) from narrative discharge summary notes. We also analyzed the association between self-management status and preventable 30-day hospital readmissions. Our natural language system achieved relatively high accuracy (F-measure = 86.3%; precision = 95%; recall = 79.2%) on a testing sample of 300 notes annotated by two human reviewers. In a sample of 8,901 HF patients admitted to our healthcare system, 14.4% (n = 1,282) had documentation of ineffective HF self-management. Adjusted regression analyses indicated that the presence of any skill-related self-management deficit (odds ratio [OR] = 1.3, 95% confidence interval [CI] = [1.1, 1.6]) and non-specific ineffective self-management (OR = 1.5, 95% CI = [1.2, 2]) were significantly associated with readmissions. We have demonstrated the feasibility of identifying ineffective HF self-management from electronic discharge summaries with natural language processing.
Affiliation(s)
- Maxim Topaz
- Harvard Medical School, Boston, MA, USA
- Brigham Women's Health Hospital, Boston, MA, USA
- Victor Lei
- Brigham Women's Health Hospital, Boston, MA, USA
- Li Zhou
- Harvard Medical School, Boston, MA, USA
- Brigham Women's Health Hospital, Boston, MA, USA
- Partners Healthcare Inc, Boston, MA, USA
24
Cornet R, Chute CG. Health Concept and Knowledge Management: Twenty-five Years of Evolution. Yearb Med Inform 2016; Suppl 1:S32-41. [PMID: 27488404 DOI: 10.15265/iys-2016-s037] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/24/2022]
Abstract
OBJECTIVES: The fields of health terminology, classification, ontology, and related information models have evolved dramatically over the past 25 years. Our objective was to review notable trends, describe emerging or enabling technologies, and highlight major terminology systems during the interval. METHODS: We review the progression in health terminology systems informed by our own experiences as part of the community involved in this work, reinforced with literature review and citation. RESULTS: The transformation in size, scope, complexity, and adoption of health terminological systems and information models has been tremendous, on the scale of orders of magnitude. CONCLUSION: The present "big science" era of inference and discovery in biomedicine would not have been possible or scalable absent the growth and maturation of health terminology systems and information models over the past 25 years.
Affiliation(s)
- R Cornet
- Ronald Cornet, PhD, Visiting Associate Professor, Linköping University, Assistant Professor, Academisch Medisch Centrum, Medical Informatics, J1b-115, P.O. Box 22700, 1100 DE Amsterdam, The Netherlands
- C G Chute
- Christopher G Chute, MD DrPH, Bloomberg Distinguished Professor of Health Informatics, Professor of Medicine, Public Health, and Nursing, Chief Research Information Officer, Johns Hopkins Medicine, Johns Hopkins University, Division of General Internal Medicine, 2024 E Monument St, Suite 1-200, Baltimore, MD 21287, USA
25
Fernández A, Scott LR. Drug leads for interactive protein targets with unknown structure. Drug Discov Today 2015; 21:531-5. [PMID: 26484433 DOI: 10.1016/j.drudis.2015.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2015] [Revised: 09/29/2015] [Accepted: 10/12/2015] [Indexed: 11/24/2022]
Abstract
The disruption of protein-protein interfaces (PPIs) remains a challenge in drug discovery. The problem becomes daunting when the structure of the target protein is unknown, and it is further complicated when the interface is susceptible to disruptive phosphorylation. Based solely on protein sequence and information about phosphorylation-susceptible sites within the PPI, a new technology has been developed to identify drug leads that inhibit protein associations. Here we present this technology and contrast it with current structure-based technologies for the generation of drug leads. The novel technology is illustrated by a patented invention to treat heart failure. The success of this technology shows that it is possible to generate drug leads in the absence of target structure.
Affiliation(s)
- Ariel Fernández
- Argentine Institute of Mathematics (IAM), National Research Council (CONICET), Buenos Aires 1083, Argentina; AF Innovation, Avenida del Libertador 1092, Buenos Aires 1112, Argentina.
- L Ridgway Scott
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA; Department of Mathematics, The University of Chicago, Chicago, IL 60637, USA