Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	[Subscribe] [Scholar Register]

Number

Cited by Other Article(s)

Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023;24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open

Mokhtaridoost M, Maass PG, Gönen M. Identifying Tissue- and Cohort-Specific RNA Regulatory Modules in Cancer Cells Using Multitask Learning. Cancers (Basel) 2022;14:cancers14194939. [PMID: 36230862 PMCID: PMC9563725 DOI: 10.3390/cancers14194939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Revised: 09/30/2022] [Accepted: 10/06/2022] [Indexed: 11/24/2022] Open

Abstract

Simple Summary

Understanding the underlying biological mechanisms of primary tumors is crucial for predicting how tumors respond to therapies and exploring accurate treatment strategies. miRNA–mRNA interactions have a major effect on many biological processes that are important in the formation and progression of cancer. In this study, we introduced a computational pipeline to extract tissue- and cohort-specific miRNA–mRNA regulatory modules of multiple cancer types from the same origin using miRNA and mRNA expression profiles of primary tumors. Our model identified regulatory modules of underlying cancer types (i.e., cohort-specific) and shared regulatory modules between cohorts (i.e., tissue-specific).

Abstract

MicroRNA (miRNA) alterations significantly impact the formation and progression of human cancers. miRNAs interact with messenger RNAs (mRNAs) to facilitate degradation or translational repression. Thus, identifying miRNA–mRNA regulatory modules in cohorts of primary tumor tissues are fundamental for understanding the biology of tumor heterogeneity and precise diagnosis and treatment. We established a multitask learning sparse regularized factor regression (MSRFR) method to determine key tissue- and cohort-specific miRNA–mRNA regulatory modules from expression profiles of tumors. MSRFR simultaneously models the sparse relationship between miRNAs and mRNAs and extracts tissue- and cohort-specific miRNA–mRNA regulatory modules separately. We tested the model’s ability to determine cohort-specific regulatory modules of multiple cancer cohorts from the same tissue and their underlying tissue-specific regulatory modules by extracting similarities between cancer cohorts (i.e., blood, kidney, and lung). We also detected tissue-specific and cohort-specific signatures in the corresponding regulatory modules by comparing our findings from various other tissues. We show that MSRFR effectively determines cancer-related miRNAs in cohort-specific regulatory modules, distinguishes tissue- and cohort-specific regulatory modules from each other, and extracts tissue-specific information from different cohorts of disease-related tissue. Our findings indicate that the MSRFR model can support current efforts in precision medicine to define tumor-specific miRNA–mRNA signatures.

Collapse

Yang K, Lu J, Wan W, Zhang G, Hou L. Transfer learning based on sparse Gaussian process for regression. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.05.028] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Peng X, Wang X, Guo Y, Ge Z, Li F, Gao X, Song J. RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins. Brief Bioinform 2022;23:6596984. [PMID: 35649392 PMCID: PMC9294422 DOI: 10.1093/bib/bbac215] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2022] [Revised: 04/25/2022] [Accepted: 05/06/2022] [Indexed: 11/27/2022] Open

Liao X, Ma H, Tang YJ. Artificial intelligence: a solution to involution of design–build–test–learn cycle. Curr Opin Biotechnol 2022;75:102712. [DOI: 10.1016/j.copbio.2022.102712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2021] [Revised: 02/05/2022] [Accepted: 03/01/2022] [Indexed: 01/08/2023]

A Machine Learning Strategy Based on Kittler’s Taxonomy to Detect Anomalies and Recognize Contexts Applied to Monitor Water Bodies in Environments. REMOTE SENSING 2022. [DOI: 10.3390/rs14092222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]

Abstract Environmental monitoring, such as analyses of water bodies to detect anomalies, is recognized worldwide as a task necessary to reduce the impacts arising from pollution. However, the large number of data available to be analyzed in different contexts, such as in an image time series acquired by satellites, still pose challenges for the detection of anomalies, even when using computers. This study describes a machine learning strategy based on Kittler’s taxonomy to detect anomalies related to water pollution in an image time series. We propose this strategy to monitor environments, detecting unexpected conditions that may occur (i.e., detecting outliers), and identifying those outliers in accordance with Kittler’s taxonomy (i.e., detecting anomalies). According to our strategy, contextual and non-contextual image classifications were semi-automatically compared to find any divergence that indicates the presence of one type of anomaly defined by the taxonomy. In our strategy, models built to classify a single image were used to classify an image time series due to domain adaptation. The results 99.07%, 99.99%, 99.07%, and 99.53% were achieved by our strategy, respectively, for accuracy, precision, recall, and F-measure. These results suggest that our strategy allows computers to recognize contexts and enhances their capabilities to solve contextualized problems. Therefore, our strategy can be used to guide computational systems to make different decisions to solve a problem in response to each context. The proposed strategy is relevant for improving machine learning, as its use allows computers to have a more organized learning process. Our strategy is presented with respect to its applicability to help monitor environmental disasters. A minor limitation was found in the results caused by the use of domain adaptation. This type of limitation is fairly common when using domain adaptation, and therefore has no significance. Even so, future work should investigate other techniques for transfer learning. Collapse

Liang Z, Dong H, Liu C, Liang W, Zhu Z. Evolutionary Multitasking for Multiobjective Optimization With Subspace Alignment and Adaptive Differential Evolution. IEEE TRANSACTIONS ON CYBERNETICS 2022;52:2096-2109. [PMID: 32579534 DOI: 10.1109/tcyb.2020.2980888] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Chen J, Cheong HH, Siu SWI. xDeep-AcPEP: Deep Learning Method for Anticancer Peptide Activity Prediction Based on Convolutional Neural Network and Multitask Learning. J Chem Inf Model 2021;61:3789-3803. [PMID: 34327990 DOI: 10.1021/acs.jcim.1c00181] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]

Yang K, Lu J, Wan W, Zhang G. Multi-source transfer regression via source-target pairwise segment. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.09.074] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Kouw WM, Loog M. A Review of Domain Adaptation without Target Labels. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021;43:766-785. [PMID: 31603771 DOI: 10.1109/tpami.2019.2945942] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]

Rahimi A, Gönen M. A multitask multiple kernel learning formulation for discriminating early- and late-stage cancers. Bioinformatics 2020;36:3766-3772. [DOI: 10.1093/bioinformatics/btaa168] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2019] [Revised: 03/03/2020] [Accepted: 03/06/2020] [Indexed: 12/13/2022] Open

Abstract Abstract Motivation Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction. Results We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature. Availability and implementation Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/arezourahimi/mtgsbc together with the scripts that replicate the reported experiments. Supplementary information Supplementary data are available at Bioinformatics online. Collapse

Multi-modal neuroimaging feature selection with consistent metric constraint for diagnosis of Alzheimer's disease. Med Image Anal 2020. [DOI: 10.1016/j.media.2019.101625 10.1016/j.media.2019.101625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Hao X, Bao Y, Guo Y, Yu M, Zhang D, Risacher SL, Saykin AJ, Yao X, Shen L. Multi-modal neuroimaging feature selection with consistent metric constraint for diagnosis of Alzheimer's disease. Med Image Anal 2020;60:101625. [PMID: 31841947 PMCID: PMC6980345 DOI: 10.1016/j.media.2019.101625] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2019] [Revised: 11/25/2019] [Accepted: 11/25/2019] [Indexed: 12/12/2022]

Abstract

The accurate diagnosis of Alzheimer's disease (AD) and its early stage, e.g., mild cognitive impairment (MCI), is essential for timely treatment or possible intervention to slow down AD progression. Recent studies have demonstrated that multiple neuroimaging and biological measures contain complementary information for diagnosis and prognosis. Therefore, information fusion strategies with multi-modal neuroimaging data, such as voxel-based measures extracted from structural MRI (VBM-MRI) and fluorodeoxyglucose positron emission tomography (FDG-PET), have shown their effectiveness for AD diagnosis. However, most existing methods are proposed to simply integrate the multi-modal data, but do not make full use of structure information across the different modalities. In this paper, we propose a novel multi-modal neuroimaging feature selection method with consistent metric constraint (MFCC) for AD analysis. First, the similarity is calculated for each modality (i.e. VBM-MRI or FDG-PET) individually by random forest strategy, which can extract pairwise similarity measures for multiple modalities. Then the group sparsity regularization term and the sample similarity constraint regularization term are used to constrain the objective function to conduct feature selection from multiple modalities. Finally, the multi-kernel support vector machine (MK-SVM) is used to fuse the features selected from different models for final classification. The experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) show that the proposed method has better classification performance than the start-of-the-art multimodality-based methods. Specifically, we achieved higher accuracy and area under the curve (AUC) for AD versus normal controls (NC), MCI versus NC, and MCI converters (MCI-C) versus MCI non-converters (MCI-NC) on ADNI datasets. Therefore, the proposed model not only outperforms the traditional method in terms of AD/MCI classification, but also discovers the characteristics associated with the disease, demonstrating its promise for improving disease-related mechanistic understanding.

Collapse

An Incongruence-Based Anomaly Detection Strategy for Analyzing Water Pollution in Images from Remote Sensing. REMOTE SENSING 2019. [DOI: 10.3390/rs12010043] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Zheng N, Wang K, Zhan W, Deng L. Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches. Curr Drug Metab 2019;20:177-184. [PMID: 30156155 DOI: 10.2174/1389200219666180829121038] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2018] [Revised: 05/21/2018] [Accepted: 08/02/2018] [Indexed: 01/15/2023]

Singh R, Lanchantin J, Robins G, Qi Y. Transfer String Kernel for Cross-Context DNA-Protein Binding Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019;16:1524-1536. [PMID: 27654939 DOI: 10.1109/tcbb.2016.2609918] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Halder AK, Dutta P, Kundu M, Basu S, Nasipuri M. Review of computational methods for virus-host protein interaction prediction: a case study on novel Ebola-human interactions. Brief Funct Genomics 2019;17:381-391. [PMID: 29028879 PMCID: PMC7109800 DOI: 10.1093/bfgp/elx026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open

TopP-S: Persistent homology-based multi-task deep neural networks for simultaneous predictions of partition coefficient and aqueous solubility. J Comput Chem 2018;39:1444-1454. [DOI: 10.1002/jcc.25213] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2017] [Revised: 01/15/2018] [Accepted: 02/25/2018] [Indexed: 01/09/2023]

Kong Y, Shao M, Li K, Fu Y. Probabilistic Low-Rank Multitask Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018;29:670-680. [PMID: 28060715 DOI: 10.1109/tnnls.2016.2641160] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Wu K, Wei GW. Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks. J Chem Inf Model 2018;58:520-531. [DOI: 10.1021/acs.jcim.7b00558] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Lee HJ, Wu Y, Zhang Y, Xu J, Xu H, Roberts K. A hybrid approach to automatic de-identification of psychiatric notes. J Biomed Inform 2017;75S:S19-S27. [PMID: 28602904 PMCID: PMC5705430 DOI: 10.1016/j.jbi.2017.06.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2017] [Revised: 06/02/2017] [Accepted: 06/05/2017] [Indexed: 11/17/2022]

Chen J, Jagannatha AN, Fodeh SJ, Yu H. Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach. JMIR Med Inform 2017;5:e42. [PMID: 29089288 PMCID: PMC5686421 DOI: 10.2196/medinform.8531] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2017] [Revised: 09/19/2017] [Accepted: 09/20/2017] [Indexed: 11/13/2022] Open

Abstract

Background

Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first.

Objective

We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation—that is, creating lay definitions for these terms.

Methods

Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data.

Results

The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P<.001 for all measures and all conditions). Using a rich set of learning features contributed to ADS’s performance substantially.

Conclusions

ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS’s performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.

Collapse

Bolgár B, Antal P. VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization. BMC Bioinformatics 2017;18:440. [PMID: 28978313 PMCID: PMC5628496 DOI: 10.1186/s12859-017-1845-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information are also vital for reaching top performance.

METHOD

We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions.

RESULTS

VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirm their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-ML-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units yielding significantly improved computational time.

CONCLUSION

In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.

Collapse

Wang Y, Song J, Marquez-Lago TT, Leier A, Li C, Lithgow T, Webb GI, Shen HB. Knowledge-transfer learning for prediction of matrix metalloprotease substrate-cleavage sites. Sci Rep 2017;7:5755. [PMID: 28720874 PMCID: PMC5515926 DOI: 10.1038/s41598-017-06219-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2017] [Accepted: 06/08/2017] [Indexed: 11/24/2022] Open

Affiliation(s)

Yanan Wang Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, 3800, Australia
Jiangning Song Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, 3800, Australia Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, Melbourne, VIC, 3800, Australia
Tatiana T Marquez-Lago Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
André Leier Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
Chen Li Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
Trevor Lithgow Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, 3800, Australia.
Geoffrey I Webb Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, 3800, Australia.
Hong-Bin Shen Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.

Collapse

Zhang Y, Tang B, Jiang M, Wang J, Xu H. Domain adaptation for semantic role labeling of clinical text. J Am Med Inform Assoc 2015;22:967-79. [PMID: 26063745 DOI: 10.1093/jamia/ocu048] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2014] [Accepted: 12/15/2014] [Indexed: 11/14/2022] Open

Abstract

OBJECTIVE

Semantic role labeling (SRL), which extracts a shallow semantic relation representation from different surface textual forms of free text sentences, is important for understanding natural language. Few studies in SRL have been conducted in the medical domain, primarily due to lack of annotated clinical SRL corpora, which are time-consuming and costly to build. The goal of this study is to investigate domain adaptation techniques for clinical SRL leveraging resources built from newswire and biomedical literature to improve performance and save annotation costs.

MATERIALS AND METHODS

Multisource Integrated Platform for Answering Clinical Questions (MiPACQ), a manually annotated SRL clinical corpus, was used as the target domain dataset. PropBank and NomBank from newswire and BioProp from biomedical literature were used as source domain datasets. Three state-of-the-art domain adaptation algorithms were employed: instance pruning, transfer self-training, and feature augmentation. The SRL performance using different domain adaptation algorithms was evaluated by using 10-fold cross-validation on the MiPACQ corpus. Learning curves for the different methods were generated to assess the effect of sample size.

RESULTS AND CONCLUSION

When all three source domain corpora were used, the feature augmentation algorithm achieved statistically significant higher F-measure (83.18%), compared to the baseline with MiPACQ dataset alone (F-measure, 81.53%), indicating that domain adaptation algorithms may improve SRL performance on clinical text. To achieve a comparable performance to the baseline method that used 90% of MiPACQ training samples, the feature augmentation algorithm required <50% of training samples in MiPACQ, demonstrating that annotation costs of clinical SRL can be reduced significantly by leveraging existing SRL resources from other domains.

Collapse

Nourani E, Khunjush F, Durmuş S. Computational approaches for prediction of pathogen-host protein-protein interactions. Front Microbiol 2015;6:94. [PMID: 25759684 PMCID: PMC4338785 DOI: 10.3389/fmicb.2015.00094] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2014] [Accepted: 01/26/2015] [Indexed: 12/25/2022] Open

Shell J, Coupland S. Fuzzy Transfer Learning: Methodology and application. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2014.09.004] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Lee HJ, Dang TC, Lee H, Park JC. OncoSearch: cancer gene search engine with literature evidence. Nucleic Acids Res 2014;42:W416-21. [PMID: 24813447 PMCID: PMC4086113 DOI: 10.1093/nar/gku368] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Kshirsagar M, Carbonell J, Klein-Seetharaman J. Multitask learning for host-pathogen protein interactions. Bioinformatics 2013;29:i217-26. [PMID: 23812987 PMCID: PMC3694681 DOI: 10.1093/bioinformatics/btt245] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Abstract

Motivation: An important aspect of infectious disease research involves understanding the differences and commonalities in the infection mechanisms underlying various diseases. Systems biology-based approaches study infectious diseases by analyzing the interactions between the host species and the pathogen organisms. This work aims to combine the knowledge from experimental studies of host–pathogen interactions in several diseases to build stronger predictive models. Our approach is based on a formalism from machine learning called ‘multitask learning’, which considers the problem of building models across tasks that are related to each other. A ‘task’ in our scenario is the set of host–pathogen protein interactions involved in one disease. To integrate interactions from several tasks (i.e. diseases), our method exploits the similarity in the infection process across the diseases. In particular, we use the biological hypothesis that similar pathogens target the same critical biological processes in the host, in defining a common structure across the tasks.

Results: Our current work on host–pathogen protein interaction prediction focuses on human as the host, and four bacterial species as pathogens. The multitask learning technique we develop uses a task-based regularization approach. We find that the resulting optimization problem is a difference of convex (DC) functions. To optimize, we implement a Convex–Concave procedure-based algorithm. We compare our integrative approach to baseline methods that build models on a single host–pathogen protein interaction dataset. Our results show that our approach outperforms the baselines on the training data. We further analyze the protein interaction predictions generated by the models, and find some interesting insights.

Availability: The predictions and code are available at: http://www.cs.cmu.edu/∼mkshirsa/ismb2013_paper320.html

Contact:j.klein-seetharaman@warwick.ac.uk

Supplementary information:Supplementary data are available at Bioinformatics online.

Collapse

Moal IH, Moretti R, Baker D, Fernández-Recio J. Scoring functions for protein-protein interactions. Curr Opin Struct Biol 2013;23:862-7. [PMID: 23871100 DOI: 10.1016/j.sbi.2013.06.017] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2013] [Revised: 06/26/2013] [Accepted: 06/29/2013] [Indexed: 12/24/2022]