1
|
Ramesh Kashyap A, Yang Y, Kan MY. Scientific document processing: challenges for modern learning methods. INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES 2023:1-27. [PMID: 37361127 PMCID: PMC10036973 DOI: 10.1007/s00799-023-00352-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Revised: 02/20/2023] [Accepted: 02/21/2023] [Indexed: 03/26/2023]
Abstract
Neural network models enjoy success on language tasks related to Web documents, including news and Wikipedia articles. However, the characteristics of scientific publications pose specific challenges that have yet to be satisfactorily addressed: the discourse structure of scientific documents crucial in scholarly document processing (SDP) tasks, the interconnected nature of scientific documents, and their multimodal nature. We survey modern neural network learning methods that tackle these challenges: those that can model discourse structure and their interconnectivity and use their multimodal nature. We also highlight efforts to collect large-scale datasets and tools developed to enable effective deep learning deployment for SDP. We conclude with a discussion on upcoming trends and recommend future directions for pursuing neural natural language processing approaches for SDP.
Affiliation(s)
- Abhinav Ramesh Kashyap
  - ASUS Intelligent Cloud Services (AICS), Singapore, Singapore
  - School of Computing, National University of Singapore, Computing 1, 13 Computing Drive, Singapore, 11741 Singapore
- Yajing Yang
  - School of Computing, National University of Singapore, Computing 1, 13 Computing Drive, Singapore, 11741 Singapore
- Min-Yen Kan
  - School of Computing, National University of Singapore, Computing 1, 13 Computing Drive, Singapore, 11741 Singapore
|
2
|
Nagar S, Barbhuiya FA, Dey K. Towards more robust hate speech detection: using social context and user data. SOCIAL NETWORK ANALYSIS AND MINING 2023. [DOI: 10.1007/s13278-023-01051-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
|
3
|
Chen X, Ye P, Huang L, Wang C, Cai Y, Deng L, Ren H. Exploring science-technology linkages: A deep learning-empowered solution. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
|
4
|
Dai T, Zhao J, Li D, Tian S, Zhao X, Pan S. Heterogeneous deep graph convolutional network with citation relational BERT for COVID-19 inline citation recommendation. EXPERT SYSTEMS WITH APPLICATIONS 2023; 213:118841. [PMID: 36157791 PMCID: PMC9482209 DOI: 10.1016/j.eswa.2022.118841] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/12/2022] [Revised: 09/02/2022] [Accepted: 09/12/2022] [Indexed: 06/16/2023]
Abstract
The outbreak of COVID-19 has brought one of the biggest explosions of scientific literature ever. Facing such a volume of literature, it is hard for researchers, especially junior researchers, to find the desired citations when carrying out COVID-19-related research. This paper presents a novel neural-network-based method, called citation relational BERT with heterogeneous deep graph convolutional network (CRB-HDGCN), for the COVID-19 inline citation recommendation task. CRB-HDGCN contains two main stages. The first stage enhances the representation learning of the BERT model for COVID-19 inline citation recommendation through CRB. To achieve this goal, an augmented citation sentence corpus, which replaces the citation placeholder with the title of the cited paper, is used to lightly retrain the BERT model. In addition, we extract three types of sentence pairs according to citation relations and establish sentence prediction tasks to further fine-tune the BERT model. The second stage learns effective dense vectors of nodes in the COVID-19 bibliographic graph through HDGCN. The HDGCN contains four layers, all of which are essentially sub-neural networks. The first layer is an initial embedding layer, which generates initial input vectors of fixed size through CRB and a multilayer perceptron. The second layer is a heterogeneous graph convolutional layer. In this layer, we expand the traditional homogeneous graph convolutional network into a heterogeneous one by adding heterogeneous nodes and relations. The third layer is a deep attention layer. This layer uses trainable projection vectors to reweight node importance simultaneously according to both node types and convolution layers, which further improves the quality of the learnt node vectors. The last layer, a decoder, recovers the graph structure and makes the whole network trainable. The recommendation is finally achieved by integrating the high-performance heterogeneous vectors learnt from CRB-HDGCN with the query vectors.
We conduct experiments on the CORD-19 and LitCovid datasets. The results show that, compared with the second-best method, CO-Search, CRB-HDGCN improves MAP, MRR, P@100 and R@100 by 21.8%, 22.7%, 37.6% and 21.2% on CORD-19, and by 29.1%, 25.9%, 15.3% and 11.3% on LitCovid, respectively.
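For readers unfamiliar with the ranking metrics reported above (MAP, MRR, P@k, R@k), the following is a minimal sketch of their standard per-query definitions. It is not the evaluation code of the paper: the function names and the toy ranking are illustrative assumptions, and MAP and MRR over a dataset are the means of `average_precision` and `reciprocal_rank` across queries.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item occurs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy query (hypothetical data): four recommended papers, two truly cited.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
print(precision_at_k(ranked, relevant, 2), recall_at_k(ranked, relevant, 2))
print(reciprocal_rank(ranked, relevant), average_precision(ranked, relevant))
```

In this toy case all four values equal 0.5: one of the top two items is relevant, the first hit is at rank 2, and the precision at the two hit positions (1/2 and 2/4) averages to 0.5.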
Affiliation(s)
- Tao Dai
  - School of Future Transportation, Chang'an University, Xi'an, Shaanxi 710064, China
- Jie Zhao
  - School of Economics and Management, Chang'an University, Xi'an, Shaanxi 710064, China
- Dehong Li
  - School of Economics and Management, Chang'an University, Xi'an, Shaanxi 710064, China
- Shun Tian
  - School of Future Transportation, Chang'an University, Xi'an, Shaanxi 710064, China
- Xiangmo Zhao
  - School of Information Engineering, Chang'an University, Xi'an, Shaanxi 710064, China
- Shirui Pan
  - Faculty of Information Technology, Monash University, Melbourne, Australia
|
5
|
Research on semantic representation and citation recommendation of scientific papers with multiple semantics fusion. Scientometrics 2023. [DOI: 10.1007/s11192-022-04566-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
|
6
|
A novel NIH research grant recommender using BERT. PLoS One 2023; 18:e0278636. [PMID: 36649346 PMCID: PMC9844873 DOI: 10.1371/journal.pone.0278636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 11/19/2022] [Indexed: 01/18/2023] Open
Abstract
Research grants are important for researchers to sustain a good position in academia. There are many grant opportunities available from different funding agencies. However, finding relevant grant announcements is challenging and time-consuming for researchers. To resolve the problem, we proposed a grant announcement recommendation system for National Institutes of Health (NIH) grants using researchers' publications. We formulated the recommendation as a classification problem and proposed a recommender using a state-of-the-art deep learning technique, Bidirectional Encoder Representations from Transformers (BERT), to capture the intrinsic, non-linear relationship between researchers' publications and grant announcements. Internal and external evaluations were conducted to assess the system's usefulness. During internal evaluations, grant citations were used to establish the grant-publication ground truth, and results were evaluated using Recall@k, Precision@k, mean reciprocal rank (MRR) and area under the receiver operating characteristic curve (ROC-AUC). During external evaluations, researchers' publications were clustered using a Dirichlet Process Mixture Model (DPMM), the grants recommended by our model were then aggregated per cluster through Recency Weight, and finally researchers were invited to rate the recommendations so that Precision@k could be calculated. For comparison, baseline recommenders using Okapi Best Matching (BM25), Term Frequency-Inverse Document Frequency (TF-IDF), doc2vec, and Naïve Bayes (NB) were also developed. Both internal and external evaluations (all metrics) revealed favorable performance of our proposed BERT-based recommender.
|
7
|
Kart Ö, Mestiashvili A, Lachmann K, Kwasnicki R, Schroeder M. Emati: a recommender system for biomedical literature based on supervised learning. Database (Oxford) 2022; 2022:6885256. [PMID: 36484479 PMCID: PMC9732843 DOI: 10.1093/database/baac104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 11/07/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022]
Abstract
The scientific literature continues to grow at an ever-increasing rate. Considering that thousands of new articles are published every week, it is obvious how challenging it is to keep up with newly published literature on a regular basis. Using a recommender system that improves the user experience in the online environment can be a solution to this problem. In the present study, we aimed to develop a web-based article recommender service, called Emati. Since the data are text-based by nature and we wanted our system to be independent of the number of users, a content-based approach has been adopted in this study. A supervised machine learning model has been proposed to generate article recommendations. Two different supervised learning approaches, namely the naïve Bayes model with Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer and the state-of-the-art language model bidirectional encoder representations from transformers (BERT), have been implemented. In the first one, a list of documents is converted into TF-IDF-weighted features and fed into a classifier to distinguish relevant articles from irrelevant ones. Multinomial naïve Bayes algorithm is used as a classifier since, along with the class label, it also gives the probability that the input belongs to this class. The second approach is based on fine-tuning the pretrained state-of-the-art language model BERT for the text classification task. Emati provides a weekly updated list of article recommendations and presents it to the user, sorted by probability scores. New article recommendations are also sent to users' email addresses on a weekly basis. Additionally, Emati has a personalized search feature to search online services' (such as PubMed and arXiv) content and have the results sorted by the user's classifier. Database URL: https://emati.biotec.tu-dresden.de.
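The first approach described above (TF-IDF features fed to a multinomial naïve Bayes classifier, with results sorted by class probability) can be sketched as follows. This is a small standard-library illustration, not Emati's implementation: the class name `TfidfNaiveBayes`, the whitespace tokenizer, the smoothed IDF formula, the Laplace smoothing constant and the toy documents are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a toy example.
    return text.lower().split()


class TfidfNaiveBayes:
    """Multinomial naive Bayes over TF-IDF-weighted term counts (illustrative)."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(docs)
        doc_freq = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed inverse document frequency (one common variant).
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
                    for t, df in doc_freq.items()}
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / n_docs)
                          for c in self.classes}
        # Sum the TF-IDF weight of every term per class.
        weight = {c: defaultdict(float) for c in self.classes}
        for toks, label in zip(tokenized, labels):
            for term, w in self._tfidf(toks).items():
                weight[label][term] += w
        vocab_size = len(self.idf)
        self.log_like = {}
        for c in self.classes:
            total = sum(weight[c].values()) + vocab_size  # Laplace smoothing
            self.log_like[c] = {t: math.log((weight[c][t] + 1.0) / total)
                                for t in self.idf}
        return self

    def _tfidf(self, tokens):
        # Terms outside the training vocabulary are ignored.
        counts = Counter(t for t in tokens if t in self.idf)
        return {t: tf * self.idf[t] for t, tf in counts.items()}

    def predict_proba(self, doc):
        feats = self._tfidf(tokenize(doc))
        scores = {c: self.log_prior[c]
                  + sum(w * self.log_like[c][t] for t, w in feats.items())
                  for c in self.classes}
        top = max(scores.values())  # subtract the max for numerical stability
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


# Hypothetical training data: articles a user marked relevant or irrelevant.
train_docs = ["protein structure prediction with deep learning",
              "deep learning for protein folding",
              "football match results and scores",
              "transfer news from the football league"]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
model = TfidfNaiveBayes().fit(train_docs, train_labels)

# Rank new articles by the probability of the "relevant" class.
candidates = ["new football season scores",
              "deep learning predicts protein binding"]
ranked = sorted(candidates,
                key=lambda d: model.predict_proba(d)["relevant"],
                reverse=True)
print(ranked)
```

Sorting by the predicted probability rather than the hard class label is what allows a recommendation list to be ordered, which is the property of naïve Bayes the abstract highlights.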
Affiliation(s)
- Özge Kart
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47-49, Dresden 01307, Germany
  - Department of Computer Engineering, Dokuz Eylül University, Tinaztepe Campus, Buca 35160 Izmir, Turkey
- Alexandre Mestiashvili
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47-49, Dresden 01307, Germany
- Kurt Lachmann
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47-49, Dresden 01307, Germany
- Richard Kwasnicki
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47-49, Dresden 01307, Germany
|
8
|
ScholarRec: a scholars’ recommender system that combines scholastic influence and social collaborations in academic social networks. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00345-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
9
|
Horbach SPJM, Oude Maatman FJW, Halffman W, Hepkema WM. Automated citation recommendation tools encourage questionable citations. RESEARCH EVALUATION 2022. [DOI: 10.1093/reseval/rvac016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Citing practices have long been at the heart of scientific reporting, playing both socially and epistemically important functions in science. While such practices have been relatively stable over time, recent attempts to develop automated citation recommendation tools have the potential to drastically impact citing practices. We claim that, even though such tools may come with tempting advantages, their development and implementation should be conducted with caution. Describing the role of citations in science’s current publishing and social reward structures, we argue that automated citation tools encourage questionable citing practices. More specifically, we describe how such tools may lead to an increase in: perfunctory citation and sloppy argumentation; affirmation biases; and Matthew effects. In addition, a lack of transparency of the tools’ underlying algorithmic structure renders their usage problematic. Hence, we urge that the consequences of citation recommendation tools should at least be understood and assessed before any attempts at implementation or broad distribution are undertaken.
Affiliation(s)
- Serge P J M Horbach
  - Danish Centre for Studies in Research and Research Policy, Aarhus University, Bartholins Allé 7, Aarhus C 8000, Denmark
  - Faculty of Social Sciences, Centre for Science and Technology Studies (CWTS), Leiden University, Wassenaarseweg 62A, Leiden 2333 AL, The Netherlands
- Freek J W Oude Maatman
  - Department of Philosophy of Behavioural Science, Faculty of Social Science, Radboud University Nijmegen, Thomas van Aquinostraat 4, Nijmegen, 6500 HE, The Netherlands
  - Department of Theoretical Philosophy, Faculty of Philosophy, University of Groningen, Oude Boteringestraat 52, Groningen, 9712 GL, The Netherlands
- Willem Halffman
  - Institute for Science in Society, Radboud University Nijmegen, Heyendaalseweg 135, Nijmegen, 6525AJ, The Netherlands
- Wytske M Hepkema
  - Institute for Science in Society, Radboud University Nijmegen, Heyendaalseweg 135, Nijmegen, 6525AJ, The Netherlands
|
10
|
Lin J, Yu Y, Song J, Shi X. Detecting and analyzing missing citations to published scientific entities. Scientometrics 2022. [DOI: 10.1007/s11192-022-04334-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]
|
11
|
Wang HC, Cheng JW, Yang CT. SentCite: a sentence-level citation recommender based on the salient similarity among multiple segments. Scientometrics 2022. [DOI: 10.1007/s11192-022-04339-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
12
|
Choi J, Lee J, Yoon J, Jang S, Kim J, Choi S. A two-stage deep learning-based system for patent citation recommendation. Scientometrics 2022. [DOI: 10.1007/s11192-022-04301-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
13
|
Identification of topic evolution: network analytics with piecewise linear representation and word embedding. Scientometrics 2022. [DOI: 10.1007/s11192-022-04273-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
|
14
|
Cai X, Wang N, Yang L, Mei X. Global-local neighborhood based network representation for citation recommendation. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02964-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
15
|
Pornprasit C, Liu X, Kiattipadungkul P, Kertkeidkachorn N, Kim KS, Noraset T, Hassan SU, Tuarob S. Enhancing citation recommendation using citation network embedding. Scientometrics 2022. [DOI: 10.1007/s11192-021-04196-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
|
16
|
Bert-Enhanced Text Graph Neural Network for Classification. ENTROPY 2021; 23:e23111536. [PMID: 34828233 PMCID: PMC8624482 DOI: 10.3390/e23111536] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/20/2021] [Revised: 11/14/2021] [Accepted: 11/17/2021] [Indexed: 11/25/2022]
Abstract
Text classification is a fundamental research direction that aims to assign tags to text units. Recently, graph neural networks (GNNs) have exhibited some excellent properties in textual information processing. Furthermore, pre-trained language models have also achieved promising results in many tasks. However, many text processing methods either cannot model a single text unit’s structure or ignore its semantic features. To solve these problems and comprehensively utilize the text’s structural and semantic information, we propose a Bert-Enhanced text Graph Neural Network model (BEGNN). For each text, we construct a text graph separately according to the co-occurrence relationship of words and use a GNN to extract text features. Moreover, we employ BERT to extract semantic features. The former part takes into account the structural information, and the latter focuses on modeling the semantic information. Finally, we let these two features of different granularity interact and aggregate them to obtain a more effective representation. Experiments on standard datasets demonstrate the effectiveness of BEGNN.
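As an illustration of building a per-text graph from word co-occurrence, the sketch below links words that appear within a sliding window of each other and counts how often each pair co-occurs. The abstract does not state the window size or the edge weighting, so the `window` parameter, the undirected count-weighted edges and the function name `cooccurrence_graph` are assumptions for this example, not the BEGNN construction itself.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=3):
    """Return {(word_a, word_b): count} for words co-occurring within `window`.

    `window` is the number of consecutive tokens considered together, so
    window=2 links only adjacent words. Edges are undirected: each pair is
    stored once with its words in sorted order, and self-loops are skipped.
    """
    edges = defaultdict(int)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            a, b = sorted((word, tokens[j]))
            if a != b:
                edges[(a, b)] += 1
    return dict(edges)


# One graph per text: nodes are the distinct words, edges carry the counts.
text = "graph neural networks model text as a graph"
graph = cooccurrence_graph(text.split(), window=3)
nodes = sorted({w for pair in graph for w in pair})
print(len(nodes), "nodes,", len(graph), "edges")
```

A graph built this way could then be passed to a GNN as an adjacency structure, with word embeddings as the initial node features.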
|
17
|
Taju SW, Shah SMA, Ou YY. Identification of efflux proteins based on contextual representations with deep bidirectional transformer encoders. Anal Biochem 2021; 633:114416. [PMID: 34656612 DOI: 10.1016/j.ab.2021.114416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2021] [Revised: 10/07/2021] [Accepted: 10/11/2021] [Indexed: 10/20/2022]
Abstract
Efflux proteins are the transport proteins expressed in the plasma membrane, which are involved in the movement of unwanted toxic substances through specific efflux pumps. Several studies based on computational approaches have been proposed to predict transport proteins and thereby to understand the mechanism of the movement of ions across cell membranes. However, few methods were developed to identify efflux proteins. This paper presents an approach based on the contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) with the Support Vector Machine (SVM) classifier. BERT is the most effective pre-trained language model that performs exceptionally well on several Natural Language Processing (NLP) tasks. Therefore, the contextualized representations from BERT were implemented to incorporate multiple interpretations of identical amino acids in the sequence. A dataset of efflux proteins with annotations was first established. The feature vectors were extracted by transferring protein data through the hidden layers of the pre-trained model. Our proposed method was trained on complete training datasets to identify efflux proteins and achieved the accuracies of 94.15% and 87.13% in the independent tests on membrane and transport datasets, respectively. This study opens a research avenue for the implementation of contextualized word embeddings in Bioinformatics and Computational Biology.
Affiliation(s)
- Semmy Wellem Taju
  - Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
- Syed Muazzam Ali Shah
  - Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
|
18
|
Relationships between method-section citation rates and citation contexts: evidence from highly cited references in psychology. ONLINE INFORMATION REVIEW 2021. [DOI: 10.1108/oir-03-2021-0171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose
The Method section of research articles offers an important space for researchers to describe their research processes and research objects they utilize. To understand the relationship between these research materials and their representations in scientific publications, this paper offers a quantitative examination of the citation contexts of the most frequently cited references in the Method section of the paper sample, many of which belong to the category of research material objects.
Design/methodology/approach
In this research, the authors assessed the extent to which these references appear in the Method section, which is regarded as an indicator of the instrumentality of the reference. The authors also examined how this central measurement is connected to its other citation contexts, such as key linguistic attributes and verbs that are used in citation sentences.
Findings
The authors found that a series of key linguistic attributes can be used to predict the instrumentality of a reference. The use of self-mention phrases and the readability score of the citances are especially strong predictors, along with boosters and hedges, the two measurements that were not included in the final model.
Research limitations/implications
This research focuses on a single research domain, psychology, which limits the understanding of how research material objects are cited in different research domains or interdisciplinary research contexts. Moreover, this research is based on 200 frequently cited references, which are unable to represent all references cited in psychological publications.
Practical implications
With the identified relationship between instrumental citation contexts and other characteristics of citation sentences, this research opens the possibility of more accurately identifying research material objects from scientific references, the most accessible scholarly data.
Originality/value
This is the first large-scale, quantitative analysis of the linguistic features of citations to research material objects. This study offers important baseline results for future studies focusing on scientific instruments, an increasingly important type of object involved in scientific research.
Peer review
The peer review history for this article is available at: 10.1108/OIR-03-2021-0171
|
19
|
Dattolo A, Corbatto M. Assisting researchers in bibliographic tasks: A new usable, real‐time tool for analyzing bibliographies. J Assoc Inf Sci Technol 2021. [DOI: 10.1002/asi.24578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Antonina Dattolo
  - SASWEB Research Lab, Department of Mathematics, Computer Science, and Physics, University of Udine, Gorizia
- Marco Corbatto
  - SASWEB Research Lab, Department of Mathematics, Computer Science, and Physics, University of Udine, Gorizia
|
20
|
La Quatra M, Cagliero L, Baralis E. Leveraging full-text article exploration for citation analysis. Scientometrics 2021. [DOI: 10.1007/s11192-021-04117-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Scientific articles often include in-text citations quoting from external sources. When the cited source is an article, the citation context can be analyzed by exploring the article full-text. To quickly access the key information, researchers are often interested in identifying the sections of the cited article that are most pertinent to the text surrounding the citation in the citing article. This paper first performs a data-driven analysis of the correlation between the textual content of the sections of the cited article and the text snippet where the citation is placed. The results of the correlation analysis show that the title and abstract of the cited article are likely to include content highly similar to the citing snippet. However, the subsequent sections of the paper often include cited text snippets as well. Hence, there is a need to understand the extent to which an exploration of the full-text of the cited article would be beneficial to gain insights into the citing snippet, considering also the fact that full-text access could be restricted. To this end, we then propose a classification approach to automatically predict whether the cited snippets in the full-text of the paper contain a significant amount of new content beyond the abstract and title. The proposed approach could support researchers in leveraging full-text article exploration for citation analysis. The experiments conducted on real scientific articles show promising results: the classifier has a 90% chance of correctly distinguishing between the full-text exploration and the title-and-abstract-only cases.
|
21
|
Qiu T, Yu C, Zhong Y, An L, Li G. A scientific citation recommendation model integrating network and text representations. Scientometrics 2021. [DOI: 10.1007/s11192-021-04161-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/20/2022]
|
22
|
Ali Shah SM, Ou YY. TRP-BERT: Discrimination of transient receptor potential (TRP) channels using contextual representations from deep bidirectional transformer based on BERT. Comput Biol Med 2021; 137:104821. [PMID: 34508974 DOI: 10.1016/j.compbiomed.2021.104821] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/03/2021] [Revised: 08/26/2021] [Accepted: 08/27/2021] [Indexed: 11/16/2022]
Abstract
Transient receptor potential (TRP) channels are non-selective cation channels that act as ion channels and are primarily found on the plasma membrane of numerous animal cells. These channels are involved in the physiology and pathophysiology of a wide variety of biological processes, including inhibition and progression of cancer, pain initiation, inflammation, regulation of pressure, thermoregulation, secretion of salivary fluid, and homeostasis of Ca²⁺ and Mg²⁺. Increasing evidence indicates that mutations in the genes encoding TRP channels play an essential role in a broad array of diseases. Therefore, these channels are becoming popular as potential drug targets for several diseases. The diversified role of these channels demands a prediction model to distinguish TRP channels from other channel proteins (non-TRP channels). Therefore, we present an approach based on the Support Vector Machine (SVM) classifier and contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a deeply bidirectional language model and a neural network approach to Natural Language Processing (NLP) that achieves outstanding performance on various NLP tasks. We apply BERT to generate contextualized representations for every single amino acid in a protein sequence. Interestingly, these representations are context-sensitive and vary for the same amino acid appearing in different positions in the sequence. Our proposed method showed 80.00% sensitivity, 96.03% specificity, 95.47% accuracy, and a 0.56 Matthews correlation coefficient (MCC) on an independent test set. We suggest that our proposed method could effectively distinguish TRP channels from non-TRP channels and assist biologists in identifying new potential TRP channels.
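The four figures reported above all derive from a binary confusion matrix. The sketch below gives the standard formulas; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper. It also shows how accuracy can sit well above MCC when the negative class is much larger than the positive class.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical imbalanced test set: 10 positives, 100 negatives.
scores = binary_metrics(tp=8, fp=10, tn=90, fn=2)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, sensitivity is 0.80 and specificity 0.90, accuracy is about 0.89, and MCC is only about 0.54, because the ten false positives outnumber the eight true positives.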
Affiliation(s)
- Syed Muazzam Ali Shah
  - Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
|
23
|
ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations. Comput Biol Chem 2021; 93:107537. [PMID: 34217007 DOI: 10.1016/j.compbiolchem.2021.107537] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/14/2020] [Revised: 05/09/2021] [Accepted: 06/26/2021] [Indexed: 01/08/2023]
Abstract
MOTIVATION Primary and secondary active transport are two types of active transport that use energy to move substances. Active transport mechanisms use proteins to assist in transport and play essential roles in regulating the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the two main types of proteins involved in such transport are distinguished from other transmembrane transport proteins. We propose a Support Vector Machine (SVM) with contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a powerful model for transfer learning, a deep learning language representation model developed by Google and one of the highest-performing pre-trained models for Natural Language Processing (NLP) tasks. The idea of transfer learning with a pre-trained BERT model is applied to extract fixed feature vectors from the hidden layers and learn contextual relations between amino acids in the protein sequence. The contextualized word representations of proteins are thus introduced to effectively model complex structures of amino acids in the sequence and the variations of these amino acids in context. By generating context information, we capture multiple meanings for the same amino acid to reveal the importance of specific residues in the protein sequence. RESULTS The performance of the proposed method is evaluated using five-fold cross-validation and an independent test. The proposed method achieves an accuracy of 85.44%, 88.74% and 92.84% for Class-1, Class-2, and Class-3, respectively. Experimental results show that this approach can outperform other feature extraction methods by using context information, effectively classify the two types of active transport, and improve the overall performance.
|
24
|
Semantic and relational spaces in science of science: deep learning models for article vectorisation. Scientometrics 2021. [DOI: 10.1007/s11192-021-03984-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/21/2022]
Abstract
Over the last century, we have observed a steady and exponential growth of scientific publications globally. The overwhelming amount of available literature makes a holistic analysis of the research within a field and between fields based on manual inspection impossible. Automatic techniques to support the process of literature review are required to find the epistemic and social patterns that are embedded in scientific publications. In computer science, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques open the possibility of automated end-to-end models to project observations to a new, low-dimensional space where the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). We explore the different outcomes generated by those techniques. Our results show that using NLP we can encode a semantic space of articles, while GNNs enable us to build a relational space where the social practices of a research community are also encoded.
|
25
|
A New Citation Recommendation Strategy Based on Term Functions in Related Studies Section. JOURNAL OF DATA AND INFORMATION SCIENCE 2021. [DOI: 10.2478/jdis-2021-0022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022] Open
Abstract
Purpose
Researchers frequently encounter the following problems when writing scientific articles: (1) Selecting appropriate citations to support the research idea is challenging. (2) The literature review is not conducted extensively, which leads to working on a research problem that others have well addressed. The study focuses on citation recommendation in the related studies section by applying the term function of a citation context, potentially improving the efficiency of writing a literature review.
Design/methodology/approach
We present nine term functions with three newly created and six identified from existing literature. Using these term functions as labels, we annotate 531 research papers in three topics to evaluate our proposed recommendation strategy. BM25 and Word2vec with VSM are implemented as the baseline models for the recommendation. Then the term function information is applied to enhance the performance.
Findings
The experiments show that the term function-based methods outperform the baseline methods regarding the recall, precision, and F1-score measurement, demonstrating that term functions are useful in identifying valuable citations.
Research limitations
The dataset is insufficient due to the complexity of annotating citation functions for paragraphs in the related studies section. More recent deep learning models should be applied to further validate the proposed approach.
Practical implications
The citation recommendation strategy can be helpful for valuable citation discovery, semantic scientific retrieval, and automatic literature review generation.
Originality/value
The proposed citation function-based citation recommendation can generate intuitive explanations of the results for users, improving the transparency, persuasiveness, and effectiveness of recommender systems.
|
26
|
|
27
|
Wang SH, Govindaraj VV, Górriz JM, Zhang X, Zhang YD. Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2021; 67:208-229. [PMID: 33052196 PMCID: PMC7544601 DOI: 10.1016/j.inffus.2020.10.004] [Citation(s) in RCA: 119] [Impact Index Per Article: 39.7] [Received: 08/24/2020] [Revised: 09/19/2020] [Accepted: 10/04/2020] [Indexed: 05/07/2023]
Abstract
(Aim) COVID-19 is an infectious disease that spread across the world this year. In this study, we plan to develop an artificial-intelligence-based tool to diagnose COVID-19 from chest CT images. (Method) On one hand, we extract features from a self-created convolutional neural network (CNN) to learn individual image-level representations. The proposed CNN employs several new techniques such as rank-based average pooling and multiple-way data augmentation. On the other hand, relation-aware representations are learnt from a graph convolutional network (GCN). Deep feature fusion (DFF) was developed in this work to fuse the individual image-level features from the CNN with the relation-aware features from the GCN. The best model was named FGCNet. (Results) The experiment first chose the best model from eight proposed network models, and then compared it with 15 state-of-the-art approaches. (Conclusion) The proposed FGCNet model is effective and gives better performance than all 15 state-of-the-art methods. Thus, our proposed FGCNet model can assist radiologists in rapidly detecting COVID-19 from chest CT images.
Affiliation(s)
- Shui-Hua Wang
- Department of Cardiovascular Sciences, University of Leicester, LE1 7RH, UK
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- School of Architecture Building and Civil engineering, Loughborough University, Loughborough LE11 3TU, UK
| | - Vishnu Varthanan Govindaraj
- Department of Biomedical Engineering, Kalasalingam Academy of Research and Education, Srivilliputhur post, Krishnankoil 626 126, Tamil Nadu, India
| | - Juan Manuel Górriz
- Department of Signal Theory, Networking and Communications, University of Granada, Granada, Spain
- Department of Psychiatry, University of Cambridge, Cambridge CB21TN, UK
| | - Xin Zhang
- Department of Medical Imaging, The Fourth People's Hospital of Huai'an, Huai'an, Jiangsu Province, 223002, China
| | - Yu-Dong Zhang
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- School of Informatics, University of Leicester, Leicester LE1 7RH, UK
| |
|
28
|
Haruna K, Ismail MA, Qazi A, Kakudi HA, Hassan M, Muaz SA, Chiroma H. Research paper recommender system based on public contextual metadata. Scientometrics 2020. [DOI: 10.1007/s11192-020-03642-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
|
29
|
VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7148240 DOI: 10.1007/978-3-030-45439-5_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Much progress has been made recently on text classification with methods based on neural networks. In particular, models using an attention mechanism, such as BERT, have been shown to capture the contextual information within a sentence or document. However, their ability to capture global information about the vocabulary of a language is more limited. The latter is the strength of Graph Convolutional Networks (GCN). In this paper, we propose the VGCN-BERT model, which combines the capability of BERT with a Vocabulary Graph Convolutional Network (VGCN). Local information and global information interact through different layers of BERT, allowing them to influence each other and jointly build a final representation for classification. In our experiments on several text classification datasets, our approach outperforms BERT and GCN alone, and achieves higher effectiveness than that reported in previous studies.
|