1. Zhu X, Ruan Q, Qian S, Zhang M. A hybrid model based on transformer and Mamba for enhanced sequence modeling. Sci Rep 2025; 15:11428. PMID: 40180947; PMCID: PMC11968869; DOI: 10.1038/s41598-025-87574-8.
Abstract
State Space Models (SSMs) have made remarkable strides in language modeling in recent years. With the introduction of Mamba, these models have garnered increased attention, often surpassing Transformers in specific areas. Nevertheless, despite Mamba's unique strengths, Transformers remain essential due to their advanced computational capabilities and proven effectiveness. In this paper, we propose a novel model that effectively integrates the strengths of both Transformers and Mamba. Specifically, our model utilizes the Transformer's encoder for encoding tasks while employing Mamba as the decoder for decoding tasks. We introduce a feature fusion technique that combines the features generated by the encoder with the hidden states produced by the decoder. This approach successfully merges the advantages of the Transformer and Mamba, resulting in enhanced performance. Comprehensive experiments across various language tasks demonstrate that our proposed model consistently achieves competitive results, outperforming existing benchmarks.
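The encoder-decoder split and feature fusion described above can be illustrated with a minimal PyTorch sketch. The paper's exact design is not reproduced here: the gated fusion, the pooled encoder context, and the GRU standing in for a Mamba block (a real implementation would come from a library such as mamba_ssm) are all illustrative assumptions.

```python
# A sketch of the abstract's pattern: Transformer encoder, a sequential
# decoder standing in for Mamba (a real block would come from a library
# such as mamba_ssm), and gated fusion of encoder features with decoder
# hidden states. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class HybridSeqModel(nn.Module):
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # Stand-in for a Mamba block: any module producing per-step hidden
        # states fits the fusion pattern below.
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.fuse_gate = nn.Linear(2 * d_model, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        enc = self.encoder(self.embed(src_ids))      # (B, S, D) encoder features
        dec, _ = self.decoder(self.embed(tgt_ids))   # (B, T, D) decoder states
        # Pool encoder features into one context vector per sequence and
        # fuse it into every decoder step through a learned gate.
        ctx = enc.mean(dim=1, keepdim=True).expand_as(dec)
        gate = torch.sigmoid(self.fuse_gate(torch.cat([dec, ctx], dim=-1)))
        fused = gate * dec + (1 - gate) * ctx
        return self.lm_head(fused)                   # (B, T, vocab) logits

model = HybridSeqModel(vocab_size=1000)
logits = model(torch.randint(0, 1000, (2, 16)), torch.randint(0, 1000, (2, 12)))
```

The gate decides, per decoder position, how much of the decoder's own state versus the pooled encoder context is passed to the output head.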
Affiliation(s)
- Xiaocui Zhu: Jiangxi Academy Sciences, Institute of Energy, Nanchang, 330029, Jiangxi, China
- Qunsheng Ruan: Department of Nature Science and Computer, Ganzhou Teachers College, Ganzhou, 341000, Jiangxi, China
- Sai Qian: Jiangxi Academy Sciences, Institute of Energy, Nanchang, 330029, Jiangxi, China
- Miaohui Zhang: Jiangxi Academy Sciences, Institute of Energy, Nanchang, 330029, Jiangxi, China

2. Abadi VNM, Ghasemian F. Enhancing Persian text summarization through a three-phase fine-tuning and reinforcement learning approach with the mT5 transformer model. Sci Rep 2025; 15:80. PMID: 39747858; PMCID: PMC11695816; DOI: 10.1038/s41598-024-78235-3.
Abstract
In the contemporary era, the vast expanse of big data presents a formidable obstacle, particularly when it comes to extracting vital information from extensive textual sources. The constant influx of news articles from various agencies requires an enormous amount of time to digest comprehensively. A viable solution to this challenge is automatic text summarization, a pivotal and intricate task within natural language processing. Text summarization transforms pertinent textual content into a concise format that reduces its word count without compromising its underlying meaning. In recent years, transformers have emerged as a prominent force in natural language processing, particularly in text summarization. This research harnesses transformers by training the mT5-base model through a three-phase fine-tuning process on Persian news articles. Subsequently, reinforcement learning via the Proximal Policy Optimization (PPO) algorithm is integrated with the fine-tuned model. Finally, we evaluate the model's performance in summarizing Persian texts, shedding light on its efficacy in distilling meaningful insights from a sea of textual data. Our model sets a new benchmark in Persian text summarization, achieving ROUGE scores of 53.17 for ROUGE-1, 37.12 for ROUGE-2, and 44.13 for ROUGE-L. These results reflect a significant advancement in the quality of Persian text summarization, signaling a promising era of more refined and context-aware summaries.
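The reported ROUGE-1, ROUGE-2, and ROUGE-L scores can be computed with the rouge_score package; below is a minimal sketch with invented English placeholder strings, not the paper's evaluation code. For Persian text, a language-appropriate tokenizer would be substituted and the English stemmer disabled.

```python
# Scoring a candidate summary against a reference with the rouge_score
# package (pip install rouge-score); both strings are placeholders.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)  # English stemmer; disable for Persian
reference = "the cabinet approved the new national budget on sunday"
candidate = "cabinet approves new national budget"
for name, s in scorer.score(reference, candidate).items():
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")
```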
Affiliation(s)
- Vahid Nejad Mahmood Abadi: Department of Computer Engineering, Faculty of Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
- Fahimeh Ghasemian: Department of Computer Engineering, Faculty of Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

3. Swanson K, He S, Calvano J, Chen D, Telvizian T, Jiang L, Chong P, Schwell J, Mak G, Lee J. Biomedical text readability after hypernym substitution with fine-tuned large language models. PLOS Digit Health 2024; 3:e0000489. PMID: 38625843; PMCID: PMC11020904; DOI: 10.1371/journal.pdig.0000489.
Abstract
The advent of patient access to complex medical information online has highlighted the need for simplification of biomedical text to improve patient understanding and engagement in taking ownership of their health. However, comprehension of biomedical text remains a difficult task due to the need for domain-specific expertise. We aimed to study the simplification of biomedical text via large language models (LLMs), which are commonly used for general natural language processing tasks involving text comprehension, summarization, generation, and prediction of new text from prompts. Specifically, we fine-tuned three variants of large language models to substitute complex words and word phrases in biomedical text with a related hypernym. The output of the substitution process was evaluated by comparing the pre- and post-substitution texts using four readability metrics and two measures of sentence complexity. A sample of 1,000 biomedical definitions from the National Library of Medicine's Unified Medical Language System (UMLS) was processed with the three LLM approaches, and each showed an improvement in readability and sentence complexity after hypernym substitution. Readability scores improved from a collegiate reading level before processing to a US high-school level after processing. Comparison between the three LLMs showed that the GPT-J-6b approach yielded the best improvement in measures of sentence complexity. This study demonstrates the merit of hypernym substitution for improving the readability of complex biomedical text for the public and highlights the use case for fine-tuning open-access large language models for biomedical natural language processing.
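The pre/post readability comparison can be sketched with the textstat package; the sentence pair, the broader-term substitution, and the two metrics shown are illustrative choices, not the study's data, models, or full metric set.

```python
# Readability before and after a hypernym-style substitution, measured
# with the textstat package (pip install textstat); texts are invented.
import textstat

original = "The patient presented with acute myocardial infarction."
simplified = "The patient presented with a serious heart condition."  # broader term

for label, text in [("original", original), ("simplified", simplified)]:
    print(label,
          "| Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(text),
          "| Flesch reading ease:", textstat.flesch_reading_ease(text))
```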
Affiliation(s)
- Karl Swanson: Department of Medicine–Clinical Informatics, University of California–San Francisco, San Francisco, United States of America
- Shuhan He: Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Josh Calvano: Department of Anesthesiology and Critical Care, University of New Mexico Hospital, Albuquerque, New Mexico, United States of America
- David Chen: Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Talar Telvizian: Department of Internal Medicine, Main Line Health Lankenau Medical Center, Wynnewood, Pennsylvania, United States of America
- Lawrence Jiang: Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Paul Chong: School of Osteopathic Medicine, Campbell University, Lillington, North Carolina, United States of America
- Jacob Schwell: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania, United States of America
- Gin Mak: Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Ontario, Canada
- Jarone Lee: Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America

4. Lin CS, Jwo JS, Lee CH. Adapting static and contextual representations for policy gradient-based summarization. Sensors (Basel) 2023; 23:4513. PMID: 37177717; PMCID: PMC10181762; DOI: 10.3390/s23094513.
Abstract
Considering the ever-growing volume of electronic documents made available in our daily lives, the need for an efficient tool to capture their gist increases as well. Automatic text summarization, the process of shortening long text and extracting valuable information, has been of great interest for decades. Due to the difficulties of semantic understanding and the requirement of large training data, the development of this research field is still challenging and worth investigating. In this paper, we propose an automated text summarization approach that adapts static and contextual representations within an extractive framework to address these research gaps. To better capture the semantics of the given text, we explore the combination of static embeddings from GloVe (Global Vectors) and contextual embeddings from BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) based models. To reduce human annotation costs, we employ policy gradient reinforcement learning to perform unsupervised training. We conduct empirical studies on the public Gigaword dataset. The experimental results show that our approach achieves promising performance and is competitive with various state-of-the-art approaches.
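A minimal sketch of fusing a static and a contextual sentence representation, in the spirit of the GloVe-plus-BERT combination described above; the specific checkpoints, the mean pooling, and the plain concatenation are assumptions rather than the authors' exact design.

```python
# Fusing a static (GloVe) and a contextual (BERT) sentence representation;
# checkpoints and the mean-pool/concat fusion are illustrative choices.
import gensim.downloader as api
import torch
from transformers import AutoModel, AutoTokenizer

glove = api.load("glove-wiki-gigaword-100")            # static word vectors
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

sentence = "automatic text summarization condenses long documents"

# Static view: mean of GloVe vectors for in-vocabulary tokens.
words = [w for w in sentence.split() if w in glove]
static_vec = torch.tensor(sum(glove[w] for w in words) / len(words))

# Contextual view: mean-pooled last hidden states from BERT.
with torch.no_grad():
    out = bert(**tok(sentence, return_tensors="pt"))
contextual_vec = out.last_hidden_state.mean(dim=1).squeeze(0)

fused = torch.cat([static_vec, contextual_vec])        # (100 + 768,) feature
print(fused.shape)
```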
Affiliation(s)
- Ching-Sheng Lin: Master Program of Digital Innovation, Tunghai University, Taichung 40704, Taiwan
- Jung-Sing Jwo: Master Program of Digital Innovation, Tunghai University, Taichung 40704, Taiwan; Department of Computer Science, Tunghai University, Taichung 40704, Taiwan
- Cheng-Hsiung Lee: Master Program of Digital Innovation, Tunghai University, Taichung 40704, Taiwan

5. Zhao Q, Niu J, Liu X, He W, Tang S. Generation of coherent multi-sentence texts with a coherence mechanism. Comput Speech Lang 2023. DOI: 10.1016/j.csl.2022.101457.

6. Chen Z, Lin H. Improving named entity correctness of abstractive summarization by generative negative sampling. Comput Speech Lang 2023. DOI: 10.1016/j.csl.2023.101504.

7. Demilie WB. Comparative analysis of automated text summarization techniques: the case of Ethiopian languages. Wireless Commun Mob Comput 2022; 2022:1-28. DOI: 10.1155/2022/3282127.
Abstract
Nowadays, there is an abundance of information available from both online and offline sources. For a single topic, hundreds of sources containing a wealth of information may be available. The ability to extract or generate a summary of popular content allows users to quickly search for content and obtain preliminary data in the shortest amount of time. Manually extracting useful information from these sources is a difficult task, and automatic text summarization (ATS) systems are being developed to address this issue. Text summarization is the process of extracting useful information from large documents and compressing it into a summary while retaining the relevant content. This review provides a broad overview of ATS research in Ethiopian languages such as Amharic, Afan Oromo, and Tigrinya using different text summarization approaches. The work identifies recommended state-of-the-art techniques and methods for future researchers in the area, and it supports new researchers in this field with a concise overview of the feature extraction methods and classification techniques required for the different types of ATS approaches applied to Ethiopian languages. Finally, recommendations for future researchers are put forward.

8. Abro AA, Talpur MSH, Jumani AK. Natural language processing challenges and issues: a literature review. Gazi Univ J Sci 2022. DOI: 10.35378/gujs.1032517.
Abstract
Natural Language Processing (NLP) is the computerized approach to analysing text using both structured and unstructured data. NLP is a simple, empirically powerful, and reliable approach that achieves state-of-the-art performance in language processing tasks such as Semantic Search (SS), Machine Translation (MT), Text Summarization (TS), Sentiment Analysis (SA), Named Entity Recognition (NER), and Emotion Detection (ED). Based on current deployment and adoption, NLP is expected to be a key technology of the future. The primary question is: what does NLP offer in practice, and what are its prospects? Several problems remain to be addressed as this developing method is reconciled with future technology. In this paper, the benefits, challenges, and limitations of this paradigm are presented, along with the areas open to research.

9. Automatic text summarization for Hindi using real coded genetic algorithm. Appl Sci (Basel) 2022. DOI: 10.3390/app12136584.
Abstract
In the present scenario, Automatic Text Summarization (ATS) is in great demand to address the ever-growing volume of text data available online and to discover relevant information faster. In this research, an ATS methodology is proposed for the Hindi language using a Real Coded Genetic Algorithm (RCGA) over a health corpus available in the Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation on varied feature sets is performed, in which distinguishing features, namely sentence similarity and named-entity features, are combined with others for computing the evaluation metrics. The top 14 feature combinations are evaluated through the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosome selection, and the reproduction operators Simulated Binary Crossover and polynomial mutation. To extract the highest-scored sentences as the corpus summary, different compression rates are tested. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%.
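Simulated Binary Crossover (SBX) and polynomial mutation are standard real-coded GA reproduction operators; the NumPy sketch below uses illustrative eta values, bounds, mutation rate, and chromosome length, not the paper's settings.

```python
# Standard real-coded GA operators applied to feature-weight chromosomes;
# eta, bounds, mutation rate, and vector length are illustrative.
import numpy as np

def sbx_crossover(p1, p2, eta=15.0):
    """Simulated Binary Crossover: children spread around the parents."""
    u = np.random.rand(*p1.shape)
    beta = np.where(u <= 0.5,
                    (2.0 * u) ** (1.0 / (eta + 1.0)),
                    (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0)))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2

def polynomial_mutation(x, low=0.0, high=1.0, eta=20.0, p_m=0.1):
    """Polynomial mutation: small bounded perturbation of each gene."""
    u = np.random.rand(*x.shape)
    delta = np.where(u < 0.5,
                     (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0,
                     1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0)))
    mutated = np.where(np.random.rand(*x.shape) < p_m,
                       x + delta * (high - low), x)
    return np.clip(mutated, low, high)

w1, w2 = np.random.rand(8), np.random.rand(8)   # two feature-weight vectors
c1, c2 = sbx_crossover(w1, w2)
c1 = polynomial_mutation(c1)
```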

10. The HoPE model architecture: a novel approach to pregnancy information retrieval based on conversational agents. J Healthc Inform Res 2022; 6:253-294. PMID: 35411331; PMCID: PMC8985747; DOI: 10.1007/s41666-022-00115-0.

11. Transformer-based abstractive summarization for Reddit and Twitter: single posts vs. comment pools in three languages. Future Internet 2022. DOI: 10.3390/fi14030069.
Abstract
Abstractive summarization is a technique for extracting condensed meanings from long texts, with a variety of potential practical applications. Nonetheless, today's abstractive summarization research is largely limited to testing models on various types of data, which brings only marginal improvements and does not lead to widespread practical adoption of the method. In particular, abstractive summarization is not used for social media research, where it would be very useful for opinion and topic mining, given the complications that social media data create for other methods of textual analysis. Of all social media, Reddit is most frequently used for testing new neural summarization models on large-scale English datasets, without further testing on smaller real-world data in other languages or from other platforms. Moreover, for social media, summarizing pools of texts (one-author posts, comment threads, discussion cascades, etc.) may bring results crucial for social studies, which have not yet been tested. However, existing abstractive summarization methods are not fine-tuned for social media data and have rarely been applied to data from platforms beyond Reddit, to comments, or to non-English user texts. We address these research gaps by fine-tuning the Transformer-based neural network models Longformer and T5, testing them against BART on real-world Reddit data, with improvements of up to 2%. We then apply the best model (fine-tuned T5) to pools of comments from Reddit and assess the similarity of post and comment summarizations. Further, to overcome T5's 500-token input limit, which social media pools usually exceed, we apply Longformer Large and T5 Large to pools of tweets from a large-scale discussion of the Charlie Hebdo massacre in three languages and show that pool summarizations may be used to detect micro-shifts in the agendas of networked discussions. Our results show, however, that additional training is clearly needed for German and French, as the results for these languages are unsatisfactory, and that more fine-tuning is needed even in English for Twitter data. We thus show that a 'one-for-all' neural summarization model is still out of reach, while fine-tuning for platform affordances works well. We also show that fine-tuned T5 works best for small-scale social media data, but Longformer is helpful for larger-scale pool summarizations.
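A minimal sketch of Transformer-based abstractive summarization via the transformers pipeline; "t5-base" is a generic stand-in for the authors' fine-tuned checkpoints, and the input post is invented. Inputs longer than T5's roughly 500-token limit are truncated, which is the constraint the abstract addresses with Longformer for larger pools.

```python
# Abstractive summarization of a social media post with a T5 checkpoint
# via the transformers pipeline; the model and text are stand-ins.
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-base")
post = ("The thread debated the new transit plan for hours, with most "
        "commenters arguing the proposed schedule ignores late-night "
        "workers while a vocal minority defended the cost savings.")
result = summarizer(post, max_length=40, min_length=10, truncation=True)
print(result[0]["summary_text"])
```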