1
Ebrahim M, Alsmirat M, Al-Ayyoub M. Advanced disk herniation computer aided diagnosis system. Sci Rep 2024;14:8071. PMID: 38580700; PMCID: PMC10997754; DOI: 10.1038/s41598-024-58283-5. Received 26 Feb 2023; accepted 27 Mar 2024.
Abstract
Over recent years, researchers and practitioners have seen massive and continuous improvements in the computational resources available to them. This has made the use of resource-hungry machine learning (ML) algorithms feasible and practical. Moreover, several advanced techniques are being used to boost the performance of such algorithms even further, including various transfer learning techniques, data augmentation, and feature concatenation. The use of these advanced techniques normally depends heavily on the size and nature of the dataset at hand. For fine-grained medical image sets, which have subcategories within the main categories, there is a need to find the combination of techniques that works best on this type of image. In this work, we utilize these advanced techniques to find the best combinations and build a state-of-the-art lumbar disc herniation computer-aided diagnosis system. We have evaluated the system extensively, and the results show that it achieves an accuracy of 98% when compared against human diagnosis.
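The feature-concatenation technique this abstract mentions can be illustrated with a minimal sketch: feature vectors extracted by two different pretrained backbones for the same image are joined into one fused vector before classification. The function name, dimensions, and values below are illustrative assumptions, not details from the paper.

```python
def concatenate_features(*feature_vectors):
    """Join per-backbone feature vectors into a single fused vector."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

# e.g. a 4-dim vector from backbone A and a 3-dim vector from backbone B
# (made-up numbers standing in for pooled CNN activations)
features_a = [0.1, 0.9, 0.3, 0.5]
features_b = [0.7, 0.2, 0.4]

fused = concatenate_features(features_a, features_b)
assert len(fused) == len(features_a) + len(features_b)  # 7-dim fused vector
```

A classifier head would then be trained on the fused vectors rather than on either backbone's features alone.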
Affiliation(s)
- Maad Ebrahim
- Department of Computer Science and Operations Research (DIRO), University of Montreal, Montreal, QC, H3T1J4, Canada
- Department of Computer Science, Jordan University of Science and Technology, Ar-Ramtha, Jordan
- Mohammad Alsmirat
- Department of Computer Science, University of Sharjah, Sharjah, United Arab Emirates.
- Department of Computer Science, Jordan University of Science and Technology, Ar-Ramtha, Jordan.
- Mahmoud Al-Ayyoub
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates.
- Department of Computer Science, Jordan University of Science and Technology, Ar-Ramtha, Jordan.
2
Al-Bashabsheh E, Alaiad A, Al-Ayyoub M, Beni-Yonis O, Zitar RA, Abualigah L. Improving clinical documentation: automatic inference of ICD-10 codes from patient notes using BERT model. J Supercomput 2023;79:12766-12790. DOI: 10.1007/s11227-023-05160-z. Accepted 4 Mar 2023.
3
4
Fadel A, Tuffaha I, Al-Ayyoub M. Neural Arabic Text Diacritization: State-of-the-Art Results and a Novel Approach for Arabic NLP Downstream Tasks. ACM Trans Asian Low-Resour Lang Inf Process 2022. DOI: 10.1145/3470849.
Abstract
In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. Feed-Forward Neural Network (FFNN) and Recurrent Neural Network (RNN), with several enhancements such as 100-hot encoding, embeddings, Conditional Random Field (CRF), and Block-Normalized Gradient (BNG). The models are tested on the only freely available benchmark dataset, and the results show that our models are better than or on par with other models, even those that, unlike ours, require human-crafted, language-dependent post-processing steps. Moreover, we show how diacritics in Arabic can be used to enhance models for downstream NLP tasks such as Machine Translation (MT) and Sentiment Analysis (SA) by proposing novel Translation over Diacritization (ToD) and Sentiment over Diacritization (SoD) approaches.
Affiliation(s)
- Ali Fadel
- Jordan University of Science and Technology, Irbid, Jordan
5
Alsmirat M, Al-Mnayyis N, Al-Ayyoub M, Al-Mnayyis A. Deep Learning-Based Disk Herniation Computer Aided Diagnosis System From MRI Axial Scans. IEEE Access 2022. DOI: 10.1109/access.2022.3158682.
Affiliation(s)
- Mohammad Alsmirat
- Department of Computer Science, University of Sharjah, Sharjah, United Arab Emirates
- Nusaiba Al-Mnayyis
- Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan
- Mahmoud Al-Ayyoub
- Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan
6
Abdullah M, Al-Ayyoub M, AlRawashdeh S, Shatnawi F. E-learningDJUST: E-learning dataset from Jordan university of science and technology toward investigating the impact of COVID-19 pandemic on education. Neural Comput Appl 2021;35:11481-11495. PMID: 34803236; PMCID: PMC8590139; DOI: 10.1007/s00521-021-06712-1. Received 3 Sep 2021; accepted 27 Oct 2021.
Abstract
The COVID-19 pandemic triggered profound changes in education, especially during the lockdowns imposed to contain the outbreak. As a result, educational institutions worldwide turned to online learning platforms to maintain their educational presence. This paper introduces and examines a dataset, E-LearningDJUST, that represents a sample of students' study progress during the pandemic at Jordan University of Science and Technology (JUST). The dataset covers 9,246 students from 11 faculties taking four courses in the spring 2020, summer 2020, and fall 2021 semesters. To the best of our knowledge, it is the first collected dataset that reflects students' study progress within a Jordanian institute using e-learning system records. One of this work's key findings is a high correlation between e-learning events and the final grades out of 100. The E-LearningDJUST dataset was therefore used to evaluate two robust machine learning models (Random Forest and XGBoost) and one simple deep learning model (a feed-forward neural network) for predicting students' performance. Using RMSE as the primary evaluation criterion, the RMSE values range between 7 and 17. Among the other main findings, applying feature selection with the random forest leads to better predictions for all courses, with RMSE differences ranging between 0 and 0.20. Finally, a comparison study examined students' grades before and after the pandemic to understand its impact. A higher success rate was observed during the pandemic than before it, which is expected because the exams were held online; however, the proportion of students with high marks remained similar to that of pre-pandemic courses.
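For reference, the RMSE metric this abstract uses as its primary evaluation criterion can be computed as follows; the grades below are made-up illustrations, not values from the E-LearningDJUST dataset.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted grades."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# illustrative final grades out of 100 and a model's predictions
true_grades = [88.0, 72.0, 95.0, 60.0]
pred_grades = [80.0, 75.0, 90.0, 68.0]

print(rmse(true_grades, pred_grades))  # ≈ 6.36
```

Lower is better: the paper's reported range of 7 to 17 means predictions were typically off by roughly 7 to 17 grade points.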
Affiliation(s)
- Malak Abdullah
- Computer Science, Jordan University of Science and Technology, Irbid, 22110 Jordan
- Mahmoud Al-Ayyoub
- Computer Science, Jordan University of Science and Technology, Irbid, 22110 Jordan
- Saif AlRawashdeh
- Computer Science, Jordan University of Science and Technology, Irbid, 22110 Jordan
- Farah Shatnawi
- Computer Science, Jordan University of Science and Technology, Irbid, 22110 Jordan
7
Abedalla A, Abdullah M, Al-Ayyoub M, Benkhelifa E. Chest X-ray pneumothorax segmentation using U-Net with EfficientNet and ResNet architectures. PeerJ Comput Sci 2021;7:e607. PMID: 34307860; PMCID: PMC8279140; DOI: 10.7717/peerj-cs.607. Received 18 Nov 2020; accepted 31 May 2021.
Abstract
Medical imaging refers to visualization techniques that provide valuable information about the internal structures of the human body for clinical applications, diagnosis, treatment, and scientific research. Segmentation is one of the primary methods for analyzing and processing medical images; it helps doctors diagnose accurately by providing detailed information on the body part of interest. However, segmenting medical images faces several challenges: it requires trained medical experts and is time-consuming and error-prone. An automatic medical image segmentation system therefore appears necessary. Deep learning algorithms have recently shown outstanding performance on segmentation tasks, especially semantic segmentation networks that provide pixel-level image understanding. Since the introduction of the first fully convolutional network (FCN) for semantic image segmentation, several segmentation networks have been proposed on its basis. One of the state-of-the-art convolutional networks in the medical imaging field is U-Net. This paper presents a novel end-to-end semantic segmentation model for medical images, named Ens4B-UNet, that ensembles four U-Net architectures with pre-trained backbone networks. Ens4B-UNet builds on U-Net's success with several significant improvements: adopting powerful and robust convolutional neural networks (CNNs) as backbones for the U-Net encoders and using nearest-neighbor up-sampling in the decoders. Ens4B-UNet is designed as a weighted-average ensemble of four encoder-decoder segmentation models. The backbone networks of all ensembled models are pre-trained on the ImageNet dataset to exploit the benefits of transfer learning. To improve our models, we apply several training and prediction techniques, including stochastic weight averaging (SWA), data augmentation, test-time augmentation (TTA), and different types of optimal thresholds. We evaluate and test our models on the 2019 Pneumothorax Challenge dataset, which contains 12,047 training images with 12,954 masks and 3,205 test images. Our proposed segmentation network achieves a 0.8608 mean Dice similarity coefficient (DSC) on the test set, placing it among the top 1% of systems in the Kaggle competition.
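The reported 0.8608 score uses the Dice similarity coefficient, DSC = 2|A∩B| / (|A| + |B|), computed between predicted and ground-truth masks. A minimal sketch on flat binary masks follows; the example masks and the empty-mask convention are illustrative assumptions, not the challenge's exact evaluation code.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    if total == 0:          # both masks empty: treat as perfect agreement
        return 1.0
    return 2.0 * intersection / total

# toy 5-pixel masks: 1 = pneumothorax pixel, 0 = background
pred = [1, 1, 0, 0, 1]
true = [1, 0, 0, 0, 1]

print(dice_coefficient(pred, true))  # 2*2 / (3+2) = 0.8
```

The mean DSC over a test set is simply the average of this per-image score.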
Affiliation(s)
- Ayat Abedalla
- Computer Science, Jordan University of Science and Technology, Irbid, Jordan
- Malak Abdullah
- Computer Science, Jordan University of Science and Technology, Irbid, Jordan
- Mahmoud Al-Ayyoub
- Computer Science, Jordan University of Science and Technology, Irbid, Jordan
- Elhadj Benkhelifa
- Smart Systems, AI and Cybersecurity Research Centre, Staffordshire University, Stoke on Trent, UK
8
9
10
11
12
Abstract
Purpose
The authorship authentication (AA) problem is concerned with correctly attributing a text document to its author. Historically, this problem has been studied based on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Another approach, known as the bag-of-words (BOW) approach, uses keyword occurrences/frequencies in each document to identify its author. Unlike the first, this approach is more language-independent. This paper aims to study and compare both approaches for the Arabic language, which remains largely understudied despite its importance.
Design/methodology/approach
Being a supervised learning problem, the authors start by collecting a very large data set of Arabic documents to be used for training and testing purposes. For the SF approach, they compute hundreds of SF, whereas, for the BOW approach, the popular term frequency-inverse document frequency technique is used. Both approaches are compared under various settings.
Findings
The results show that the SF approach, which is much cheaper to train, can generate more accurate results under most settings.
Practical implications
Efficiently solving the AA problem yields numerous advantages in different fields of academia as well as industry, including literature, security, forensics, electronic markets and trading, etc. Another practical implication of this work is the public release of its sources. Specifically, some of the SF can be very useful for other problems such as sentiment analysis.
Originality/value
This is the first study of its kind to compare the SF and BOW approaches for authorship analysis of Arabic articles. Moreover, many of the computed SF are novel, while other features are inspired by the literature. As SF are language-dependent and most existing papers focus on English, extra effort must be invested to adapt such features to Arabic text.
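The BOW approach described above relies on term frequency-inverse document frequency weighting. A minimal, generic TF-IDF sketch follows; the whitespace tokenization and example documents are assumptions for illustration, and the paper's actual pipeline for Arabic may differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document TF-IDF weights: tf(term) * log(N / df(term))."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency of each term
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = ["the falcon hunts at dawn", "the poet writes at night"]
w = tf_idf(docs)
# terms appearing in every document get idf = log(2/2) = 0
assert w[0]["the"] == 0.0
assert w[0]["falcon"] > 0.0
```

For authorship attribution, each document's weight vector would then be fed to a supervised classifier trained on documents of known authorship.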
13
Abstract
Purpose
Multi-label Text Classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains, for reasons such as the rapid growth of online data and the increasing tendency of internet users to assign multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of labels makes MTC more difficult and challenging than traditional single-label text classification (TC). Because MTC is a natural extension of TC, several methods have been proposed to benefit from the rich TC literature through what are called problem transformation (PT) methods. Basically, PT methods transform multi-label data into single-label data suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of the MTC of Arabic articles.
Design/methodology/approach
This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
Findings
The experiments show that the presented approach outperforms the currently available approaches. Specifically, the results of our experiments show that the best accuracy obtained from existing approaches is only 18 per cent, whereas the accuracy of the presented lexicon-based approach can reach an accuracy level of 31 per cent.
Originality/value
Although there exist some tools that can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC. The experiments show that the presented approach outperforms the currently available approaches.
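The lexicon-based method described above can be sketched as follows: each label has a keyword set, a document's score for a label reflects how many of that label's keywords it contains, and the label is assigned when the score meets the label's threshold. The lexicons, threshold values, and exact scoring function below are illustrative assumptions, not the ones learned in the paper.

```python
def predict_labels(document, label_lexicons, thresholds):
    """Assign every label whose keyword-match score meets its threshold."""
    tokens = set(document.lower().split())
    predicted = []
    for label, keywords in label_lexicons.items():
        score = len(tokens & keywords) / len(keywords)  # fraction of keywords present
        if score >= thresholds[label]:
            predicted.append(label)
    return predicted

# toy lexicons and thresholds (illustrative, not extracted from training data)
label_lexicons = {
    "sports": {"match", "team", "goal", "league"},
    "economy": {"market", "inflation", "trade", "bank"},
}
thresholds = {"sports": 0.5, "economy": 0.5}

doc = "the team scored a late goal to win the match"
print(predict_labels(doc, label_lexicons, thresholds))  # ['sports']
```

A document can receive several labels, one label, or none, which is what distinguishes MTC from single-label TC.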
14
Smadi MA, Obaidat I, Al-Ayyoub M, Mohawesh R, Jararweh Y. Using Enhanced Lexicon-Based Approaches for the Determination of Aspect Categories and Their Polarities in Arabic Reviews. International Journal of Information Technology and Web Engineering 2016. DOI: 10.4018/ijitwe.2016070102.
Abstract
Sentiment Analysis (SA) is the process of determining whether the sentiment of a text written in a natural language is positive, negative, or neutral. It is one of the most interesting subfields of natural language processing (NLP) and Web mining due to its diverse applications and the challenges of applying it to the massive amounts of textual data available online (especially on social networks). Most current work on SA focuses on the English language and operates at the sentence or document level. This work focuses on a less studied variant of SA: aspect-based SA (ABSA) for the Arabic language. Specifically, this work considers two ABSA tasks, aspect category determination and aspect category polarity determination, and makes use of the publicly available human-annotated Arabic dataset (HAAD) along with the baseline experiments conducted by the HAAD providers. Several lexicon-based approaches are presented for the two tasks, and some of them significantly outperform the best-known results on the given dataset. Enhancements of 9% and 46% were achieved on the aspect category determination and aspect category polarity determination tasks, respectively.
Affiliation(s)
- Islam Obaidat
- Jordan University of Science and Technology, Irbid, Jordan
- Mahmoud Al-Ayyoub
- Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan
- Rami Mohawesh
- Jordan University of Science and Technology, Irbid, Jordan
- Yaser Jararweh
- Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan
15
Jarrah M, Jaradat M, Jararweh Y, Al-Ayyoub M, Bousselham A. A hierarchical optimization model for energy data flow in smart grid power systems. Inf Syst 2015. DOI: 10.1016/j.is.2014.12.003.
16
Jaradat M, Jarrah M, Bousselham A, Jararweh Y, Al-Ayyoub M. The Internet of Energy: Smart Sensor Networks and Big Data Management for Smart Grid. Procedia Comput Sci 2015. DOI: 10.1016/j.procs.2015.07.250.
17
Abstract
Text categorization or classification (TC) is concerned with placing text documents in their proper categories according to their contents. Given the various applications of TC and the large volume of text documents uploaded to the Internet daily, the need for an automated method stems from the difficulty and tedium of performing such a process manually. The usefulness of TC is manifested in different fields and needs. For instance, the ability to automatically classify an article or an email into its right class (Arts, Economics, Politics, Sports, etc.) would be appreciated by individual users as well as companies. This paper is concerned with the TC of Arabic articles. It contains a comparison of the five best-known algorithms for TC. It also studies the effects of utilizing different Arabic stemmers (light and root-based stemmers) on the effectiveness of these classifiers. Furthermore, a comparison between different data mining software tools (Weka and RapidMiner) is presented. The results illustrate the good accuracy provided by the SVM classifier, especially when used with the light10 stemmer. This outcome can serve in the future as a baseline for comparisons with other unexplored classifiers and Arabic stemmers.
18
Abdulla NA, Ahmed NA, Shehab MA, Al-Ayyoub M, Al-Kabi MN, Al-rifai S. Towards Improving the Lexicon-Based Approach for Arabic Sentiment Analysis. International Journal of Information Technology and Web Engineering 2014. DOI: 10.4018/ijitwe.2014070104.
Abstract
The emergence of Web 2.0 technology generated a massive amount of raw data by enabling Internet users to post their opinions on the web. Processing this raw data to extract useful information can be a very challenging task. An example of important information that can be automatically extracted from users' posts is their opinions on different issues. This problem of Sentiment Analysis (SA) has been well studied for the English language, and two main approaches have been devised: corpus-based and lexicon-based. This work focuses on the latter approach due to its various challenges and high potential. The discussion in this paper takes the reader through the detailed steps of building the two main components of the lexicon-based SA approach: the lexicon and the SA tool. The experiments show that significant efforts are still needed to reach a satisfactory level of accuracy for lexicon-based Arabic SA. Nonetheless, they provide an interesting guide for researchers in their ongoing efforts to improve lexicon-based SA.
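The core of a lexicon-based SA tool like the one this abstract describes can be sketched as a polarity-sum classifier: each lexicon entry carries a polarity weight, and the sign of the summed weights gives the sentiment. The tiny English lexicon and weights below are illustrative stand-ins for the paper's Arabic lexicon.

```python
def lexicon_sentiment(text, lexicon):
    """Sum word polarities from the lexicon; the sign gives the sentiment."""
    score = sum(lexicon.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# toy polarity lexicon (a real Arabic SA lexicon maps Arabic words to weights)
lexicon = {"excellent": 2, "good": 1, "bad": -1, "terrible": -2}

# good (+1) + terrible (-2) = -1, so the overall sentiment is negative
print(lexicon_sentiment("the service was good but the food was terrible", lexicon))
```

The abstract's point about accuracy follows directly from this design: the tool is only as good as the lexicon's coverage and weights, and phenomena like negation and contrast ("but") need extra handling beyond a plain sum.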
Affiliation(s)
- Nawaf A. Abdulla
- Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan
- Nizar A. Ahmed
- Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan
- Mohammed A. Shehab
- Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan
- Mahmoud Al-Ayyoub
- Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan
- Saleh Al-rifai
- Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan