2. Alsmadi I, Alhami I. Clustering and classification of email contents. Journal of King Saud University - Computer and Information Sciences 2015. DOI: 10.1016/j.jksuci.2014.03.014

3. Wahbeh A, Al-Kabi M, Al-Radaideh Q, Al-Shawakfa E, Alsmadi I. The Effect of Stemming on Arabic Text Classification. International Journal of Information Retrieval Research 2011. DOI: 10.4018/ijirr.2011070104
Abstract
The information world is rich in documents in different formats and applications, such as databases, digital libraries, and the Web. Text classification is used to aid the search functionality offered by search engines and information retrieval systems in dealing with the large number of documents on the web. Many research papers in the field of text classification have addressed English, Dutch, Chinese, and other languages, whereas fewer have addressed Arabic. This paper addresses the automatic classification of Arabic text documents, applying stemming as part of the preprocessing steps. Results showed that, without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy under the two test modes, at 87.79% and 88.54%. Stemming, on the other hand, negatively affected accuracy: SVM accuracy under the two test modes dropped to 84.49% and 86.35%.
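As a rough illustration of the experiment described above, the sketch below compares an SVM text classifier with and without Arabic stemming. It uses scikit-learn and NLTK's ISRI stemmer as stand-ins for the paper's tooling, and a tiny toy corpus in place of the paper's dataset; the preprocessing, stemmer choice, and evaluation here are assumptions, not the authors' exact setup.

```python
from nltk.stem.isri import ISRIStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus: two "economy" and two "sports" sentences in Arabic.
docs = [
    "الاقتصاد ينمو بسرعة هذا العام",
    "الأسواق المالية تشهد ارتفاعا كبيرا",
    "الفريق فاز في المباراة النهائية",
    "اللاعبون يتدربون قبل انطلاق البطولة",
]
labels = ["economy", "economy", "sports", "sports"]

stemmer = ISRIStemmer()

def stem_text(text):
    # Whitespace tokenization keeps the sketch minimal; the paper's
    # preprocessing (normalization, stop-word removal) is richer.
    return " ".join(stemmer.stem(tok) for tok in text.split())

for name, corpus in [("raw", docs), ("stemmed", [stem_text(d) for d in docs])]:
    pipe = make_pipeline(TfidfVectorizer(), LinearSVC())
    acc = cross_val_score(pipe, corpus, labels, cv=2).mean()
    print(f"{name}: mean accuracy = {acc:.2f}")
```

On the paper's corpus this comparison is what produced the 87.79%/88.54% (raw) versus 84.49%/86.35% (stemmed) SVM accuracies.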

4. Al-Kabi M, Wahsheh H, Alsmadi I, Al-Shawakfa E, Wahbeh A, Al-Hmoud A. Content-based analysis to detect Arabic web spam. Journal of Information Science 2012. DOI: 10.1177/0165551512439173
Abstract
Search engines are important outlets for information query and retrieval. They have to deal with the continual increase of information available on the web and provide users with convenient access to huge amounts of information. With this huge amount of information comes a challenge that grows continually harder to eliminate: spam in web pages. For several reasons, web spammers try to intrude into search results and inject artificially biased results in favour of their websites or pages. Spam pages are added to the internet on a daily basis, making it difficult for search engines to keep up with the fast-growing and dynamic nature of the web, especially since spammers tend to stuff their websites with extra keywords to deceive the search engines and increase the rank of their pages. In this research, we investigated four different classification algorithms (naïve Bayes, decision tree, SVM and k-NN) for detecting Arabic web spam pages based on content. The three groups of datasets used, with 1%, 15% and 50% spam content, were collected using a crawler customized for this study. Spam pages were labelled manually. Different tests and comparisons revealed that the decision tree was the best classifier for this purpose.
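The four-classifier comparison can be sketched with scikit-learn equivalents as below. The synthetic data, the 15% spam ratio (one of the paper's three), and the F1 scoring are illustrative assumptions, not the paper's actual content features or evaluation protocol.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 1000 pages with ~15% spam; class 1 plays "spam".
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.85], random_state=0)

classifiers = {
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
}
for name, clf in classifiers.items():
    f1 = cross_val_score(clf, X, y, cv=10, scoring="f1").mean()
    print(f"{name}: mean F1 on the spam class = {f1:.3f}")
```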

5. Alomari A, Idris N, Sabri AQM, Alsmadi I. Deep reinforcement and transfer learning for abstractive text summarization: A review. Computer Speech & Language 2022. DOI: 10.1016/j.csl.2021.101276

6. Ahmad R, Alsmadi I, Alhamdani W, Tawalbeh L. A comprehensive deep learning benchmark for IoT IDS. Computers & Security 2022. DOI: 10.1016/j.cose.2021.102588

7. Obeidat R, Alsmadi I, Bani Bakr Q, Obeidat L. Can Users Search Trends Predict People Scares or Disease Breakout? An Examination of Infectious Skin Diseases in the United States. Infect Dis (Lond) 2020; 13:1178633720928356. PMID: 32565678; PMCID: PMC7285938. DOI: 10.1177/1178633720928356
Abstract
Background: In health and medicine, people heavily use the Internet to search for information about symptoms, diseases, and treatments. As such, Internet information can stand in for expert medical doctors, pharmacists, and other health care providers.
Aim: This article evaluates a dataset of search terms to determine whether search queries and terms can be used to reliably predict skin disease breakouts. Furthermore, the authors propose and evaluate a model for deciding when to declare a particular month as epidemic at the US national level.
Methods: A model was designed to identify a breakout in skin diseases based on the number of monthly discovered cases. To apply this model, the authors correlated Google Trends data for popular search terms with monthly Rubella and Measles cases reported by the Centers for Disease Control and Prevention (CDC). Regressions and decision trees were used to determine the impact of different terms on triggering the epidemic classes.
Results: Results showed that the search volume for Rubella and Measles keywords rises when the volume of reported cases of those diseases rises. The results also implied that the overall process was successful and should be repeated with other diseases; such a process can trigger actions or activities to be taken when a certain month is declared epidemic. Furthermore, this research revealed great interest in vaccination against Measles and Rubella.
Conclusions: The findings suggest that search queries and keyword trends can reliably be used to predict disease outbreaks and in other related knowledge-extraction applications, and that search-term surveillance can provide an additional tool for infectious disease surveillance. Future research should re-apply the model used in this article and ask whether the epidemiology of Coronavirus Disease 2019 (COVID-19) pandemic waves in the United States can be characterized through search queries and keyword trends.
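A minimal sketch of the idea, assuming synthetic monthly series in place of the Google Trends and CDC data: correlate search interest with reported cases, then flag a month as "Epidemic" when cases exceed a moving baseline. The two-sigma rolling threshold is an illustrative rule, not the paper's exact epidemic-declaration model.

```python
import pandas as pd

# Hypothetical monthly series (search interest 0-100, reported cases).
df = pd.DataFrame({
    "search_interest": [12, 15, 14, 40, 72, 65, 30, 18, 16, 14, 13, 12],
    "reported_cases":  [ 8, 10,  9, 35, 80, 70, 25, 12, 11,  9,  8,  8],
}, index=pd.period_range("2019-01", periods=12, freq="M"))

# The core observation: search volume rises with reported cases.
print("Pearson r:", df["search_interest"].corr(df["reported_cases"]).round(3))

# Declare "Epidemic" when cases exceed mean + 2 std of a trailing
# 6-month baseline (an assumed threshold for illustration only).
baseline = df["reported_cases"].rolling(6, min_periods=3).agg(["mean", "std"]).shift(1)
df["epidemic"] = df["reported_cases"] > baseline["mean"] + 2 * baseline["std"]
print(df[df["epidemic"]])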

8. Al-Abdullah M, Alsmadi I, AlAbdullah R, Farkas B. Designing privacy-friendly data repositories: a framework for a blockchain that follows the GDPR. Digital Policy, Regulation and Governance 2020. DOI: 10.1108/dprg-04-2020-0050
Abstract
Purpose
The paper posits that a solution for businesses seeking privacy-friendly repositories for their customers' data is to change from the traditional centralized repository to a trusted, decentralized one. Blockchain is a technology that provides such a data repository. However, the European Union's General Data Protection Regulation (GDPR) assumed a centralized data repository, and it is commonly argued that blockchain technology is therefore not usable. This paper aims to posit a framework for adopting a blockchain that follows the GDPR.
Design/methodology/approach
The paper uses Levy and Ellis's narrative literature review methodology, which is based on the constructivist theory posited by Lincoln and Guba. Using five information systems and computer science databases, the researchers searched for studies with the keywords GDPR and blockchain, applying a forward and backward search technique. The search identified a corpus of 416 candidate studies, from which the researchers applied pre-established criteria to select 39 studies. The researchers mined this corpus for concepts, which they clustered into themes. Using the accepted computer science practice of privacy by design, the researchers combined the clustered themes into the paper's posited framework.
Findings
The paper posits a framework that provides architectural tactics for designing a blockchain that follows GDPR to enhance privacy. The framework explicitly addresses the challenges of GDPR compliance using the unimagined decentralized storage of personal data. The framework addresses the blockchain–GDPR tension by establishing trust between a business and its customers vis-à-vis storing customers’ data. The trust is established through blockchain’s capability of providing the customer with private keys and control over their data, e.g. processing and access.
Research limitations/implications
The paper provides a framework that demonstrates that blockchain technology can be designed for use in GDPR compliant solutions. In using the framework, a blockchain-based solution provides the ability to audit and monitor privacy measures, demonstrates a legal justification for processing activities, incorporates a data privacy policy, provides a map for data processing and ensures security and privacy awareness among all actors. The research is limited to a focus on blockchain–GDPR compliance; however, future research is needed to investigate the use of the framework in specific domains.
Practical implications
The paper posits a framework that identifies the strategies and tactics necessary for GDPR compliance. Practitioners need to complement the framework with rigorous privacy risk management, i.e. conducting a privacy risk analysis, identifying strategies and tactics to address such risks, and preparing a privacy impact assessment that enhances the accountability and transparency of a blockchain.
Originality/value
With the increasingly strategic use of data by businesses and the countervailing growth of data privacy regulation, alternative technologies could provide businesses with a means to nurture trust with their customers regarding collected data. However, it is commonly assumed that the decentralized approach of blockchain technology cannot be applied to this business need. This paper posits a framework that enables a blockchain to be designed to follow the GDPR, thereby providing an alternative for businesses to collect customers' data while ensuring the customers' trust.
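A toy sketch of one tactic in the spirit of the framework's customer-key control: encrypt personal data with a key held by the customer, keep the ciphertext off-chain, and record only its hash as an on-chain pointer, so erasure can be approximated by destroying the key. The class and data below are hypothetical illustrations, not the paper's design.

```python
import hashlib
from cryptography.fernet import Fernet

class OffChainStore:
    """Hypothetical encrypted off-chain repository keyed by content hash."""
    def __init__(self):
        self._blobs = {}

    def put(self, ciphertext: bytes) -> str:
        digest = hashlib.sha256(ciphertext).hexdigest()
        self._blobs[digest] = ciphertext
        return digest  # only this digest would be written on-chain

store = OffChainStore()
customer_key = Fernet.generate_key()  # held by the customer, not the business
token = Fernet(customer_key).encrypt(b"name=Alice; email=alice@example.com")
onchain_pointer = store.put(token)

# "Right to erasure": destroying the key renders both the immutable
# on-chain pointer and the off-chain ciphertext practically unreadable.
del customer_key
print("on-chain record:", onchain_pointer)
```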

9. Alsmadi I, Zarour M. Online integrity and authentication checking for Quran electronic versions. Applied Computing and Informatics 2017. DOI: 10.1016/j.aci.2015.08.001

10. Alsmadi I, O'Brien MJ. Rating news claims: Feature selection and evaluation. Mathematical Biosciences and Engineering 2019; 17:1922-1939. PMID: 32233515. DOI: 10.3934/mbe.2020101
Abstract
News claims that travel the Internet and online social networks (OSNs) originate from different, sometimes unknown sources, which raises issues about the credibility of those claims and the drivers behind them. Fact-checking websites such as Snopes, FactCheck, and Emergent use human evaluators to investigate and label news claims, but the process is labor- and time-intensive. Driven by the need to use data analytics and algorithms in assessing the credibility of news claims, we focus on what can be generalized about evaluating human-labeled claims. We developed tools to extract claims from Snopes and Emergent and used public datasets collected by and published on those websites. Claims extracted from those datasets were supervised, or labeled, with different claim ratings. We focus on claims with definite ratings (false, mostly false, true, and mostly true), with the goal of identifying distinctive features that can distinguish true from false claims. Ultimately, those features can be used to predict the labels of future unsupervised, or unlabeled, claims. We evaluate different methods of extracting features, as well as different sets of features, in terms of their ability to predict the correct claim label. We noticed that OSN websites report markedly higher rates of false claims than most other website categories, and that in most categories the rate of reported false claims on fact-checking websites is higher than the rate of true claims. At the content-analysis level, false claims tend to carry more negative sentiment, which can provide supporting features for predicting claim classification.
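As an illustrative sketch of the content-level signal noted above, the snippet below combines bag-of-words features with a sentiment-polarity feature and trains a classifier to separate true from false claims. The toy claims, labels, and the use of VADER are assumptions standing in for the paper's datasets and sentiment tooling.

```python
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment import SentimentIntensityAnalyzer
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

claims = [
    "Vaccine saves thousands of lives, study confirms",
    "Officials verify the new policy reduced costs",
    "Shocking cover-up: they are hiding the terrible truth",
    "This dangerous hoax will destroy everything you love",
]
labels = ["true", "true", "false", "false"]

# One sentiment feature per claim (false claims skew negative).
sia = SentimentIntensityAnalyzer()
sentiment = csr_matrix([[sia.polarity_scores(c)["compound"]] for c in claims])

# Stack TF-IDF text features with the sentiment column.
X = hstack([TfidfVectorizer().fit_transform(claims), sentiment])

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))  # toy in-sample check only
```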

11. Ahmad R, Alsmadi I, Alhamdani W, Tawalbeh L. A Deep Learning Ensemble Approach to Detecting Unknown Network Attacks. Journal of Information Security and Applications 2022. DOI: 10.1016/j.jisa.2022.103196

12. Obeidat R, Alsmadi I, Baker QB, Al-Njadat A, Srinivasan S. Researching public health datasets in the era of deep learning: a systematic literature review. Health Informatics Journal 2025; 31:14604582241307839. PMID: 39794941. DOI: 10.1177/14604582241307839
Abstract
Objective: Explore deep learning applications in predictive analytics for public health data, identify challenges and trends, and understand the current landscape.
Materials and Methods: A systematic literature review was conducted in June 2023, searching for articles on public health data in the context of deep learning published from the inception of medical and computer science databases through June 2023. The review covered the diverse datasets used and abstracted applications, challenges, and advancements in deep learning.
Results: 2004 articles were reviewed, identifying 14 disease categories. Observed trends include explainable AI, patient-embedding learning, integration of different data sources, and the use of deep learning models in health informatics. Noted challenges were technical reproducibility and handling sensitive data.
Discussion: There has been a notable surge in publications applying deep learning to public health data since 2015, with consistent applications and models recurring across the field. Despite the wide range of applications, a standard approach still does not exist for addressing the outstanding challenges and issues in this field.
Conclusion: Guidelines are needed for applying deep learning models to public health data to improve the FAIRness, efficiency, transparency, comparability, and interoperability of research. Interdisciplinary collaboration among data scientists, public health experts, and policymakers is needed to harness the full potential of deep learning.

13. Alsmadi I, Rice NM, O'Brien MJ. Fake or not? Automated detection of COVID-19 misinformation and disinformation in social networks and digital media. Computational and Mathematical Organization Theory 2022; 30:1-19. PMID: 36466587; PMCID: PMC9702725. DOI: 10.1007/s10588-022-09369-w
Abstract
With the continuous spread of the COVID-19 pandemic, misinformation poses serious threats and concerns. COVID-19-related misinformation integrates health aspects with news and political misinformation, and this mixture complicates the ability to judge whether a claim related to COVID-19 is information, misinformation, or disinformation. With no standard terminology for misinformation and disinformation, integrating different datasets and using existing classification models can be impractical. To deal with these issues, we aggregated several COVID-19 misinformation datasets and compared models learned from individual datasets against a model learned from the aggregate. We also evaluated the impact of several word- and sentence-embedding models and transformers on the performance of the classification models. We observed that although word-embedding models improved all evaluated classification models, the level of improvement varied among the classifiers. Although our work focused on COVID-19 misinformation detection, a similar approach can be applied to myriad other topics, such as the recent Russian invasion of Ukraine.
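A minimal sketch of the embedding comparison described above: train the same classifier on sparse TF-IDF features versus dense sentence embeddings and compare accuracy. The toy claims replace the aggregated COVID-19 datasets, and the MiniLM model is an assumed example, not necessarily one of the embedding models the paper evaluated.

```python
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = ["Masks reduce transmission, trials show",
         "Hospitals report rising admissions",
         "5G towers spread the virus",
         "Drinking bleach cures the infection"] * 5
labels = ["real", "real", "fake", "fake"] * 5

tfidf = TfidfVectorizer().fit_transform(texts)
dense = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)

for name, X in [("tf-idf", tfidf), ("sentence embeddings", dense)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean()
    print(f"{name}: accuracy = {acc:.3f}")
```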

14. Maabreh M, Qolomany B, Springstead J, Alsmadi I, Gupta A. Deep vs. Shallow Learning-based Filters of MSMS Spectra in Support of Protein Search Engines. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine 2017; 2017:1175-1182. PMID: 34408917; PMCID: PMC8370709. DOI: 10.1109/bibm.2017.8217824
Abstract
Despite the linear relation between the number of observed spectra and the search time, current protein search engines, even the parallel versions, can take several hours to search a large number of MSMS spectra that can be generated in a short time. After this laborious searching process, some, and at times the majority, of the observed spectra are labeled as non-identifiable. We evaluate the role of machine learning in building an efficient MSMS filter to remove non-identifiable spectra, comparing a deep learning algorithm against nine shallow learning algorithms with different configurations. Using 10 datasets generated from two different search engines, different instruments, different sizes, and different species, we experimentally show that deep learning models are powerful in filtering MSMS spectra. We also show that our simple feature list is informative, as several shallow learning algorithms also achieved encouraging filtering results. Our deep learning model can exclude around 50% of the non-identifiable spectra while losing, on average, only 9% of the identifiable ones. Among the shallow learners, random forest, support vector machine, and neural network models showed encouraging results, eliminating, on average, 70% of the non-identifiable spectra while losing around 25% of the identifiable ones. The deep learning algorithm may be especially useful where the proteins of interest are in lower cellular or tissue concentration, while the other algorithms may be more useful for concentrated or more highly expressed proteins.
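The filtering task can be sketched as a binary classifier that predicts whether a spectrum is identifiable, evaluated here on recall of identifiable spectra (the "losing only 9%" figure above corresponds to roughly 91% recall). The random features, the specific models, and the scoring choice are illustrative assumptions, not the paper's feature list or nine-algorithm benchmark.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 40))  # stand-in spectrum-derived features
# Stand-in label: is the spectrum identifiable by the search engine?
y = (X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=2000)) > 0

for name, clf in [("random forest", RandomForestClassifier(random_state=0)),
                  ("small MLP", MLPClassifier(hidden_layer_sizes=(64, 64),
                                              max_iter=500, random_state=0))]:
    recall = cross_val_score(clf, X, y, cv=5, scoring="recall").mean()
    print(f"{name}: identifiable-spectrum recall = {recall:.3f}")
```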

15. Sharma S, Alsmadi I, Alkhawaldeh RS, Al-Ahmad B. Data-driven analysis and predictive modeling on COVID-19. Concurrency and Computation: Practice and Experience 2022; 34:e7390. PMID: 36718458; PMCID: PMC9877906. DOI: 10.1002/cpe.7390
Abstract
The coronavirus disease (COVID-19), which emerged in China in 2019, has spread rapidly to virtually every country, producing millions of cases worldwide. This paper presents a data-driven analysis and prediction modeling approach that investigates three aspects of the pandemic: the relative impact of COVID-19 by gender of patients, the global growth rate, and the effect of the social-distancing and safety measures adopted by each country on the virus growth rate, together with the mortality rate at specific ages. Several machine learning and ensemble models were used and compared to obtain the best accuracy, with experiments demonstrated on three large public datasets. The proposed analytical model includes classic classifiers and distinctive ensemble methods such as bagging, feature-based ensembles, voting, and stacking. The results show superior prediction performance compared with related approaches.
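A minimal sketch of the ensemble comparison named above: classic base classifiers combined via bagging, voting, and stacking. Synthetic data stands in for the paper's three public COVID-19 datasets, and the particular base learners are assumed for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=12, random_state=1)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=1)),
        ("rf", RandomForestClassifier(random_state=1))]

ensembles = {
    "bagging":  BaggingClassifier(DecisionTreeClassifier(), random_state=1),
    "voting":   VotingClassifier(base, voting="soft"),
    "stacking": StackingClassifier(base, final_estimator=LogisticRegression()),
}
for name, model in ensembles.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: accuracy = {acc:.3f}")
```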

16. Najadat H, Alaiad A, Alasal SA, Mrayyan GA, Alsmadi I. Integration of Data Envelopment Analysis and Clustering Methods. Journal of Information & Knowledge Management 2020. DOI: 10.1142/s0219649220400067
Abstract
Data Envelopment Analysis (DEA) has been applied creatively in various study domains to compare and evaluate different Decision Making Units (DMUs) based on multiple input-output attributes. In this paper, the performance of Jordanian public hospitals is assessed via a methodology combining DEA with data mining methods, specifically clustering. Initially, the inputs of inefficient hospitals were altered to check for waste in the allocated resources. Then, the number of inputs and outputs was manipulated to test whether it strongly influences the productivity of the DMUs. The analysis covered 27 public hospitals, and the efficiency measurements used were constant returns to scale (CRS) and variable returns to scale (VRS), computed with the DEAP software. Experiments showed that the efficiency of a hospital can be more meaningfully assessed when it is compared with a group of hospitals that are similar in some factors. More specifically, results of the CRS model showed that 77% of the hospitals were efficient. Additionally, we found that the inefficiencies of some hospitals are linked to weak resource utilization. It is concluded that the number of inputs and outputs used in the efficiency evaluation impacts the resulting values.
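For readers unfamiliar with DEA, the CRS (input-oriented CCR) model used above reduces to one linear program per DMU: minimize the input-contraction factor theta subject to a convex-cone combination of peers dominating the DMU. The sketch below solves it with SciPy on made-up hospital data; the paper itself used DEAP on 27 hospitals.

```python
import numpy as np
from scipy.optimize import linprog

# rows = DMUs (hospitals); inputs (beds, staff) and outputs (patients treated)
X = np.array([[20, 150], [30, 200], [25, 160], [40, 300]], dtype=float)
Y = np.array([[1000], [1300], [900], [2000]], dtype=float)
n, m = X.shape   # number of DMUs, number of inputs
s = Y.shape[1]   # number of outputs

def ccr_efficiency(o: int) -> float:
    """Input-oriented CCR efficiency of DMU o:
    min theta  s.t.  X.T @ lam <= theta * x_o  and  Y.T @ lam >= y_o."""
    c = np.r_[1.0, np.zeros(n)]                 # variables: [theta, lam_1..lam_n]
    A_in = np.c_[-X[o].reshape(m, 1), X.T]      # X.T @ lam - theta*x_o <= 0
    A_out = np.c_[np.zeros((s, 1)), -Y.T]       # -Y.T @ lam <= -y_o
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[o]],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun

for o in range(n):
    print(f"hospital {o}: CRS efficiency = {ccr_efficiency(o):.3f}")
```

Efficient hospitals score 1.0; a score of, say, 0.8 means the same outputs should be achievable with 80% of the inputs, which is the resource-waste reading used in the paper.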

17. Maabreh M, Qolomany B, Alsmadi I, Gupta A. Deep Learning-based MSMS Spectra Reduction in Support of Running Multiple Protein Search Engines on Cloud. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine 2017; 2017:1909-1914. PMID: 34430067; PMCID: PMC8382039. DOI: 10.1109/bibm.2017.8217951
Abstract
The diversity of the available protein search engines with respect to their matching algorithms, the low overlap among their results, and the disparity of their coverage encourage the proteomics community to use ensembles of different search engines. Advances in cloud computing technology and the availability of distributed processing clusters can also support this task. However, transferring data and combining results can become the major bottleneck: the flood of billions of observed mass spectra, hundreds of gigabytes or potentially terabytes of data, can easily cause congestion, increase the risk of failure, degrade performance, add computation cost, and waste available resources. In this study, we therefore propose a deep learning model to mitigate the traffic over the cloud network and thus reduce the cost of cloud computing. The model, which depends on each spectrum's top 50 peak intensities and their m/z values, removes any spectrum predicted not to pass the majority vote of the participating search engines. Our results, using three search engines (pFind, Comet, and X!Tandem) and four different datasets, are promising and support investing in deep learning to solve this type of big-data problem.
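An illustrative sketch of such a pre-upload filter: a small fully connected network over a 100-dimensional input (top-50 peak intensities concatenated with their m/z values) predicting whether a spectrum would pass the engines' majority vote. The architecture, training data, and 0.5 decision threshold are assumptions for illustration, not the paper's exact model.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.random((5000, 100)).astype("float32")  # [50 intensities | 50 m/z], scaled
# Stand-in label: would the spectrum pass the engines' majority vote?
y = (X[:, :50].mean(axis=1) > 0.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(passes the vote)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)

# Only spectra predicted to pass would be uploaded to the cloud engines,
# which is where the traffic and cost savings come from.
keep = model.predict(X, verbose=0).ravel() >= 0.5
print(f"spectra kept for cloud search: {keep.mean():.1%}")
```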