1
Nekhlopochyn O, Verbov V, Tsymbaliuk I, Cheshuk I, Vorodi M. The choice of classification to determine the optimal tactics for treatment of the thoracolumbar junction traumatic injuries. Pol Merkur Lekarski 2024; 52:104-111. [PMID: 38518241 DOI: 10.36740/merkur202401116]
Abstract
Aim: To evaluate how the level of detail in describing pathomorphological changes in the osteoligamentous structures influences the choice of treatment tactics for patients with traumatic injury of the thoracolumbar junction. Materials and Methods: Treatment tactics were analyzed retrospectively in 96 patients with traumatic injury of the thoracolumbar junction, including both surgically and conservatively treated patients. Injuries were classified using the F. Magerl and AOSpine classifications, neurological status was assessed with the ASIA scale, and the nature of the damage was further specified using the McCormack criteria. Statistical processing was performed with the Random Forest machine learning algorithm. Results: When only the nature of the injury is considered, the optimal treatment method can be unambiguously determined with a probability of 58.33% using the F. Magerl classification and 55.21% using the AOSpine classification. In models that combine the nature of the injury, the level of neurological deficit and the McCormack criteria, the error in unambiguously determining the most effective treatment method was 26.04% with the F. Magerl classification and 21.88% with AOSpine. Conclusions: The AOSpine classification is more promising for developing a multifactorial algorithm for treating traumatic injuries of the thoracolumbar junction.
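The abstract does not say how the Random Forest error was estimated. As a hedged illustration only, the sketch below builds a tiny random forest over categorical predictors and reports its out-of-bag (OOB) misclassification rate, which is one common way to estimate such an error. The predictors are assumed to be codes standing in for injury type, ASIA grade and McCormack score, and the treatment labels are hypothetical. The sketch uses only the Python standard library; a real analysis would rely on a mature implementation.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, n_feats, rng, depth=0, max_depth=3):
    """Grow a tree of equality splits (feature == value) on categorical rows."""
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    best = None
    # A random forest considers only a random subset of features at each split.
    for f in rng.sample(range(len(rows[0])), n_feats):
        for v in set(r[f] for r in rows):
            left = [i for i, r in enumerate(rows) if r[f] == v]
            right = [i for i, r in enumerate(rows) if r[f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, v, left, right)
    if best is None:
        return majority(labels)
    _, f, v, left, right = best
    children = [build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                           n_feats, rng, depth + 1, max_depth)
                for idx in (left, right)]
    return (f, v, children[0], children[1])


def predict(node, row):
    # Internal nodes are tuples; leaves are plain (non-tuple) labels.
    while isinstance(node, tuple):
        f, v, yes, no = node
        node = yes if row[f] == v else no
    return node


def oob_error(rows, labels, n_trees=50, seed=0):
    """Out-of-bag misclassification rate of a bootstrap forest."""
    rng = random.Random(seed)
    n = len(rows)
    n_feats = max(1, int(len(rows[0]) ** 0.5))
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap with replacement
        tree = build_tree([rows[i] for i in sample], [labels[i] for i in sample],
                          n_feats, rng)
        for i in set(range(n)) - set(sample):  # rows this tree never saw
            votes[i][predict(tree, rows[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no out-of-bag predictions; increase n_trees")
    wrong = sum(votes[i].most_common(1)[0][0] != labels[i] for i in scored)
    return wrong / len(scored)
```

For scale, if the reported errors were computed over all 96 patients, 26.04% and 21.88% correspond to 25 and 21 patients, respectively.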
Affiliation(s)
- Oleksii Nekhlopochyn
- ROMODANOV NEUROSURGERY INSTITUTE OF NATIONAL ACADEMY OF MEDICAL SCIENCES OF UKRAINE, KYIV, UKRAINE
- Vadim Verbov
- ROMODANOV NEUROSURGERY INSTITUTE OF NATIONAL ACADEMY OF MEDICAL SCIENCES OF UKRAINE, KYIV, UKRAINE
- Iaroslav Tsymbaliuk
- ROMODANOV NEUROSURGERY INSTITUTE OF NATIONAL ACADEMY OF MEDICAL SCIENCES OF UKRAINE, KYIV, UKRAINE
- Ievgen Cheshuk
- ROMODANOV NEUROSURGERY INSTITUTE OF NATIONAL ACADEMY OF MEDICAL SCIENCES OF UKRAINE, KYIV, UKRAINE
- Milan Vorodi
- ROMODANOV NEUROSURGERY INSTITUTE OF NATIONAL ACADEMY OF MEDICAL SCIENCES OF UKRAINE, KYIV, UKRAINE
2
Abstract
Protein functions are closely related to their subcellular locations, and predicting protein subcellular locations is currently one of the most important problems in protein science. Because traditional methods have evident shortcomings, efficient and low-cost methods are urgently needed. Many computational methods have been proposed to date, but the problem is far from completely solved. Recently, several multi-label classifiers were proposed to identify the subcellular locations of human, animal, Gram-negative bacterial and eukaryotic proteins. These classifiers used protein features derived from gene ontology information. Although they performed well, they can be further improved by adopting more powerful machine learning algorithms. In this study, four improved multi-label classifiers were built to identify the subcellular locations of these four protein types. The random k-labelsets (RAKEL) algorithm was used to handle proteins with multiple locations, and random forest was used as the base prediction engine. Jackknife tests showed that all classifiers performed well, and comparisons with previous classifiers further confirmed the superiority of the proposed classifiers.
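RAKEL breaks a multi-label problem into small multi-class problems. The sketch below is a simplified reading of the method, not the authors' code. It draws m random k-labelsets, trains one label-powerset model per labelset, and predicts a label when more than half of the models whose labelset contains that label vote for it. A 1-nearest-neighbour base learner (`one_nn_fit`, hypothetical) stands in for the random forest used in the paper. The location labels and feature vectors are made up.

```python
import random
from itertools import combinations


def rakel_fit(X, Y, labels, k, m, base_fit, seed=0):
    """Train up to m label-powerset models, each on a distinct random k-labelset.

    Y holds one set of labels per sample. base_fit(X, targets) must return a
    predict(x) function that outputs one of the training targets.
    """
    rng = random.Random(seed)
    all_sets = list(combinations(sorted(labels), k))
    chosen = rng.sample(all_sets, min(m, len(all_sets)))
    models = []
    for subset in chosen:
        # Label powerset: the class is the sample's label set restricted to the subset.
        targets = [frozenset(y & set(subset)) for y in Y]
        models.append((subset, base_fit(X, targets)))
    return models


def rakel_predict(models, x, threshold=0.5):
    """Average the votes for each label over the models that cover it."""
    votes, counts = {}, {}
    for subset, predict in models:
        predicted = predict(x)
        for label in subset:
            counts[label] = counts.get(label, 0) + 1
            votes[label] = votes.get(label, 0) + (label in predicted)
    return {label for label in counts if votes[label] / counts[label] > threshold}


def one_nn_fit(X, targets):
    """Hypothetical stand-in base learner: 1-nearest neighbour (squared Euclidean)."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return targets[dists.index(min(dists))]
    return predict
```

The jackknife test mentioned in the abstract is leave-one-out evaluation: each protein is predicted by a model fitted on all the other proteins.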
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Ruyun Qu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Xintong Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
3
Zhou J, Su Z, Hosseini S, Tian Q, Lu Y, Luo H, Xu X, Chen C, Huang J. Decision tree models for the estimation of geo-polymer concrete compressive strength. Math Biosci Eng 2024; 21:1413-1444. [PMID: 38303471 DOI: 10.3934/mbe.2024061]
Abstract
The green concrete industry benefits from using gel to replace part of the cement in concrete. However, measuring the compressive strength of geopolymer concrete (CSGPoC) requires considerable work and expense, so predicting CSGPoC with high accuracy is an attractive alternative. To this end, this study proposed base learner and super learner machine learning models to estimate CSGPoC. A decision tree (DT) was used as the base learner, and random forest (RF) and extreme gradient boosting (XGBoost) were used as super learners. A database of 259 CSGPoC samples was compiled, of which four-fifths were used for training and one-fifth for testing. The model inputs were fly ash, ground-granulated blast-furnace slag (GGBS), Na2SiO3, NaOH, fine aggregate, gravel 4/10 mm, gravel 10/20 mm, water/solids ratio, and NaOH molarity. To evaluate the reliability and performance of the DT, XGBoost, and RF models, 12 performance evaluation metrics were determined. The XGBoost model achieved the highest accuracy, with mean absolute error (MAE) of 2.073, mean absolute percentage error (MAPE) of 5.547, Nash-Sutcliffe (NS) of 0.981, correlation coefficient (R) of 0.991, R2 of 0.982, root mean square error (RMSE) of 2.458, Willmott's index (WI) of 0.795, weighted mean absolute percentage error (WMAPE) of 0.046, Bias of 2.073, square index (SI) of 0.054, p of 0.027, mean relative error (MRE) of -0.014, and a20 of 0.983 for the training set, and MAE of 2.06, MAPE of 6.553, NS of 0.985, R of 0.993, R2 of 0.986, RMSE of 2.307, WI of 0.818, WMAPE of 0.05, Bias of 2.06, SI of 0.056, p of 0.028, MRE of -0.015, and a20 of 0.949 for the testing set.
Applying the trained models to the testing set gave R2 values of 0.8969, 0.9857, and 0.9424 for DT, XGBoost, and RF, respectively, which shows the superiority of XGBoost for CSGPoC estimation. In conclusion, the XGBoost model predicts CSGPoC more accurately than the DT and RF models.
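The abstract lists its metrics without formulas, and several of them (WI, SI, p) have competing definitions in the literature. The sketch below implements common textbook definitions of a subset of the metrics, together with a shuffled 80/20 train/test split. These definitions are assumptions, not necessarily the study's. Here R2 is the square of Pearson's R, and NS uses the 1 - SSres/SStot form. The two generally differ, which is consistent with the abstract reporting both.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction of it for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def regression_metrics(y_true, y_pred):
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    r = cov / math.sqrt(ss_tot * sum((p - mean_p) ** 2 for p in y_pred))
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "NS": 1 - ss_res / ss_tot,  # Nash-Sutcliffe efficiency
        "R": r,                      # Pearson correlation coefficient
        "R2": r ** 2,
        # a20: share of samples whose measured/predicted ratio lies in [0.8, 1.2]
        "a20": sum(0.8 <= t / p <= 1.2 for t, p in zip(y_true, y_pred)) / n,
    }
```

With 259 samples, this hypothetical split yields 207 training samples and 52 test samples.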
Affiliation(s)
- Ji Zhou
- College of Civil and Environmental Engineering, Hunan University of Science and Engineering, Yongzhou 425199, China
- Zhanlin Su
- Shandong Energy Group Xinwen Mining Co., Ltd., Taian 271233, China
- Shahab Hosseini
- Faculty of the Engineering, Tarbiat Modares University, Jalal AleAhmad, Nasr, Tehran, Iran
- Qiong Tian
- College of Civil and Environmental Engineering, Hunan University of Science and Engineering, Yongzhou 425199, China
- Yijun Lu
- School of Civil Engineering, Guangzhou University, Guangzhou 510006, China
- Hao Luo
- School of Civil Engineering, Guangzhou University, Guangzhou 510006, China
- Xingquan Xu
- Guangdong Hualu Transport Technology Co., Ltd, Guangzhou, China
- Chupeng Chen
- School of Civil Engineering, Guangzhou University, Guangzhou 510006, China
- Guangdong Hualu Transport Technology Co., Ltd, Guangzhou, China
- Jiandong Huang
- School of Civil Engineering, Guangzhou University, Guangzhou 510006, China
4
Fu S, Luo Y, Liu Y, Liao Q, Kong S, Yang A, Lin L, Li H. Mining association rules between the granulation feasibility and physicochemical properties of aqueous extracts from Chinese herbal medicine in fluidized bed granulation. Math Biosci Eng 2023; 20:19065-19085. [PMID: 38052591 DOI: 10.3934/mbe.2023843]
Abstract
Fluidized bed granulation (FBG) is a widely used granulation technology in the pharmaceutical industry. However, defluidization caused by the formation of large aggregates is a challenge for FBG, particularly for traditional Chinese medicine (TCM), whose aqueous extracts have complex physicochemical properties. This study therefore aimed to identify the complex relationships between physicochemical characteristics and defluidization using data mining methods. First, 50 types of TCM were decocted, and 11 physical properties and 10 chemical components were assessed for their potential influence on defluidization, with the loss rate as the evaluation index. Random forest (RF) and Apriori algorithms were then used to uncover association rules between physicochemical characteristics and defluidization. The RF analysis identified the top 8 factors associated with defluidization: the physical properties glass transition temperature (Tg), dynamic surface tension (DST) at 100 ms, 1000 ms and 10 ms (DST100ms, DST1000ms and DST10ms) and conductivity, and the chemical components fructose, glucose and protein. The Apriori results showed that lower Tg and conductivity were associated with a higher risk of defluidization and a higher loss rate, whereas DST100ms, DST1000ms and DST10ms showed the opposite trend. Specifically, the probability of defluidization increases when Tg and conductivity fall below 29.04 ℃ and 6.21 mS/m, respectively, and DST10ms, DST100ms and DST1000ms exceed 70.40 mN/m, 66.66 mN/m and 61.58 mN/m, respectively. Among the chemical properties, a higher content of low-molecular-weight saccharides was associated with more frequent defluidization and a higher loss rate, whereas protein content showed the opposite trend.
Specifically, the likelihood of defluidization increases when fructose and glucose contents exceed 20.35 mg/g and 34.05 mg/g, respectively, and protein content is below 1.63 mg/g. Finally, evaluation criteria for defluidization were proposed based on these results, which could be used to avoid defluidization during granulation. This study demonstrated that the RF and Apriori algorithms are effective data mining methods for uncovering key factors that affect defluidization.
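Apriori works on discrete items, so continuous properties must first be binarized with thresholds such as "Tg below 29.04 ℃". The sketch below is a minimal, unoptimized Apriori with rule generation. The item names and transactions are hypothetical stand-ins for discretized extract measurements, and the support and confidence thresholds are illustrative only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    current = {s for s in current if support(s) >= min_support}
    frequent, size = {}, 1
    while current:
        frequent.update({s: support(s) for s in current})
        size += 1
        # Join step: unions of frequent itemsets that have exactly `size` items.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[antecedent]  # subsets are always frequent
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

Usage: with transactions such as `{"Tg_low", "cond_low", "defluid"}`, a rule like `{"Tg_low"} -> {"defluid"}` with high confidence mirrors the kind of finding reported above.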
Affiliation(s)
- Sai Fu
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Yuting Luo
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Yuling Liu
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Qian Liao
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Shasha Kong
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Anhui Yang
- Institute of Traditional Chinese Medicine Health Industry, China Academy of Chinese Medical Sciences, Jiangxi 330006, China
- Longfei Lin
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Hui Li
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Institute of Traditional Chinese Medicine Health Industry, China Academy of Chinese Medical Sciences, Jiangxi 330006, China
5
Abstract
Drugs, which treat various diseases, are essential for human health. However, developing new drugs is laborious, time-consuming, and expensive. Although investment in drug development has greatly increased over the years, the number of drugs approved each year remains low. Drug repositioning is considered an effective way to accelerate drug development because it can discover novel effects of existing drugs. Numerous computational methods have been proposed for drug repositioning, some of which are binary classifiers that predict drug-disease associations (DDAs). Negative sample selection is a common weakness of such classifiers. In this study, a novel reliable negative sample selection scheme, named RNSS, is presented, which screens out pairs of drugs and diseases with low probabilities of being actual DDAs. The scheme considers information from the k neighbors of a drug in a drug network, including their associations with diseases and with the drug itself. A scoring system was then set up to evaluate pairs of drugs and diseases. To test the utility of RNSS, three classic classification algorithms (random forest, Bayes network and nearest neighbor algorithm) were used to build classifiers with negative samples selected by RNSS. Cross-validation results suggested that these classifiers performed nearly perfectly and were significantly superior to those using traditional and previously proposed negative sample selection schemes.
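The abstract does not give the RNSS scoring formula. The sketch below shows one plausible score of this kind and should be read as an assumption, not the published method. A candidate pair (drug, disease) scores high when the drug's k most similar neighbours are themselves linked to that disease. The lowest-scoring unlabeled pairs are then kept as reliable negatives. The drug similarities and associations are toy data.

```python
def knn_score(drug, disease, similarity, associations, k):
    """Similarity-weighted share of the drug's k nearest neighbours linked to the disease.

    similarity: {drug: {other_drug: similarity}}; associations: {drug: set of diseases}.
    Hypothetical scoring rule, not the published RNSS formula.
    """
    neighbours = sorted(similarity[drug].items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(s for _, s in neighbours)
    if total == 0:
        return 0.0
    return sum(s for n, s in neighbours if disease in associations.get(n, set())) / total


def reliable_negatives(drugs, diseases, similarity, associations, k, n_neg):
    """Return the n_neg unlabeled drug-disease pairs with the lowest scores."""
    scored = [(knn_score(d, s, similarity, associations, k), d, s)
              for d in drugs for s in diseases
              if s not in associations.get(d, set())]  # never pick known DDAs
    scored.sort()
    return [(d, s) for _, d, s in scored[:n_neg]]
```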
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Kaiyu Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Bo Zhou
- Shanghai University of Medicine & Health Sciences, Shanghai 201318, China
6
Ying A, Zhao Y, Hu X. Identification of biomarkers related to prostatic hyperplasia based on bioinformatics and machine learning. Math Biosci Eng 2023; 20:12024-12038. [PMID: 37501430 DOI: 10.3934/mbe.2023534]
Abstract
In older adults, benign prostatic hyperplasia (BPH) is the most common cause of lower urinary tract symptoms (LUTS). This study aimed to explore genes with diagnostic value in patients with BPH, reveal the relationship between the expression of diagnosis-related genes and the immune microenvironment, and provide a reference for the molecular diagnosis and immunotherapy of BPH. Combined gene expression data from GSE6099, GSE7307 and GSE119195 in the GEO database were used. Differentially expressed autophagy-related genes between BPH patients and healthy controls were identified by differential analysis. Genes related to BPH diagnosis were then screened by a machine learning algorithm and verified. Finally, five important genes (IGF1, PSIP1, SLC1A3, SLC2A1 and T1A1) were obtained with the random forest (RF) algorithm, and their relationships with the immune microenvironment were discussed. These five genes play an essential role in the occurrence and development of BPH and may become new diagnostic markers for BPH. Immune cells were significantly correlated with some of these genes. IL-4 signal transduction mediated by M2 macrophages is closely related to the progression of BPH, and BPH tissue contains abundant active mast cells. The adoption and metastasis of regulatory T cells may be an important method to treat BPH.
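The abstract does not state which random forest importance measure was used to rank genes. One common choice is permutation importance, also called mean decrease in accuracy. It shuffles one feature column at a time and records how much accuracy drops. The sketch below works with any `predict` function. The toy model and the "BPH"/"control" labels are hypothetical.

```python
import random


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently.

    X is a list of lists (samples x features); predict maps one sample to a label.
    """
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(base - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of exactly zero, because shuffling it cannot change any prediction.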
Affiliation(s)
- Aiying Ying
- Department of Urology, Yongkang first people's Hospital, Yongkang, China
- Yueguang Zhao
- Department of Urology, Yongkang first people's Hospital, Yongkang, China
- Xiang Hu
- Department of Urology, Yongkang first people's Hospital, Yongkang, China
7
Kanani S, Patel S, Gupta RK, Jain A, Lin JCW. An AI-Enabled ensemble method for rainfall forecasting using Long-Short term memory. Math Biosci Eng 2023; 20:8975-9002. [PMID: 37161230 DOI: 10.3934/mbe.2023394]
Abstract
Rainfall prediction includes forecasting whether rainfall will occur and projecting the amount of rainfall over the modeled area. Rainfall results from various natural phenomena such as temperature, humidity, atmospheric pressure, and wind direction, so it depends on many factors that make prediction uncertain. In this work, different machine learning and deep learning models are used to (a) predict the occurrence of rainfall, (b) project the amount of rainfall, and (c) compare the results of the different models for classification and regression. The dataset contains data from 49 Australian cities over a 10-year period, with 23 features including location, temperature, evaporation, sunshine, and wind direction. The dataset contained numerous uncertainties and anomalies that caused the prediction model to produce erroneous projections. We therefore applied several data preprocessing techniques to clean the data for more accurate predictions, including outlier removal, class balancing for the classification task using the Synthetic Minority Oversampling Technique (SMOTE), and data normalization for the regression task using StandardScaler. Classifiers such as XGBoost, Random Forest, Kernel SVM, and Long Short-Term Memory (LSTM) are used for the classification task, while Multiple Linear Regressor, XGBoost, Polynomial Regressor, Random Forest Regressor, and LSTM models are used for the regression task. The experimental results show that the proposed approach outperforms several state-of-the-art approaches, with an accuracy of 92.2% for the classification task and a mean absolute error of 11.7% and an R2 score of 76% for the regression task.
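The two preprocessing steps named above can be sketched in plain Python. `smote` is a simplified SMOTE variant. It repeatedly picks a random minority sample, chooses one of its k nearest minority neighbours, and places a synthetic point at a random position on the segment between them. `standardize` mirrors what a standard scaler does to one column: subtract the mean and divide by the population standard deviation. Both functions are illustrative assumptions, not the study's code. Library implementations add more options.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the base -> neighbour segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def standardize(column):
    """Zero mean, unit (population) standard deviation; assumes a non-constant column."""
    mean = sum(column) / len(column)
    std = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / std for v in column]
```

SMOTE should be fitted on the training split only, so that synthetic points never leak information from the test data.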
Affiliation(s)
- Sarth Kanani
- Department of Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar 382007, Gujarat, India
- Shivam Patel
- Department of Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar 382007, Gujarat, India
- Rajeev Kumar Gupta
- Department of Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar 382007, Gujarat, India
- Arti Jain
- Department of Computer Science & Engineering and Information Technology, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
- Jerry Chun-Wei Lin
- Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen, Norway
8
Kou B, Cao J, Huang W, Ma T. The rutting model of semi-rigid asphalt pavement based on RIOHTRACK full-scale track. Math Biosci Eng 2023; 20:8124-8145. [PMID: 37161189 DOI: 10.3934/mbe.2023353]
Abstract
Semi-rigid asphalt pavement is widely used and has abundant application cases and databases, and rutting is a typical failure mode of this pavement type. An accurate rutting depth prediction model is of great significance for pavement design and maintenance. However, because a complete theoretical system and systematic research data are lacking, existing rutting prediction models for semi-rigid asphalt pavement are not accurate. In this paper, machine learning and a mechanistic-empirical model are combined to study feature selection for rutting evolution and a rutting depth model for semi-rigid asphalt pavement. First, a particle swarm optimization random forest model is used to select the important features that affect the evolution of rutting depth. Second, the R-F model based on these important features is proposed for the first time and compared with a modified version of the rutting model in the Chinese Specifications for Design of Highway Asphalt Pavement (JTG D50-2017) and with the R-B model based on the improved Burgers model. The results show that the R-F model has more accurate prediction ability and better generalization ability, and it does not require complex data preprocessing or noise reduction. Machine learning is thus introduced to analyze the data characteristics, and the proposed R-F rutting depth prediction framework greatly improves the applicability and accuracy of the existing model framework.
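The abstract does not say exactly what the particle swarm optimizer tuned. In PSO-RF setups it is commonly used to search random forest hyperparameters, for example the number of trees or the features tried per split, by minimizing a cross-validated error. The sketch below is a generic PSO minimizer under that assumption. The inertia weight `w` and the coefficients `c1` and `c2` are typical illustrative values, not the paper's settings. The toy objective replaces the real cross-validated RF error.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

For hyperparameter search, the objective would round the particle's coordinates to valid integer settings before training and scoring a forest.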
Affiliation(s)
- Bo Kou
- School of Mathematics, Southeast University, Nanjing 210096, China
- Jinde Cao
- School of Mathematics, Southeast University, Nanjing 210096, China
- Yonsei Frontier Lab, Yonsei University, Seoul 03722, South Korea
- Wei Huang
- Intelligent Transportation System Research Center, Southeast University, Nanjing 210096, China
- Tao Ma
- School of Transportation, Southeast University, Nanjing 210096, China
9
Su Z, Li C, Fu H, Wang L, Wu M, Feng X. Improved prognostic prediction model for liver cancer based on biomarker data screened by combined methods. Math Biosci Eng 2023; 20:5316-5332. [PMID: 36896547 DOI: 10.3934/mbe.2023246]
Abstract
Liver cancer is a common cause of cancer death, with the fourth highest cancer mortality rate worldwide. The high recurrence rate of hepatocellular carcinoma after surgery is an important cause of the high mortality among patients. In this paper, starting from eight predetermined core markers of liver cancer and an analysis of the basic principles of the random forest algorithm, an improved feature screening algorithm was proposed. The system was applied to liver cancer prognosis prediction to improve biomarker-based prediction of liver cancer recurrence, and the effects of different algorithmic strategies on prediction accuracy were compared and analyzed. The results showed that the improved feature screening algorithm reduced the feature set by about 50% while keeping the loss of prediction accuracy within 2%.
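The abstract does not describe the improved screening algorithm itself. The sketch below shows a generic importance-guided backward elimination that captures the reported trade-off: drop the least important feature while accuracy stays within a tolerance of the full-feature baseline. Treating "within 2%" as an absolute tolerance of 0.02 is an assumption. The marker names `m1`–`m8` and the `evaluate` function are hypothetical placeholders for cross-validated model accuracy.

```python
def screen_features(features, importance, evaluate, tolerance=0.02):
    """Greedy backward elimination ordered by importance.

    importance: {feature: score}; evaluate(list_of_features) -> accuracy.
    Stops before the first removal that costs more than `tolerance` accuracy.
    """
    kept = sorted(features, key=lambda f: importance[f], reverse=True)
    baseline = evaluate(kept)
    while len(kept) > 1:
        trial = kept[:-1]  # drop the least important remaining feature
        if baseline - evaluate(trial) > tolerance:
            break
        kept = trial
    return kept, baseline, evaluate(kept)
```

In the test scenario, only four of eight markers carry signal, so the screen halves the feature set with no accuracy loss. That mirrors, but does not reproduce, the roughly 50% reduction reported above.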
Affiliation(s)
- Zhiyue Su
- Faculty of Mathematical and Physical Sciences, University College London, London, WC1E 6BT, UK
- Chengquan Li
- School of Clinical Medicine, Tsinghua University, Beijing 100084, China
- Haitian Fu
- School of Clinical Medicine, Tsinghua University, Beijing 100084, China
- Liyang Wang
- School of Clinical Medicine, Tsinghua University, Beijing 100084, China
- Meilong Wu
- Division of Hepatobiliary and Pancreas Surgery, Department of General Surgery, Shenzhen People's Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen 518020, Guangdong, China
- Xiaobin Feng
- School of Clinical Medicine, Tsinghua University, Beijing 100084, China
10
Abstract
Drugs are an important means of treating various diseases. They are classified into several classes to indicate their properties and effects, and drugs in the same class usually share some important features. The Kyoto Encyclopedia of Genes and Genomes (KEGG) DRUG database recently reported a new drug classification system that divides drugs into 14 classes. Correctly identifying the class of any drug-like compound helps to roughly determine its effects on a particular type of disease. Experiments can then confirm such latent effects, accelerating the discovery of novel drugs. In this study, this classification system was investigated, and a classification model was proposed, for the first time, to assign any given drug to one of its classes. Traditional fingerprint features capture only essential drug properties and are very popular for drug-related problems. Here, drugs were instead represented by novel features derived from a large drug network with the well-known network embedding algorithm Node2vec. These features abstract the drug associations generated from the drugs' essential properties and describe each drug against the background of all drugs. Because class sizes differed greatly, the synthetic minority over-sampling technique (SMOTE) was used to tackle the imbalance problem, and the balanced dataset was fed into a support vector machine to build the model. Ten-fold cross-validation showed excellent performance. The model was also superior to models using other drug features, including those generated by another network embedding algorithm and fingerprint features, and it provided more balanced performance across all classes than the model without SMOTE.
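Node2vec turns a network into vectors by running biased random walks and then training a skip-gram model on the walks as if they were sentences. The sketch below covers only the walk step, for an unweighted graph. Returning to the previous node gets weight 1/p, moving to a node adjacent to the previous one gets weight 1, and moving further away gets weight 1/q. The skip-gram training is not shown. The drug graph and the p and q values are hypothetical.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """One second-order biased random walk as in node2vec (unweighted graph).

    graph: {node: set of neighbours}; node labels must be mutually comparable.
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break  # dead end
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        # Unnormalised weights: 1/p to return, 1 to stay near prev, 1/q to move outward.
        weights = [1 / p if x == prev else 1.0 if x in graph[prev] else 1 / q
                   for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

A small q encourages outward, depth-first-like exploration. A small p keeps the walk close to where it started.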
Affiliation(s)
- Chenhao Wu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
11
Khan MBS, Rahman AU, Nawaz MS, Ahmed R, Khan MA, Mosavi A. Intelligent breast cancer diagnostic system empowered by deep extreme gradient descent optimization. Math Biosci Eng 2022; 19:7978-8002. [PMID: 35801453 DOI: 10.3934/mbe.2022373]
Abstract
Cancer is a manifestation of disorders caused by changes in the body's cells that go far beyond healthy development and stabilization. Breast cancer is a common disease: according to statistics from the World Health Organization (WHO), 7.8 million women are diagnosed with breast cancer. Breast cancer is a malignant tumor that normally develops from cells in the breast. Machine learning (ML) approaches provide a variety of probabilistic and statistical methods that let intelligent systems learn from prior experience to recognize patterns in a dataset, which can later be used for decision making. This work aims to build a deep learning-based model that predicts breast cancer with better accuracy. A novel deep extreme gradient descent optimization (DEGDO) method has been developed for breast cancer detection. The proposed model consists of two stages: training and validation. The training phase, in turn, consists of three major layers: a data acquisition layer, a preprocessing layer, and an application layer. The data acquisition layer takes the data and passes it to the preprocessing layer, which handles noise and missing values and normalizes the data before feeding it to the application layer. In the application layer, the model is trained with the deep extreme gradient descent optimization technique, and the trained model is stored on the server. In the validation phase, the stored model is imported to process and diagnose actual data. The Wisconsin Breast Cancer Diagnostic dataset was used to train and test the model. The proposed model outperforms many other approaches, attaining 98.73% accuracy, 99.60% specificity, 99.43% sensitivity, and 99.48% precision.
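The four reported figures all derive from a binary confusion matrix. The sketch below shows the standard definitions, with "malignant" as the positive class, which is an assumed label name. Accuracy is the prevalence-weighted average of sensitivity and specificity. It therefore always lies between those two values when all four metrics come from the same confusion matrix.

```python
def confusion_counts(y_true, y_pred, positive="malignant"):
    """Return (tp, fp, tn, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "specificity": tn / (tn + fp),  # true-negative rate
        "precision": tp / (tp + fp),    # positive predictive value
    }
```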
Affiliation(s)
- Atta-Ur Rahman
- Department of Computer Science, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam 31441, Saudi Arabia
- Muhammad Saqib Nawaz
- Department of Computer Science & IT, Minhaj University Lahore, Lahore 54000, Pakistan
- Rashad Ahmed
- ICS Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
- Amir Mosavi
- John von Neumann Faculty of Informatics, Obuda University, Budapest, Hungary
- Institute of Information Engineering, Automation and Mathematics, Slovak University of Technology in Bratislava, Bratislava, Slovakia
- Institute of Information Society, University of Public Service, 1083 Budapest, Hungary
12
Abstract
User data usually reside in organizations or on users' own local devices as isolated data islands. Collecting these data to train better machine learning models is difficult because of the General Data Protection Regulation (GDPR) and other laws. Federated learning enables users to jointly train machine learning models without exposing their original data. Because random forest trains quickly and achieves high accuracy, it has been applied to federated learning among several data institutions. However, in human activity recognition scenarios, a single unified model cannot provide users with personalized services. In this paper, we propose a privacy-protected federated personalized random forest framework that addresses the personalized application of federated random forest to the activity recognition task. Based on the characteristics of activity recognition data, locality sensitive hashing is used to calculate the similarity between users. Each user trains only with similar users instead of all users, and the model is incrementally selected using the properties of ensemble learning, so that it is trained in a personalized way. At the same time, user privacy is protected through differential privacy during training. We conduct experiments on commonly used human activity recognition datasets to analyze the effectiveness of our model.
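The abstract does not specify which locality sensitive hashing family was used. A common choice for comparing feature vectors is random-hyperplane LSH (SimHash). Each signature bit records which side of a random hyperplane a vector falls on, and the fraction of differing bits estimates the angle between two vectors. The sketch below assumes that family. The per-user summary vectors and the bucket-based grouping are hypothetical. The differential privacy step is not shown.

```python
import math
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Random Gaussian normal vectors, one per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, planes):
    # Each bit records which side of a random hyperplane the vector lies on.
    return tuple(sum(p * v for p, v in zip(plane, vector)) >= 0 for plane in planes)


def estimated_cosine(sig_a, sig_b):
    # For random hyperplanes, P(bit differs) = angle / pi, so invert that relation.
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * hamming / len(sig_a))


def bucket_users(user_vectors, planes):
    """Group users whose signatures collide (hypothetical grouping rule)."""
    buckets = {}
    for user, vec in user_vectors.items():
        buckets.setdefault(signature(vec, planes), []).append(user)
    return buckets
```

Only the short bit signatures need to be exchanged to compare users, not the underlying activity data.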
Affiliation(s)
- Songfeng Liu
- Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
- Jinyan Wang
- Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
- Wenliang Zhang
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
13
Qiu WR, Wang QK, Guan MY, Jia JH, Xiao X. Predicting S-nitrosylation proteins and sites by fusing multiple features. Math Biosci Eng 2021; 18:9132-9147. [PMID: 34814339 DOI: 10.3934/mbe.2021450]
Abstract
Protein S-nitrosylation is one of the most important post-translational modifications (PTMs), and a well-grounded understanding of it is significant because it plays a key role in a variety of biological processes. For an uncharacterized protein sequence, it is valuable for both basic research and drug development to first identify whether it is an S-nitrosylation protein and then predict its specific S-nitrosylation site(s). This work proposes two models for identifying S-nitrosylation proteins and their PTM sites. First, three kinds of features are extracted from protein sequences: KNN scoring of functional domain annotations, PseAAC, and bag-of-words based on the physical and chemical properties of amino acids. Second, the synthetic minority oversampling technique is used to balance the data sets, and several state-of-the-art classifiers and feature fusion strategies are applied to the balanced data sets. In five-fold cross-validation for predicting S-nitrosylation proteins, the accuracy (ACC), Matthews correlation coefficient (MCC) and area under the ROC curve (AUC) are 81.84%, 0.5178 and 0.8635, respectively. Finally, a model for predicting S-nitrosylation sites is constructed from tripeptide composition (TPC) and the composition of k-spaced amino acid pairs (CKSAAP), with elastic nets used for feature selection to eliminate redundant information and improve efficiency. Five-fold cross-validation tests indicate promising success rates for the proposed model. For the convenience of related researchers, a web server named "RF-SNOPS" has been established at http://www.jci-bioinfo.cn/RF-SNOPS.
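CKSAAP encodes a sequence by counting every ordered amino acid pair separated by exactly k residues, for k = 0 up to a chosen maximum. Each count is normalized by the number of such positions, which gives 400 features per gap. The sketch below follows that common definition. The maximum gap `k_max` and the example sequence are illustrative assumptions. In site prediction the input would be a window around a candidate cysteine, but the abstract does not give the window length. Tripeptide composition (TPC) is built the same way over the 8000 ordered triples.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(seq, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count pairs (seq[i], seq[i + k + 1]) and divide by the
    number of such positions, giving 400 features per k.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(seq) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs with non-standard residues such as X
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS)
    return features
```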
Affiliation(s)
- Wang-Ren Qiu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- Qian-Kun Wang
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- Meng-Yue Guan
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- Jian-Hua Jia
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- Xuan Xiao
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
14
Zhu F, Liu M, Wang F, Qiu D, Li R, Dai C. Automatic measurement of fetal femur length in ultrasound images: a comparison of random forest regression model and SegNet. Math Biosci Eng 2021; 18:7790-7805. [PMID: 34814276 DOI: 10.3934/mbe.2021387] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
The aim of this work is the preliminary clinical validation and accuracy evaluation of our automatic algorithms for measuring fetal femur length (FL) in ultrasound images, and the comparison of a random forest regression model with a SegNet model in terms of accuracy and robustness. We proposed a traditional machine learning method that detects the endpoints of the femur with a random forest regression model, and a deep learning method based on SegNet that measures FL automatically using skeletonization processing and an improved fully convolutional network. The automatic measurements of both methods were then evaluated quantitatively and qualitatively against the measurements marked by doctors. In total, 436 ultrasound fetal femur images were evaluated with the two methods. Compared with the doctors' manual annotations, the difference was 1.23 ± 4.66 mm for the method based on the random forest regression model and 0.46 ± 2.82 mm for the method based on SegNet. This distance error was significantly lower than values reported in previous literature. The SegNet-based method performed better in cases of femoral end adhesion, low contrast, and noise interference resembling the shape of the femur. The SegNet-based method achieves promising performance compared with the random forest regression model and can improve the accuracy and robustness of fetal femur length measurement in ultrasound images.
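The endpoint-based measurement and the "mean ± SD" comparison against manual annotations can be sketched as follows. This assumes FL is the straight-line distance between the two detected endpoints scaled by a known pixel spacing, and that the reported figures are the mean and sample standard deviation of signed differences (automatic minus manual); the abstract does not spell out either convention, and all function names and numbers below are hypothetical.

```python
import math
from statistics import mean, stdev


def femur_length_mm(p1: tuple[float, float], p2: tuple[float, float],
                    mm_per_pixel: float) -> float:
    """Femur length as the Euclidean distance between two detected endpoints, in mm."""
    return math.dist(p1, p2) * mm_per_pixel


def error_summary(automatic: list[float], manual: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of the signed differences (automatic - manual)."""
    if len(automatic) != len(manual) or len(automatic) < 2:
        raise ValueError("need two equally long lists with at least two measurements")
    diffs = [a - m for a, m in zip(automatic, manual)]
    return mean(diffs), stdev(diffs)


# Hypothetical endpoints (in pixels) and pixel spacing; not data from the study.
fl = femur_length_mm((0.0, 0.0), (300.0, 400.0), mm_per_pixel=0.1)
m, s = error_summary([50.0, 52.0, 48.0], [49.0, 50.0, 49.0])
print(f"FL = {fl:.1f} mm, error = {m:.2f} ± {s:.2f} mm")
# prints: FL = 50.0 mm, error = 0.67 ± 1.53 mm
```

Signed differences show systematic bias in the mean and spread in the SD; a small mean with a large SD, as in the random forest result above, indicates errors that largely cancel on average but vary widely per image.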
Affiliation(s)
- Fengcheng Zhu
  - Department of Gynaecology and Obstetrics, the First Affiliated Hospital of Jinan University, Guangzhou, China
- Mengyuan Liu
  - Department of Gynaecology and Obstetrics, the First Affiliated Hospital of Jinan University, Guangzhou, China
- Feifei Wang
  - Department of Anesthesiology, the First Affiliated Hospital of Jinan University, Guangzhou, China
- Di Qiu
  - Department of Gynaecology and Obstetrics, the First Affiliated Hospital of Jinan University, Guangzhou, China
- Ruiman Li
  - Department of Gynaecology and Obstetrics, the First Affiliated Hospital of Jinan University, Guangzhou, China
- Chenyang Dai
  - Department of Gynaecology and Obstetrics, the First Affiliated Hospital of Jinan University, Guangzhou, China
15
Li N, Luo P, Li C, Hong Y, Zhang M, Chen Z. Analysis of related factors of radiation pneumonia caused by precise radiotherapy of esophageal cancer based on random forest algorithm. Math Biosci Eng 2021; 18:4477-4490. [PMID: 34198449 DOI: 10.3934/mbe.2021227] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/13/2023]
Abstract
Precise radiotherapy for esophageal cancer may cause varying degrees of radiation damage to lung tissue, leading to radiation pneumonia. The occurrence of radiation pneumonia, however, is related to many factors. To further clarify the correlation between radiation pneumonia and its related factors, a random forest model was used to build a risk prediction model for patients with esophageal cancer undergoing radiotherapy. In this study, we retrospectively reviewed 118 patients with pathologically confirmed esophageal cancer treated in our hospital. The health characteristics and related parameters of all patients were analyzed, and the ability of the random forest algorithm to predict radiation pneumonia was evaluated. After treatment, 71 patients (60.17%) developed radiation pneumonia. In univariate analyses, age, planning target volume length, Karnofsky performance score (KPS), pulmonary emphysema, chemotherapy (received or not), and the ratio of planning target volume to planning gross tumor volume (PTV/PGTV) in the mediastinum were significantly associated with radiation pneumonia (P < 0.05 for each comparison). Multivariate analysis revealed that pulmonary emphysema (OR = 7.491, P = 0.001), PTV/PGTV (OR = 0.205, P = 0.007), and KPS (OR = 0.251, P = 0.011) were independent predictors of radiation pneumonia. These results indicate that analyzing radiation pneumonia-related factors with the random forest algorithm yields a mathematical prediction model built from easily obtained data, and that the algorithm can effectively analyze the risk factors for radiation pneumonia and help formulate appropriate treatment plans for esophageal cancer.
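To make the reported statistics concrete, the sketch below computes the overall incidence (71 of 118 patients) and a crude odds ratio with a Woolf log-scale 95% confidence interval from a 2×2 table. The per-group counts are invented for illustration (only the 71/47 outcome totals come from the abstract), and a crude OR is not the adjusted OR of the authors' multivariate analysis, so it will not reproduce the 7.491 reported for emphysema. An OR below 1, as reported for PTV/PGTV and KPS, indicates lower odds of the outcome as the factor increases.

```python
import math


def crude_odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Crude odds ratio with a Woolf (log-scale) 95% confidence interval.

    2x2 layout:            outcome   no outcome
        factor present        a          b
        factor absent         c          d
    All four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - 1.96 * se_log)
    upper = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lower, upper


# Hypothetical split of the 118 patients by emphysema status (row counts are invented).
or_, lo, hi = crude_odds_ratio(a=20, b=4, c=51, d=43)
print(f"incidence = {71 / 118:.2%}, crude OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```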
Affiliation(s)
- Na Li
  - Department of Oncology Center, Second Hospital of Anhui Medical University, Hefei, Anhui 230601, China
- Peng Luo
  - The First Department of Oncology, Cancer Hospital, Chinese Academy of Sciences, Hefei, Anhui 230031, China
- Chunyang Li
  - Radiotherapy Center, Second Hospital of Anhui Medical University, Hefei, Anhui 230601, China
- Yanyan Hong
  - Department of Oncology Center, Second Hospital of Anhui Medical University, Hefei, Anhui 230601, China
- Mingjun Zhang
  - Department of Oncology Center, Second Hospital of Anhui Medical University, Hefei, Anhui 230601, China
- Zhendong Chen
  - Department of Oncology Center, Second Hospital of Anhui Medical University, Hefei, Anhui 230601, China
16
Zulfiqar H, Khan RS, Hassan F, Hippe K, Hunt C, Ding H, Song XM, Cao R. Computational identification of N4-methylcytosine sites in the mouse genome with machine-learning method. Math Biosci Eng 2021; 18:3348-3363. [PMID: 34198389 DOI: 10.3934/mbe.2021167] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 04/24/2023]
Abstract
N4-methylcytosine (4mC) is a DNA modification that can regulate multiple biological processes. Correctly identifying 4mC sites in genomic sequences can provide precise knowledge about their genetic roles. This study aimed to develop an ensemble model to predict 4mC sites in the mouse genome. In the proposed model, DNA sequences were encoded by k-mer composition, enhanced nucleic acid composition, and the composition of k-spaced nucleic acid pairs. These features were then optimized using minimum redundancy maximum relevance (mRMR) with incremental feature selection (IFS) and five-fold cross-validation. The resulting optimal features were fed into a random forest classifier to discriminate 4mC from non-4mC sites in mouse. On the independent dataset, our model yielded an overall accuracy of 85.41%, approximately 3.8% and 6.3% higher than the two existing models, i4mC-Mouse and 4mCpred-EL, respectively. The data and source code of the model can be freely downloaded from https://github.com/linDing-groups/model_4mc.
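The k-mer encoding step can be illustrated with a short sketch: each sequence window is turned into normalized counts of all 4^k nucleotide words, and vectors for several k are concatenated. The choice of k = 1, 2, 3, the helper names, and the example window are assumptions for illustration only; the released repository may encode sequences differently.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Normalized counts of all 4**k DNA k-mers, in lexicographic order (A < C < G < T).

    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def kmer_features(seq: str, ks: tuple[int, ...] = (1, 2, 3)) -> list[float]:
    """Concatenate k-mer frequency vectors for several k (4 + 16 + 64 = 84 values by default)."""
    features: list[float] = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


# Hypothetical 41-nt sequence window.
window = "ACGTTGCAAC" * 4 + "C"
print(len(kmer_features(window)))  # prints 84
```

A fixed-length vector like this can be concatenated with the other encodings and then ranked by mRMR before incremental feature selection picks the subset passed to the random forest.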
Affiliation(s)
- Hasan Zulfiqar
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Rida Sarwar Khan
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Farwa Hassan
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Kyle Hippe
  - Department of Computer Science, Pacific Lutheran University, Tacoma 98447, USA
- Cassandra Hunt
  - Department of Computer Science, Pacific Lutheran University, Tacoma 98447, USA
- Hui Ding
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiao-Ming Song
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - School of Life Sciences, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma 98447, USA
17
Lai Cjs, Zhou RR, Yu Y, Zeng W, Hu MH, Fan LD, Chen L, Qiu ZD, Song C, Zhang SH, Guo LP, Huang LQ. [Rapid identification of geographical origins and determination of polysaccharides contents in Ganoderma lucidum based on near infrared spectroscopy and chemometrics]. Zhongguo Zhong Yao Za Zhi 2018; 43:3243-3248. [PMID: 30200725 DOI: 10.19540/j.cnki.cjcmm.20180514.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/13/2018] [Indexed: 11/18/2022]
Abstract
Near infrared (NIR) spectroscopy combined with chemometric methods was used to distinguish Ganoderma lucidum samples collected from different geographical origins, and a prediction model was established to rapidly determine the polysaccharide contents of these samples. The classification accuracy was 96.87% for the training dataset and 93.33% for the independent dataset. For the prediction model, 5-fold cross-validation was used to optimize the parameters, and different signal processing methods were also compared to improve the model's predictive ability. The best squared correlation coefficients were 0.9654 for the training dataset and 0.8516 for the validation dataset, while the root-mean-square deviation values for the training and validation datasets were 0.0185 and 0.0236, respectively. These results show that combining NIR spectroscopy with suitable chemometric approaches can accurately distinguish G. lucidum samples of different origins and that the established model can precisely predict polysaccharide contents; the proposed method can support the determination of active compounds and the quality evaluation of G. lucidum.
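The two figures of merit reported for the prediction model can be computed as below. This sketch assumes "squared correlation coefficient" means the square of the Pearson correlation between reference and predicted contents; the abstract does not say whether the authors instead used the coefficient of determination (1 - SSE/SST), which differs when predictions are biased. The data are hypothetical.

```python
import math


def squared_pearson(y_true: list[float], y_pred: list[float]) -> float:
    """Square of the Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    sxx = sum((t - mean_t) ** 2 for t in y_true)
    syy = sum((p - mean_p) ** 2 for p in y_pred)
    return sxy ** 2 / (sxx * syy)


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square deviation between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical polysaccharide fractions: perfectly correlated but offset predictions.
reference = [0.10, 0.20, 0.30]
predicted = [0.12, 0.22, 0.32]
print(squared_pearson(reference, predicted), rmse(reference, predicted))
```

The example shows why both metrics are usually reported together: a constant offset leaves the squared correlation at essentially 1 while the RMS deviation (about 0.02 here) still exposes the bias.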
Affiliation(s)
- Lai Cjs
  - State Key Laboratory of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Rong-Rong Zhou
  - Institute of Chinese Materia Medica, Hunan Academy of Chinese Medicine, Changsha 410013, China
- Yi Yu
  - Infinitus (China) Company Ltd., Guangzhou 510663, China
- Wen Zeng
  - State Key Laboratory of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Ming-Hua Hu
  - Infinitus (China) Company Ltd., Guangzhou 510663, China
- Luo-di Fan
  - Infinitus (China) Company Ltd., Guangzhou 510663, China
- Lin Chen
  - Institute of Chinese Materia Medica, Hunan Academy of Chinese Medicine, Changsha 410013, China
- Zi-Dong Qiu
  - State Key Laboratory of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Chuan Song
  - State Key Laboratory of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Shui-Han Zhang
  - Institute of Chinese Materia Medica, Hunan Academy of Chinese Medicine, Changsha 410013, China
- Lan-Ping Guo
  - State Key Laboratory of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Lu-Qi Huang
  - State Key Laboratory of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
18
Yu YY, Liu YG, Jiang Y, Li LM. [Prediction of drug-target interaction based on fingerprint similarity]. Zhongguo Zhong Yao Za Zhi 2017; 42:3578-3583. [PMID: 29218945 DOI: 10.19540/j.cnki.cjcmm.20170731.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2017] [Indexed: 06/07/2023]
Abstract
Drugs exert their pharmacological effects by binding to target proteins, so identifying drug-target interactions is important for discovering new functions of drugs. In this paper, target fingerprints based on molecular substructures and fingerprint-based drug-target similarity are proposed as inputs to a random forest classification method for identifying drug-target interactions. Experiments on enzymes, ion channels, G protein-coupled receptors and nuclear receptors demonstrated the effectiveness of the proposed method. In addition, the method is applied to predict interactions between the ingredients of traditional Chinese medicines and their targets.
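Fingerprint similarity is typically measured with the Tanimoto coefficient, so the sketch below uses it; the abstract does not name the measure, so this is an assumption. Fingerprints are represented as sets of on-bit indices, and the pair feature `max_ligand_similarity` (the highest similarity between a query drug and ligands already known to bind the target) is a hypothetical construction, not necessarily the authors' definition of drug-target similarity.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_ligand_similarity(drug_fp: set[int], ligand_fps: list[set[int]]) -> float:
    """Hypothetical pair feature: highest Tanimoto similarity between a query drug and
    any ligand already known to bind the target (0.0 if the target has none)."""
    return max((tanimoto(drug_fp, fp) for fp in ligand_fps), default=0.0)


# Hypothetical substructure fingerprints (indices of set bits).
query = {1, 4, 7, 9}
known_ligands = [{1, 4, 8}, {2, 3, 5}]
print(max_ligand_similarity(query, known_ligands))  # prints 0.4
```

In a setup like the one described, values of this kind would be concatenated with the target fingerprint to form one feature vector per drug-target pair, which a random forest then labels as interacting or not.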
Affiliation(s)
- Ya-Yun Yu
  - Knowledge and Data Engineering Laboratory of Chinese Medicine, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yong-Guo Liu
  - Knowledge and Data Engineering Laboratory of Chinese Medicine, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yu Jiang
  - Knowledge and Data Engineering Laboratory of Chinese Medicine, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
- Li-Min Li
  - Sichuan Academy of Chinese Medicine Sciences, Chengdu 610041, China