1
Ćeran M, Đorđević V, Miladinović J, Vasiljević M, Đukić V, Ranđelović P, Jaćimović S. Selective Genotyping and Phenotyping for Optimization of Genomic Prediction Models for Populations with Different Diversity. Plants (Basel) 2024; 13:975. [PMID: 38611503 PMCID: PMC11013471 DOI: 10.3390/plants13070975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 03/22/2024] [Accepted: 03/24/2024] [Indexed: 04/14/2024]
Abstract
To overcome the different challenges to food security caused by a growing population and climate change, soybean (Glycine max (L.) Merr.) breeders are creating novel cultivars that have the potential to improve productivity while maintaining environmental sustainability. Genomic selection (GS) is an advanced approach that may accelerate the rate of genetic gain in breeding using genome-wide molecular markers. The accuracy of genomic selection can be affected by trait architecture and heritability, marker density, linkage disequilibrium, statistical models, and training set. The selection of a minimal and optimal marker set with high prediction accuracy can lower genotyping costs, computational time, and multicollinearity. Selective phenotyping could reduce the number of genotypes tested in the field while preserving the genetic diversity of the initial population. This study aimed to evaluate different methods of selective genotyping and phenotyping on the accuracy of genomic prediction for soybean yield. The evaluation was performed on three populations: recombinant inbred lines, multifamily diverse lines, and germplasm collection. Strategies adopted for marker selection were as follows: SNP (single nucleotide polymorphism) pruning, estimation of marker effects, randomly selected markers, and genome-wide association study. Reduction of the number of genotypes was performed by selecting a core set from the initial population based on marker data, yet maintaining the original population's genetic diversity. Prediction ability using all markers and genotypes was different among examined populations. The subsets obtained by the model-based strategy can be considered the most suitable for marker selection for all populations. 
In all cases, selective phenotyping based on markers gave higher prediction ability than the minimum values obtained across multiple cycles of random selection, with the highest values obtained using the AN approach and 75% of the population size. The obtained results indicate that selective genotyping and phenotyping hold great potential and can be integrated as tools for improving or retaining selection accuracy while reducing genotyping or phenotyping costs for genomic selection.
Affiliation(s)
- Marina Ćeran
  - Laboratory for Biotechnology, Institute of Field and Vegetable Crops, National Institute of the Republic of Serbia, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Vuk Đorđević
  - Legumes Department, Institute of Field and Vegetable Crops, National Institute of the Republic of Serbia, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Jegor Miladinović
  - Legumes Department, Institute of Field and Vegetable Crops, National Institute of the Republic of Serbia, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Marjana Vasiljević
  - Legumes Department, Institute of Field and Vegetable Crops, National Institute of the Republic of Serbia, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Vojin Đukić
  - Legumes Department, Institute of Field and Vegetable Crops, National Institute of the Republic of Serbia, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Predrag Ranđelović
  - Legumes Department, Institute of Field and Vegetable Crops, National Institute of the Republic of Serbia, Maksima Gorkog 30, 21000 Novi Sad, Serbia
- Simona Jaćimović
  - Legumes Department, Institute of Field and Vegetable Crops, National Institute of the Republic of Serbia, Maksima Gorkog 30, 21000 Novi Sad, Serbia
2
Peng R, Yin X, Liu Y, He M, Wu HL, Xie HN. Development and validation of a predictive model for fetal cerebral maturation using ultrasound for fetuses with normal growth and fetal growth restriction. Quant Imaging Med Surg 2023; 13:8435-8446. [PMID: 38106296 PMCID: PMC10722076 DOI: 10.21037/qims-23-786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 10/11/2023] [Indexed: 12/19/2023]
Abstract
Background: Investigation of fetal cerebral maturation (FCM) is necessary and important to provide crucial prognostic information for normal and high-risk fetuses. The study aimed to develop a valid and quantitative predictive model for assessing FCM using ultrasound and validate the model for fetuses with normal and restricted growth. Methods: This was a multicenter prospective observational study. Fetuses with normal growth recruited from a university teaching hospital (Center 1) and a municipal maternal unit (Center 2) were included in the training set and external validation set 1, respectively. The 124 growth-restricted fetuses enrolled in Center 1 were included in validation set 2. FCM was used to describe the gestational age (GA) in this study. The model was developed based on the sum of fetal cranial parameters (total fetal cranial parameters), including head circumference (HC) and depths of the insula (INS) and sylvian fissure (SF), parieto-occipital fissure (POF), and calcarine fissure (CF). A regression model, constructed based on total fetal cranial parameters and predicted GA, was established using the training set and validated using external validation set 1 and validation set 2. Results: The intra- and interobserver intraclass correlation coefficients for HC, and depths of the INS and SF, POF, and CF were >0.90. An exponential regression equation was used to predict FCM: predicted GA of FCM (weeks) = 11.16 × exp(0.003 × total fetal cranial parameters) (P < 0.001; adjusted R² = 0.973), with a standard error of estimate of 0.67 weeks. The standard error of the predicted GA of FCM from the model was ±4.7 days. In validation set 1, the mean standard error of the developed prediction model for FCM was 0.97 weeks. The predictive model showed that FCM was significantly delayed in validation set 2 (2.10 ± 1.31 weeks, P < 0.001), considering the GA per the last menstrual period.
Conclusions: The predictive performance of the FCM model developed in this study was excellent, and the novel model may be a valuable investigative tool during clinical implementation.
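The exponential equation reported in this abstract can be evaluated directly. The following is a minimal sketch, not the authors' code: the function name and the example input of 300 are hypothetical, and the abstract does not state the measurement units of the summed cranial parameters, so the units are an assumption.

```python
import math

# Coefficients as reported in the abstract:
# predicted GA (weeks) = 11.16 * exp(0.003 * total fetal cranial parameters)
INTERCEPT_WEEKS = 11.16
RATE = 0.003


def predicted_ga_weeks(total_cranial_parameters: float) -> float:
    """Return the predicted gestational age of FCM in weeks.

    `total_cranial_parameters` is the sum of HC and the INS/SF, POF and CF
    depths. Its units are an assumption; the abstract does not give them.
    """
    return INTERCEPT_WEEKS * math.exp(RATE * total_cranial_parameters)


if __name__ == "__main__":
    # Hypothetical input, chosen only for illustration.
    print(round(predicted_ga_weeks(300.0), 2))
```

Because the model is exponential, a fixed increase in the summed parameters multiplies the predicted GA by a constant factor rather than adding a constant number of weeks.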
Affiliation(s)
- Ruan Peng
  - Department of Ultrasonic Medicine, Fetal Medical Centre, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Xia Yin
  - Department of Ultrasonic Medicine, Fetal Medical Centre, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Yan Liu
  - Department of Ultrasound, Dalian Municipal Women and Children’s Medical Center, Dalian, China
- Miao He
  - Department of Ultrasonic Medicine, Fetal Medical Centre, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Hong-Li Wu
  - Department of Ultrasonic Medicine, Fetal Medical Centre, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Hong-Ning Xie
  - Department of Ultrasonic Medicine, Fetal Medical Centre, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
3
Al-Saleem MSM, Darwish HW, Naguib IA, Draz ME. Comparative Study of Augmented Classical Least Squares Models for UV Assay of Co-Formulated Antiemetics Together with Related Impurities. Molecules 2023; 28:7044. [PMID: 37894524 PMCID: PMC10609573 DOI: 10.3390/molecules28207044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 10/05/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023]
Abstract
The classical least squares (CLS) model and three augmented CLS models are adopted and validated for the analysis of pyridoxine HCl (PYR), cyclizine HCl (CYC), and meclizine HCl (MEC) in a quinary mixture with two related impurities: the CYC main impurity, Benzhydrol (BEH), which has carcinogenic and hepatotoxic effects, and the MEC official impurity, 4-Chlorobenzophenone (BEP). The proposed augmented CLS models are orthogonal signal correction CLS (OSC-CLS), direct orthogonal signal correction CLS (DOSC-CLS), and net analyte processing CLS (NAP-CLS). These models were applied to quantify the three active constituents in their raw materials and their corresponding dosage forms using their UV spectra. To evaluate the CLS-based models sensibly, we design a comparative study involving two sets: the training set to construct models and the validation set to assess the prediction abilities of these models. A five-level, five-factor calibration design was established to produce 25 mixtures for the calibration set. In addition, 16 experiments were performed for a test set distributed equally between the in-space and out-space samples. The primary criterion for comparing the models' performance was the validation set's root mean square error of prediction (RMSEP) value. Finally, augmented CLS models showed acceptable results for assaying the three analytes. The results were compared statistically with the reported HPLC methods; however, the DOSC-CLS model proved the best for assaying the dosage forms.
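The abstract names the root mean square error of prediction (RMSEP) on the validation set as the main comparison criterion. As a minimal sketch of the standard definition (the square root of the mean squared difference between predicted and reference values), assuming plain Python sequences and a hypothetical function name, it could be computed like this. This is not the authors' implementation, and the sample values are invented for illustration.

```python
import math
from typing import Sequence


def rmsep(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of prediction over paired samples."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if len(reference) == 0:
        raise ValueError("at least one sample is required")
    # Mean of squared residuals, then square root.
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Invented concentrations, for illustration only.
    true_conc = [10.0, 12.0, 14.0, 16.0]
    pred_conc = [10.2, 11.7, 14.1, 16.4]
    print(rmsep(true_conc, pred_conc))
```

A lower RMSEP on the validation set indicates better predictive ability, which is how the abstract ranks the CLS variants.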
Affiliation(s)
- Muneera S. M. Al-Saleem
  - Department of Chemistry, Science College, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Hany W. Darwish
  - Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh 11451, Saudi Arabia
- Ibrahim A. Naguib
  - Department of Pharmaceutical Chemistry, College of Pharmacy, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Mohammed E. Draz
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Delta University for Science and Technology, Gamasa 35712, Egypt
4
Li Q, Chen Y, Pang Y, Kou L, Lu D, Ke W. An AAM-Based Identification Method for Ear Acupoint Area. Biomimetics (Basel) 2023; 8:307. [PMID: 37504195 PMCID: PMC10807013 DOI: 10.3390/biomimetics8030307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 06/28/2023] [Accepted: 07/04/2023] [Indexed: 07/29/2023]
Abstract
Ear image segmentation and identification serves the "observation" step of TCM (traditional Chinese medicine), because disease diagnosis and treatment are achieved by massaging or pressing on the corresponding ear acupoints. Image processing for ear positioning and regional segmentation improves intelligent TCM diagnosis and treatment based on ear acupoints. To popularize ear acupoint therapy, image processing technology has been adopted to detect the ear acupoint areas and help to gradually replace well-trained, experienced doctors. Because the ear is small and its acupoints are numerous, it is difficult to locate these acupoints with traditional image recognition methods. An AAM (active appearance model)-based method for ear acupoint segmentation was proposed. The segmentation was represented by 91 feature points on a human ear image. In this process, the ear acupoint regions, including the helix, antihelix, cymba conchae, cavum conchae, fossae helicis, fossae triangularis auriculae, tragus, antitragus, and earlobe, were divided precisely. In addition, specially designated acupoints or acupoint areas could be highlighted in ear images. This method made it possible to partition and recognize the ear's acupoints through computer image processing, and it may match the observation abilities of experienced doctors. The method proved effective and accurate in experiments and can be used for the intelligent diagnosis of diseases.
Affiliation(s)
- Qingfeng Li
  - Health Management System Engineering Center, School of Public Health, Hangzhou Normal University, Hangzhou 311121, China
- Yuhan Chen
  - Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Yijie Pang
  - Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Lei Kou
  - Institute of Oceanographic Instrumentation, Qilu University of Technology (Shandong Academy of Sciences), Qingdao 266075, China
- Dongxin Lu
  - Health Management System Engineering Center, School of Public Health, Hangzhou Normal University, Hangzhou 311121, China
- Wende Ke
  - Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen 518055, China
5
Terraillon J, Roeber FK, Flachenecker C, Frisch M. Training set designs for prediction of yield and moisture of maize test cross hybrids with unreplicated trials. Front Plant Sci 2023; 14:1080087. [PMID: 36950349 PMCID: PMC10025381 DOI: 10.3389/fpls.2023.1080087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 02/03/2023] [Indexed: 06/18/2023]
Abstract
Unreplicated field trials and genomic prediction are both used to enhance efficiency in the early selection stages of a hybrid maize breeding program. No results are available on the optimal experimental design when combining both approaches. Our objective was to investigate the effect of the training set design on the accuracy of genomic prediction in unreplicated maize test crosses. We carried out a cross-validation study on the basis of an experimental data set consisting of 1436 hybrids evaluated for yield and moisture, for which genotyping information from 461 SNP markers was available. Training set designs of different sizes, implementing within-environment prediction, within-year prediction, across-year prediction, and combinations of data sources across years and environments, were compared with respect to their prediction accuracy. Across-year prediction did not reach prediction accuracies that are useful for genomic selection. Within-year prediction across environments provided useful correlations between observed and predicted breeding values. The prediction accuracies did not improve when data from previous years were added to the training set. We conclude that using all data available from unreplicated tests of the current breeding cycle provides good accuracy for predicting test crosses, whereas adding data from previous breeding cycles, in which the genotypes are less related to the tested material, has only limited value for increasing the prediction accuracy.
Affiliation(s)
- Jérôme Terraillon
  - Institute of Agronomy and Plant Breeding II, Justus Liebig University, Giessen, Germany
- Matthias Frisch
  - Institute of Agronomy and Plant Breeding II, Justus Liebig University, Giessen, Germany
6
Ye Y, Zhang J, Song P, Qin P, Hu Y, An P, Li X, Lin Y, Wang J, Feng G. Clinical Features and Computed Tomography Radiomics-Based Model for Predicting Pancreatic Ductal Adenocarcinoma and Focal Mass-Forming Pancreatitis. Technol Cancer Res Treat 2023; 22:15330338231180792. [PMID: 37287274 DOI: 10.1177/15330338231180792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Objective: To establish a predictive model distinguishing focal mass-forming pancreatitis (FMFP) from pancreatic ductal adenocarcinoma (PDAC) based on computed tomography (CT) radiomics and clinical data. Methods: A total of 78 FMFP patients (FMFP group) and 120 PDAC patients (PDAC group) who were admitted to Xiangyang No.1 People's Hospital and Xiangyang Central Hospital from February 2012 to May 2021 and were pathologically diagnosed were included in this study and divided into a training set and a test set at a ratio of 7:3. The 3D Slicer software was used to extract the radiomic features and radiomic scores (Radscores) of the 2 groups, and the clinical data (age, gender, etc), CT imaging features (lesion location, size, enhancement degree, vascular wrapping, etc) and CT radiomic features of the 2 groups were compared. Logistic regression was used to screen the independent risk factors of the 2 groups, and multiple prediction models (clinical imaging model, radiomics model, and combined model) were established. Then receiver operating characteristic (ROC) analysis and decision curve analysis (DCA) were conducted to compare the prediction performance and net benefit of the models. Results: The multivariate logistic regression results indicated that dilation of the main pancreatic duct, vascular wrapping, Radscore1 and Radscore2 were independent influencing factors for distinguishing FMFP from PDAC. In the training set, the combined model showed the best predictive performance (area under the ROC curve [AUC] 0.857, 95% CI [0.787-0.910]), significantly higher than the clinical imaging model (AUC 0.650, 95% CI [0.565-0.729]) and the radiomics model (AUC 0.812, 95% CI [0.759-0.890]). DCA confirmed that the combined model had the highest net benefit. These results were further validated by the test set.
Conclusion: The combined model based on clinical-CT radiomics data can effectively identify FMFP and PDAC, providing a reference for clinical decision-making.
Affiliation(s)
- Yingjian Ye
  - Department of Radiology, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
  - Department of Infectious Disease and Gastroenterology, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, China
  - Department of Pharmacy and Laboratory, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
- Junyan Zhang
  - Department of Radiology, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
  - Department of Radiology, Hubei Clinical Research Center of Parkinson's Disease, Xiangyang Key Laboratory of Movement Disorders, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, Hubei Province, P.R. China
- Ping Song
  - Department of Radiology, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
  - Department of Pharmacy and Laboratory, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
- Ping Qin
  - Department of Infectious Disease and Gastroenterology, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, China
  - Department of Radiology, Hubei Clinical Research Center of Parkinson's Disease, Xiangyang Key Laboratory of Movement Disorders, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, Hubei Province, P.R. China
- Yan Hu
  - Department of Infectious Disease and Gastroenterology, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, China
  - Department of Radiology, Hubei Clinical Research Center of Parkinson's Disease, Xiangyang Key Laboratory of Movement Disorders, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, Hubei Province, P.R. China
- Peng An
  - Department of Radiology, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
  - Department of Oncology, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
- Xiumei Li
  - Department of Radiology, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
  - Department of Internal Medicine, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
- Yong Lin
  - Department of Infectious Disease and Gastroenterology, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, China
  - Department of Internal Medicine, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
- Jinsong Wang
  - Department of Infectious Disease and Gastroenterology, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, China
- Guoyan Feng
  - Department of Pharmacy and Laboratory, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
  - Department of Internal Medicine, Xiangyang No. 1 People's Hospital, Hubei University of Medicine, Xiangyang, China
7
Yu H, Luo S, Ji J, Wang Z, Zhi W, Mo N, Zhong P, He C, Wan T, Jin Y. A Deep-Learning-Based Artificial Intelligence System for the Pathology Diagnosis of Uterine Smooth Muscle Tumor. Life (Basel) 2022; 13. [PMID: 36675952 DOI: 10.3390/life13010003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 12/09/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]
Abstract
We aimed to develop an artificial intelligence (AI) diagnosis system for uterine smooth muscle tumors (UMTs) by using deep learning. We analyzed the morphological features of UMTs on whole-slide images (233, 108, and 30 digital slides of leiomyosarcomas, leiomyomas, and smooth muscle tumors of uncertain malignant potential stained with hematoxylin and eosin, respectively). Aperio ImageScope software randomly selected ≥10 areas of the total field of view. Pathologists randomly selected a marked region in each section that was no smaller than the total area of 10 high-power fields in which necrotic, vascular, collagenous, and mitotic areas were labeled. We constructed an automatic identification algorithm for cytological atypia and necrosis by using ResNet and constructed an automatic detection algorithm for mitosis by using YOLOv5. A logical evaluation algorithm was then designed to obtain an automatic UMT diagnostic aid that can "study and synthesize" a pathologist's experience. The precision, recall, and F1 index reached more than 0.920. The detection network could accurately detect the mitoses (0.913 precision, 0.893 recall). For the prediction ability, the AI system had a precision of 0.90. An AI-assisted system for diagnosing UMTs in routine practice scenarios is feasible and can improve the accuracy and efficiency of diagnosis.
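The abstract reports precision and recall for mitosis detection (0.913 and 0.893) but not the corresponding F1 value. As a minimal sketch, assuming the usual definition of F1 as the harmonic mean of precision and recall, the value implied by those two figures could be derived as follows. The function name is hypothetical and the result is a derived quantity, not a number reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall for mitosis detection, as quoted in the abstract.
    print(round(f1_score(0.913, 0.893), 3))
```

The harmonic mean is pulled toward the lower of the two inputs, so F1 always lies between precision and recall.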
8
Gaur R, Prakash S, Kumar S, Abhishek K, Msahli M, Wahid A. A Machine-Learning-Blockchain-Based Authentication Using Smart Contracts for an IoHT System. Sensors (Basel) 2022; 22:9074. [PMID: 36501776 PMCID: PMC9741337 DOI: 10.3390/s22239074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 11/19/2022] [Accepted: 11/20/2022] [Indexed: 06/17/2023]
Abstract
Nowadays, finding genetic components and determining the likelihood that treatment would be helpful for patients are the key issues in the medical field. Medical data storage in a centralized system is complex. Data storage, on the other hand, has recently been distributed electronically in a cloud-based system, allowing access to the data at any time through a cloud server or blockchain-based ledger system. The blockchain is essential to managing safe and decentralized transactions in cryptography systems such as bitcoin and Ethereum. The blockchain stores information in different blocks, each of which has a set capacity. Data processing and storage are more effective and better for data management when blockchain and machine learning are integrated. Therefore, we have proposed a machine-learning-blockchain-based smart-contract system that improves security, reduces consumption, and can be trusted for real-time medical applications. The accuracy and computation performance of the IoHT system are safely improved by our system.
Affiliation(s)
- Rajkumar Gaur
  - ITCA, Madan Mohan Malaviya University of Technology Gorakhpur, Gorakhpur 273016, India
- Shiva Prakash
  - ITCA, Madan Mohan Malaviya University of Technology Gorakhpur, Gorakhpur 273016, India
- Sanjay Kumar
  - ITD, Rajkiya Engineering College Azamgarh, Deogaon 276201, India
- Kumar Abhishek
  - CSED, National Institute of Technology Patna, Patna 800005, India
- Mounira Msahli
  - Telecom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France
- Abdul Wahid
  - Telecom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France
9
Parziale A, Capriolo G, Marcelli A. One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document. J Imaging 2020; 6:109. [PMID: 34460550 DOI: 10.3390/jimaging6100109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2020] [Revised: 09/29/2020] [Accepted: 10/06/2020] [Indexed: 11/17/2022]
Abstract
Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not comparable in terms of accuracy. The performance of automatic transcription systems is strictly related to the composition of the training set. We propose a multi-step procedure that exploits a Keyword Spotting system and human validation for building up a training set in a time shorter than the one required by a fully manual procedure. The multi-step procedure was tested on a data set made up of 50 pages extracted from the Bentham collection. The palaeographer that transcribed the data set with the multi-step procedure instead of the fully manual procedure had a time gain of 52.54%. Moreover, a small size training set that allowed the keyword spotting system to show a precision value greater than the recall value was built with the multi-step procedure in a time equal to 35.25% of the time required for annotating the whole data set.
10
Liu Y, Liu B, Jin G, Zhang J, Wang X, Feng Y, Bian Z, Fei B, Yin Y, Huang Z. An Integrated Three-Long Non-coding RNA Signature Predicts Prognosis in Colorectal Cancer Patients. Front Oncol 2019; 9:1269. [PMID: 31824849 PMCID: PMC6883412 DOI: 10.3389/fonc.2019.01269] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/28/2019] [Accepted: 11/04/2019] [Indexed: 01/25/2023]
Abstract
Colorectal cancer (CRC) is one of the most common cancers worldwide, and its morbidity and mortality have gradually increased. Here, we aimed to identify and assess prognostic long non-coding RNAs (lncRNAs) associated with overall survival (OS) in CRC. Firstly, RNA expression profiles were obtained from The Cancer Genome Atlas (TCGA) database, and 439 CRC patients were enrolled as a training set. Univariate Cox analysis and least absolute shrinkage and selection operator (LASSO) analysis were performed to identify the prognostic lncRNAs. Multivariable Cox regression analysis was used to establish a prognostic risk formula including three lncRNAs (AP003555.2, AP006284.1, and LINC01602). The low-risk group had a better OS than the high-risk group (P < 0.0001), and the areas under the receiver operating characteristic curve (AUCs) of 3- and 5-year OS were 0.712 and 0.674, respectively. Then, we evaluated the signature in a clinical validation set collected from the Affiliated Hospital of Jiangnan University. Compared with the low-risk group, patients' OS was found to be significantly worse in the high-risk group (P = 0.0057). The AUCs of 3- and 5-year OS were 0.701 and 0.694, respectively. Finally, we constructed an lncRNA–microRNA (miRNA)–messenger RNA (mRNA) competing endogenous RNA (ceRNA) network to explore the potential function of the three differentially expressed lncRNAs (DElncRNAs). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis indicated that these DElncRNAs were involved in several cancer-related pathways. In summary, our data provide evidence that the three-lncRNA signature could serve as an independent biomarker to predict prognosis in CRC. This study also suggests that these three lncRNAs potentially participate in the progression of CRC.
Collapse
Affiliation(s)
- Yuhang Liu
  - Wuxi Cancer Institute, Affiliated Hospital of Jiangnan University, Wuxi, China
  - Laboratory of Cancer Epigenetics, Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Bingxin Liu
  - Wuxi Cancer Institute, Affiliated Hospital of Jiangnan University, Wuxi, China
  - Laboratory of Cancer Epigenetics, Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Guoying Jin
  - Wuxi Cancer Institute, Affiliated Hospital of Jiangnan University, Wuxi, China
  - Laboratory of Cancer Epigenetics, Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Jia Zhang
  - Wuxi Cancer Institute, Affiliated Hospital of Jiangnan University, Wuxi, China
  - Laboratory of Cancer Epigenetics, Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Xue Wang
  - Laboratory of Cancer Epigenetics, Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Yuyang Feng
  - Laboratory of Cancer Epigenetics, Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Zehua Bian
  - Wuxi Cancer Institute, Affiliated Hospital of Jiangnan University, Wuxi, China
  - Laboratory of Cancer Epigenetics, Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Bojian Fei
  - Department of Surgical Oncology, Affiliated Hospital of Jiangnan University, Wuxi, China
- Yuan Yin
  - Wuxi Cancer Institute, Affiliated Hospital of Jiangnan University, Wuxi, China
  - Laboratory of Cancer Epigenetics, Wuxi School of Medicine, Jiangnan University, Wuxi, China
- Zhaohui Huang
  - Wuxi Cancer Institute, Affiliated Hospital of Jiangnan University, Wuxi, China
  - Laboratory of Cancer Epigenetics, Wuxi School of Medicine, Jiangnan University, Wuxi, China
11
Pogodin PV, Lagunin AA, Filimonov DA, Nicklaus MC, Poroikov VV. Improving (Q)SAR predictions by examining bias in the selection of compounds for experimental testing. SAR QSAR Environ Res 2019; 30:759-773. [PMID: 31547686 DOI: 10.1080/1062936x.2019.1665580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2019] [Accepted: 09/05/2019] [Indexed: 06/10/2023]
Abstract
Existing data on structures and biological activities are limited and distributed unevenly across distinct molecular targets and chemical compounds. The question arises whether these data represent an unbiased sample of the general population of chemical-biological interactions. To answer this question, we analyzed ChEMBL data for 87,583 molecules tested against 919 protein targets using supervised and unsupervised approaches. Hierarchical clustering of the Murcko frameworks generated using Chemistry Development Toolkit showed that the available data form a big diffuse cloud without apparent structure. In contrast, PASS-based classifiers made it possible to predict whether a compound had been tested against a particular molecular target, regardless of whether it was active or not. Thus, one may conclude that the selection of chemical compounds for testing against specific targets is biased, probably due to the influence of prior knowledge. We assessed the possibility of improving (Q)SAR predictions using this fact: PASS prediction of the interaction with a particular target has significantly higher accuracy for compounds predicted as tested against the target than for those predicted as untested (average ROC AUC values are about 0.87 and 0.75, respectively). Thus, considering the existing bias in the training set data may increase the performance of virtual screening.
Affiliation(s)
- P V Pogodin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- A A Lagunin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
  - Department of Bioinformatics, Medical-Biological Department, Pirogov Russian National Research Medical University, Moscow, Russia
- D A Filimonov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- M C Nicklaus
  - Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, NCI-Frederick, Frederick, MD, USA
- V V Poroikov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
12
Kono TJY, Lei L, Shih CH, Hoffman PJ, Morrell PL, Fay JC. Comparative Genomics Approaches Accurately Predict Deleterious Variants in Plants. G3 (Bethesda) 2018; 8:3321-3329. [PMID: 30139765 PMCID: PMC6169392 DOI: 10.1534/g3.118.200563] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 07/08/2018] [Accepted: 08/10/2018] [Indexed: 12/11/2022]
Abstract
Recent advances in genome resequencing have led to increased interest in prediction of the functional consequences of genetic variants. Variants at phylogenetically conserved sites are of particular interest, because they are more likely than variants at phylogenetically variable sites to have deleterious effects on fitness and contribute to phenotypic variation. Numerous comparative genomic approaches have been developed to predict deleterious variants, but the approaches are nearly always assessed based on their ability to identify known disease-causing mutations in humans. Determining the accuracy of deleterious variant predictions in nonhuman species is important to understanding evolution, domestication, and potentially to improving crop quality and yield. To examine our ability to predict deleterious variants in plants we generated a curated database of 2,910 Arabidopsis thaliana mutants with known phenotypes. We evaluated seven approaches and found that while all performed well, their relative ranking differed from prior benchmarks in humans. We conclude that deleterious mutations can be reliably predicted in A. thaliana and likely other plant species, but that the relative performance of various approaches does not necessarily translate from one species to another.
Affiliation(s)
- Thomas J Y Kono
  - Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 551085
- Li Lei
  - Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 551085
- Ching-Hua Shih
  - Department of Genetics, Washington University, St. Louis, MO 63110
- Paul J Hoffman
  - Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 551085
- Peter L Morrell
  - Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 551085
- Justin C Fay
  - Department of Genetics, Washington University, St. Louis, MO 63110
13
Zhang Y, Liao Y, Wu X, Chen L, Xiong Q, Gao Z, Zheng X, Li G, Hou W. Non-Uniform Sample Assignment in Training Set Improving Recognition of Hand Gestures Dominated with Similar Muscle Activities. Front Neurorobot 2018; 12:3. [PMID: 29483866 PMCID: PMC5816264 DOI: 10.3389/fnbot.2018.00003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/20/2017] [Accepted: 01/18/2018] [Indexed: 11/22/2022]
Abstract
So far, little is known about how the sample assignment of surface electromyogram (sEMG) features in the training set influences the recognition efficiency of hand gestures. The aim of this study is to explore the impact of different sample arrangements in the training set on the classification of hand gestures dominated by similar muscle activation patterns. Seven right-handed healthy subjects (24.2 ± 1.2 years) were recruited to perform similar grasping tasks (fist, spherical, and cylindrical grasping) and similar pinch tasks (finger, key, and tape pinch). Each task was sustained for 4 s and followed by a 5-s rest interval to avoid fatigue, and the procedure was repeated 60 times for every task. sEMG signals were recorded from six forearm hand muscles during grasping or pinch tasks, and the 4-s sEMG from each channel was segmented for empirical mode decomposition analysis trial by trial. The muscle activity was quantified with the zero crossing (ZC) and Wilson amplitude (WAMP) of the first four resulting intrinsic mode functions. Thereafter, an sEMG feature vector was constructed with the ZC and WAMP of each channel's sEMG, and a classifier combining a support vector machine and a genetic algorithm was used for hand gesture recognition. The number of samples for each hand gesture was rearranged according to different sample proportions in the training set, and the corresponding recognition rate was calculated to evaluate the effect of sample assignment changes on gesture classification. For both similar grasping and similar pinch tasks, changing the sample assignment in the training set affected the overall recognition rate of the candidate hand gestures. Compared with conventional results with uniformly assigned training samples, the recognition rate of similar pinch gestures was significantly improved when the samples of finger-, key-, and tape-pinch gestures were assigned as 60, 20, and 20%, respectively. Similarly, the recognition rate of similar grasping gestures also rose when the sample proportions of fist, spherical, and cylindrical grasping were 40, 30, and 30%, respectively. Our results suggest that the recognition rate of hand gestures can be regulated by changing the sample arrangement in the training set, which can potentially be used to improve fine-gesture recognition for myoelectric robotic hand exoskeleton control.
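The two features named in this abstract, zero crossing (ZC) and Wilson amplitude (WAMP), can be sketched in a few lines of Python. This uses common textbook definitions, which may differ in detail from the ones the authors used. The threshold value and the sample data are hypothetical, and the empirical mode decomposition step that would produce the input components is not shown.

```python
def zero_crossings(signal):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)


def wilson_amplitude(signal, threshold):
    """Count consecutive-sample differences whose magnitude exceeds threshold."""
    return sum(1 for a, b in zip(signal, signal[1:]) if abs(b - a) > threshold)


def feature_vector(components, threshold):
    """Concatenate [ZC, WAMP] for each component (e.g. each intrinsic mode function)."""
    features = []
    for component in components:
        features.append(zero_crossings(component))
        features.append(wilson_amplitude(component, threshold))
    return features


# Two made-up components standing in for intrinsic mode functions of one channel.
components = [
    [1.0, -1.0, 2.0, 0.5, -0.5],
    [0.5, 0.25, -0.25, -2.0],
]
features = feature_vector(components, threshold=1.5)
```

In practice the threshold is usually tuned to the noise level of the recording, and many implementations also apply a small amplitude threshold to ZC to suppress noise-induced crossings.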
Affiliation(s)
- Yao Zhang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, China
- Yanjian Liao
  - Chongqing Engineering Research Center of Medical Electronics Technology, Chongqing, China
- Xiaoying Wu
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, China
- Lin Chen
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, China
- Qiliang Xiong
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, China
- Zhixian Gao
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, China
- Xiaolin Zheng
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, China
  - Chongqing Engineering Research Center of Medical Electronics Technology, Chongqing, China
- Guanglin Li
  - Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Wensheng Hou
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, China
  - Chongqing Engineering Research Center of Medical Electronics Technology, Chongqing, China
14
Bustos-Korts D, Malosetti M, Chapman S, Biddulph B, van Eeuwijk F. Improvement of Predictive Ability by Uniform Coverage of the Target Genetic Space. G3 (Bethesda) 2016. [PMID: 27672112 DOI: 10.1534/g3.116.035410/-/dc1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 04/26/2023]
Abstract
Genome-enabled prediction provides breeders with the means to increase the number of genotypes that can be evaluated for selection. One of the major challenges in genome-enabled prediction is how to construct a training set of genotypes from a calibration set that represents the target population of genotypes, where the calibration set is composed of a training and validation set. A random sampling protocol of genotypes from the calibration set will lead to low-quality coverage of the total genetic space by the training set when the calibration set contains population structure. As a consequence, predictive ability will be affected negatively, because some parts of the genotypic diversity in the target population will be under-represented in the training set, whereas other parts will be over-represented. Therefore, we propose a training set construction method that uniformly samples the genetic space spanned by the target population of genotypes, thereby increasing predictive ability. To evaluate our method, we constructed training sets along with the identification of corresponding genomic prediction models for four genotype panels that differed in the amount of population structure they contained (maize Flint, maize Dent, wheat, and rice). Training sets were constructed using uniform sampling, stratified-uniform sampling, stratified sampling, and random sampling. We compared these methods with a method that maximizes the generalized coefficient of determination (CD). Several training set sizes were considered. We investigated four genomic prediction models: multi-locus QTL models, GBLUP models, combinations of QTL and GBLUPs, and Reproducing Kernel Hilbert Space (RKHS) models. For the maize and wheat panels, construction of the training set under uniform sampling led to a larger predictive ability than under stratified and random sampling. The results of our methods were similar to those of the CD method. For the rice panel, all training set construction methods led to similar predictive ability, a reflection of the very strong population structure in this panel.
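The idea of spreading a training set evenly over the genetic space can be illustrated with a generic greedy farthest-point (maximin) selection. This is a swapped-in stand-in for illustration, not the sampling procedure proposed in the paper, whose details the abstract does not give. The two-dimensional coordinates below are hypothetical, for example the first two principal components of a marker matrix.

```python
import math


def maximin_subset(points, k, start=0):
    """Greedy farthest-point selection of k point indices.

    Starting from `start`, repeatedly add the point whose distance to its
    nearest already-chosen point is largest.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        candidate = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(candidate)
        nearest = [
            min(d, math.dist(p, points[candidate]))
            for d, p in zip(nearest, points)
        ]
    return chosen


# Hypothetical genotype coordinates: a tight cluster plus two outlying genotypes.
genotypes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (5.0, 0.0)]
training_idx = maximin_subset(genotypes, k=3)
```

A random draw of three genotypes from this panel would most often take two or more from the tight cluster, whereas the greedy selection picks one cluster member and both outlying genotypes. That is the over- and under-representation problem the abstract describes.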
Affiliation(s)
- Daniela Bustos-Korts
  - C.T. de Wit Graduate School for Production Ecology and Resource Conservation (PE&RC), Wageningen, The Netherlands
  - Biometris, Wageningen University and Research, The Netherlands
- Scott Chapman
  - Commonwealth Scientific and Industrial Research Organisation (CSIRO) Agriculture, Queensland Bioscience Precinct, St. Lucia, Queensland 4067, Australia
- Ben Biddulph
  - Department of Agriculture and Food, Western Australia, South Perth, Western Australia 6151, Australia
15
Bustos-Korts D, Malosetti M, Chapman S, Biddulph B, van Eeuwijk F. Improvement of Predictive Ability by Uniform Coverage of the Target Genetic Space. G3 (Bethesda) 2016; 6:3733-47. [PMID: 27672112 DOI: 10.1534/g3.116.035410] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022]
Abstract
Genome-enabled prediction provides breeders with the means to increase the number of genotypes that can be evaluated for selection. One of the major challenges in genome-enabled prediction is how to construct a training set of genotypes from a calibration set that represents the target population of genotypes, where the calibration set is composed of a training and validation set. A random sampling protocol of genotypes from the calibration set will lead to low-quality coverage of the total genetic space by the training set when the calibration set contains population structure. As a consequence, predictive ability will be affected negatively, because some parts of the genotypic diversity in the target population will be under-represented in the training set, whereas other parts will be over-represented. Therefore, we propose a training set construction method that uniformly samples the genetic space spanned by the target population of genotypes, thereby increasing predictive ability. To evaluate our method, we constructed training sets along with the identification of corresponding genomic prediction models for four genotype panels that differed in the amount of population structure they contained (maize Flint, maize Dent, wheat, and rice). Training sets were constructed using uniform sampling, stratified-uniform sampling, stratified sampling, and random sampling. We compared these methods with a method that maximizes the generalized coefficient of determination (CD). Several training set sizes were considered. We investigated four genomic prediction models: multi-locus QTL models, GBLUP models, combinations of QTL and GBLUPs, and Reproducing Kernel Hilbert Space (RKHS) models. For the maize and wheat panels, construction of the training set under uniform sampling led to a larger predictive ability than under stratified and random sampling. The results of our methods were similar to those of the CD method. For the rice panel, all training set construction methods led to similar predictive ability, a reflection of the very strong population structure in this panel.
16
Tayeh N, Klein A, Le Paslier MC, Jacquin F, Houtin H, Rond C, Chabert-Martinello M, Magnin-Robert JB, Marget P, Aubert G, Burstin J. Genomic Prediction in Pea: Effect of Marker Density and Training Population Size and Composition on Prediction Accuracy. Front Plant Sci 2015; 6:941. [PMID: 26635819 PMCID: PMC4648083 DOI: 10.3389/fpls.2015.00941] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 07/20/2015] [Accepted: 10/16/2015] [Indexed: 05/18/2023]
Abstract
Pea is an important food and feed crop and a valuable component of low-input farming systems. Improving resistance to biotic and abiotic stresses is a major breeding target to enhance yield potential and regularity. Genomic selection (GS) has lately emerged as a promising technique to increase the accuracy and gain of marker-based selection. It uses genome-wide molecular marker data to predict the breeding values of candidate lines for selection. A collection of 339 genetic resource accessions (CRB339) was subjected to high-density genotyping using the GenoPea 13.2K SNP Array. Genomic prediction accuracy was evaluated for thousand seed weight (TSW), the number of seeds per plant (NSeed), and the date of flowering (BegFlo). Mean cross-environment prediction accuracies reached 0.83 for TSW, 0.68 for NSeed, and 0.65 for BegFlo. For each trait, the statistical method, the marker density, and/or the training population size and composition used for prediction were varied to investigate their effects on prediction accuracy: the effect was large for the size and composition of the training population but limited for the statistical method and marker density. Maximizing the relatedness between individuals in the training and test sets, through the CDmean-based method, significantly improved prediction accuracies. A cross-population cross-validation experiment was further conducted using the CRB339 collection as the training population set and nine recombinant inbred line populations as the test set. Prediction quality was high, with mean Q² of 0.44 for TSW and 0.59 for BegFlo. Results are discussed in the light of current efforts to develop GS strategies in pea.
Affiliation(s)
- Nadim Tayeh
  - INRA, UMR1347 Agroécologie, Dijon, France
  - *Correspondence: Nadim Tayeh
- Marie-Christine Le Paslier
  - INRA, US1279 Etude du Polymorphisme des Génomes Végétaux, CEA-IG/Centre National de Génotypage, Evry, France