151
Boldizsár Á, Soltész A, Tanino K, Kalapos B, Marozsán-Tóth Z, Monostori I, Dobrev P, Vankova R, Galiba G. Elucidation of molecular and hormonal background of early growth cessation and endodormancy induction in two contrasting Populus hybrid cultivars. BMC Plant Biol 2021; 21:111. [PMID: 33627081 PMCID: PMC7905644 DOI: 10.1186/s12870-021-02828-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/18/2020] [Accepted: 01/06/2021] [Indexed: 06/02/2023]
Abstract
BACKGROUND Over the life cycle of perennial trees, the dormant state enables the avoidance of abiotic stress conditions. The growth cycle can be partitioned into induction, maintenance and release and is controlled by complex interactions between many endogenous and environmental factors. While phytohormones have long been linked with dormancy, there is increasing evidence of regulation by DAM and CBF genes. To reveal whether the expression kinetics of CBFs and their target PtDAM1 is related to growth cessation and endodormancy induction in Populus, two hybrid poplar cultivars were studied which had known differential responses to dormancy inducing conditions. RESULTS Growth cessation, dormancy status and expression of six PtCBFs and PtDAM1 were analyzed. The 'Okanese' hybrid cultivar ceased growth rapidly, was able to reach endodormancy, and exhibited a significant increase of several PtCBF transcripts in the buds on the 10th day. The 'Walker' cultivar had delayed growth cessation, was unable to enter endodormancy, and showed much lower CBF expression in buds. Expression of PtDAM1 peaked on the 10th day only in the buds of 'Okanese'. In addition, PtDAM1 was not expressed in the leaves of either cultivar while leaf CBFs expression pattern was several fold higher in 'Walker', peaking at day 1. Leaf phytohormones in both cultivars followed similar profiles during growth cessation but differentiated based on cytokinins which were largely reduced, while the Ox-IAA and iP7G increased in 'Okanese' compared to 'Walker'. Surprisingly, ABA concentration was reduced in leaves of both cultivars. However, the metabolic deactivation product of ABA, phaseic acid, exhibited an early peak on the first day in 'Okanese'. CONCLUSIONS Our results indicate that PtCBFs and PtDAM1 have differential kinetics and spatial localization which may be related to early growth cessation and endodormancy induction under the regime of low night temperature and short photoperiod in poplar. 
Unlike buds, PtCBFs and PtDAM1 expression levels in leaves were not associated with early growth cessation and dormancy induction under these conditions. Our study provides new evidence that the degradation of auxin and cytokinins in leaves may be an important regulatory point in a CBF-DAM induced endodormancy. Further investigation of other PtDAMs in bud tissue and a study of both growth-inhibiting and the degradation of growth-promoting phytohormones is warranted.
Affiliation(s)
- Ákos Boldizsár
  - Department of Plant Molecular Biology, Agricultural Institute, Centre for Agricultural Research, ELKH, Martonvásár, H-2462 Hungary
- Alexandra Soltész
  - Department of Plant Molecular Biology, Agricultural Institute, Centre for Agricultural Research, ELKH, Martonvásár, H-2462 Hungary
- Karen Tanino
  - Department of Plant Sciences, College of Agriculture and Bioresources, University of Saskatchewan, Saskatoon, SK S7N 5A8 Canada
- Balázs Kalapos
  - Department of Plant Molecular Biology, Agricultural Institute, Centre for Agricultural Research, ELKH, Martonvásár, H-2462 Hungary
- Zsuzsa Marozsán-Tóth
  - Department of Plant Molecular Biology, Agricultural Institute, Centre for Agricultural Research, ELKH, Martonvásár, H-2462 Hungary
- István Monostori
  - Department of Plant Molecular Biology, Agricultural Institute, Centre for Agricultural Research, ELKH, Martonvásár, H-2462 Hungary
- Petre Dobrev
  - Laboratory of Hormonal Regulations in Plants, Institute of Experimental Botany of the Czech Academy of Sciences, Prague, 165 02 Czech Republic
- Radomira Vankova
  - Laboratory of Hormonal Regulations in Plants, Institute of Experimental Botany of the Czech Academy of Sciences, Prague, 165 02 Czech Republic
- Gábor Galiba
  - Department of Plant Molecular Biology, Agricultural Institute, Centre for Agricultural Research, ELKH, Martonvásár, H-2462 Hungary
  - Festetics Doctoral School, Georgikon Campus, Szent István University, Keszthely, H-8360 Hungary
152
Race AM, Sutton D, Hamm G, Maglennon G, Morton JP, Strittmatter N, Campbell A, Sansom OJ, Wang Y, Barry ST, Takáts Z, Goodwin RJA, Bunch J. Deep Learning-Based Annotation Transfer between Molecular Imaging Modalities: An Automated Workflow for Multimodal Data Integration. Anal Chem 2021; 93:3061-3071. [PMID: 33534548 DOI: 10.1021/acs.analchem.0c02726] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 12/11/2022]
Abstract
An ever-increasing array of imaging technologies are being used in the study of complex biological samples, each of which provides complementary, occasionally overlapping information at different length scales and spatial resolutions. It is important to understand the information provided by one technique in the context of the other to achieve a more holistic overview of such complex samples. One way to achieve this is to use annotations from one modality to investigate additional modalities. For microscopy-based techniques, these annotations could be manually generated using digital pathology software or automatically generated by machine learning (including deep learning) methods. Here, we present a generic method for using annotations from one microscopy modality to extract information from complementary modalities. We also present a fast, general, multimodal registration workflow [evaluated on multiple mass spectrometry imaging (MSI) modalities, matrix-assisted laser desorption/ionization, desorption electrospray ionization, and rapid evaporative ionization mass spectrometry] for automatic alignment of complex data sets, demonstrating an order of magnitude speed-up compared to previously published work. To demonstrate the power of the annotation transfer and multimodal registration workflows, we combine MSI, histological staining (such as hematoxylin and eosin), and deep learning (automatic annotation of histology images) to investigate a pancreatic cancer mouse model. Neoplastic pancreatic tissue regions, which were histologically indistinguishable from one another, were observed to be metabolically different. We demonstrate the use of the proposed methods to better understand tumor heterogeneity and the tumor microenvironment by transferring machine learning results freely between the two modalities.
Affiliation(s)
- Alan M Race
  - Imaging and AI, Clinical Pharmacology and Safety Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB4 0WG, U.K.
- Daniel Sutton
  - Imaging and AI, Clinical Pharmacology and Safety Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB4 0WG, U.K.
- Gregory Hamm
  - Imaging and AI, Clinical Pharmacology and Safety Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB4 0WG, U.K.
- Gareth Maglennon
  - Oncology Safety, Clinical Pharmacology and Safety Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB4 0WG, U.K.
- Jennifer P Morton
  - Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Glasgow G61 1BD, U.K.
  - Institute of Cancer Sciences, University of Glasgow, Garscube Estate, Switchback Road, Glasgow G61 1QH, U.K.
- Nicole Strittmatter
  - Imaging and AI, Clinical Pharmacology and Safety Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB4 0WG, U.K.
- Andrew Campbell
  - Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Glasgow G61 1BD, U.K.
- Owen J Sansom
  - Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Glasgow G61 1BD, U.K.
  - Institute of Cancer Sciences, University of Glasgow, Garscube Estate, Switchback Road, Glasgow G61 1QH, U.K.
- Yinhai Wang
  - Discovery Sciences, R&D, AstraZeneca, Cambridge CB4 0WG, U.K.
- Simon T Barry
  - Bioscience, Early Oncology, AstraZeneca, Cambridge CB4 0WG, U.K.
- Zoltan Takáts
  - Department of Surgery and Cancer, Imperial College London, London SW7 2AZ, U.K.
- Richard J A Goodwin
  - Imaging and AI, Clinical Pharmacology and Safety Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB4 0WG, U.K.
  - Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, U.K.
- Josephine Bunch
  - Department of Surgery and Cancer, Imperial College London, London SW7 2AZ, U.K.
  - National Centre of Excellence in Mass Spectrometry Imaging (NiCE-MSI), National Physical Laboratory, Teddington TW11 0LW, U.K.
153
Efficient Prediction of In Vitro Piroxicam Release and Diffusion From Topical Films Based on Biopolymers Using Deep Learning Models and Generative Adversarial Networks. J Pharm Sci 2021; 110:2531-2543. [PMID: 33548245 DOI: 10.1016/j.xphs.2021.01.032] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/27/2020] [Revised: 01/28/2021] [Accepted: 01/29/2021] [Indexed: 12/12/2022]
Abstract
The purpose of this study was to simultaneously predict the drug release and skin permeation of Piroxicam (PX) topical films based on Chitosan (CTS), Xanthan gum (XG) and its Carboxymethyl derivatives (CMXs) as matrix systems. These films were prepared by the solvent casting method, using Tween 80 (T80) as a permeation enhancer. All of the prepared films were assessed for their physicochemical parameters, in vitro drug release and ex vivo skin permeation. Moreover, deep learning and machine learning models were applied to predict the drug release and permeation rates. The results indicated that all of the films exhibited good consistency and physicochemical properties. Furthermore, when T80 was used in the optimal formulation (F8) based on CTS-CMX3, a satisfactory drug release pattern was found: 99.97% of PX was released and 1.18 mg/cm² had permeated after 48 h. Moreover, a Generative Adversarial Network (GAN) efficiently enhanced the performance of the deep learning models, and a deep neural network (DNN) was chosen as the best predictive approach, with MSE values of 0.00098 and 0.00182 for the drug release and permeation kinetics, respectively. The DNN precisely predicted PX dissolution profiles, with f2 values equal to 99.99 for all the formulations.
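The abstract reports f2 values without defining them. Assuming f2 here is the standard similarity factor used to compare two dissolution profiles, f2 = 50 · log10(100 / sqrt(1 + mean squared difference)), a minimal sketch of the calculation could look like the following. The function name and the example profiles are hypothetical and are not taken from the study.

```python
import math


def f2_similarity(reference, test):
    """Similarity factor f2 between two dissolution profiles.

    Both inputs are sequences of cumulative percent released, sampled
    at the same time points. Identical profiles give f2 = 100.
    """
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    # Mean squared difference between the two profiles.
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))


# Hypothetical profiles (percent released), for illustration only.
observed = [12.0, 35.0, 61.0, 84.0, 99.0]
predicted = [12.1, 34.9, 61.2, 83.9, 99.0]
print(round(f2_similarity(observed, predicted), 2))
```

Under this definition, an f2 close to 100 means the predicted and measured profiles are nearly indistinguishable, while a uniform 10-point difference gives a value just under 50.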
154
Kades K, Sellner J, Koehler G, Full PM, Lai TYE, Kleesiek J, Maier-Hein KH. Adapting Bidirectional Encoder Representations from Transformers (BERT) to Assess Clinical Semantic Textual Similarity: Algorithm Development and Validation Study. JMIR Med Inform 2021; 9:e22795. [PMID: 33533728 PMCID: PMC7889424 DOI: 10.2196/22795] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/28/2020] [Revised: 12/03/2020] [Accepted: 12/22/2020] [Indexed: 11/30/2022] Open
Abstract
Background Natural Language Understanding enables automatic extraction of relevant information from clinical text data, which are acquired every day in hospitals. In 2018, the language model Bidirectional Encoder Representations from Transformers (BERT) was introduced, generating new state-of-the-art results on several downstream tasks. The National NLP Clinical Challenges (n2c2) is an initiative that strives to tackle such downstream tasks on domain-specific clinical data. In this paper, we present the results of our participation in the 2019 n2c2 and related work completed thereafter. Objective The objective of this study was to optimally leverage BERT for the task of assessing the semantic textual similarity of clinical text data. Methods We used BERT as an initial baseline and analyzed the results, which we used as a starting point to develop 3 different approaches where we (1) added additional, handcrafted sentence similarity features to the classifier token of BERT and combined the results with more features in multiple regression estimators, (2) incorporated a built-in ensembling method, M-Heads, into BERT by duplicating the regression head and applying an adapted training strategy to facilitate the focus of the heads on different input patterns of the medical sentences, and (3) developed a graph-based similarity approach for medications, which allows extrapolating similarities across known entities from the training set. The approaches were evaluated with the Pearson correlation coefficient between the predicted scores and ground truth of the official training and test dataset. Results We improved the performance of BERT on the test dataset from a Pearson correlation coefficient of 0.859 to 0.883 using a combination of the M-Heads method and the graph-based similarity approach. We also show differences between the test and training dataset and how the two datasets influenced the results. 
Conclusions We found that using a graph-based similarity approach has the potential to extrapolate domain specific knowledge to unseen sentences. We observed that it is easily possible to obtain deceptive results from the test dataset, especially when the distribution of the data samples is different between training and test datasets.
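The evaluation metric named in the Methods, the Pearson correlation coefficient between predicted and ground-truth similarity scores, can be sketched in a few lines. This is a plain-Python illustration, not the authors' evaluation code; the function name and the example scores are hypothetical.

```python
import math


def pearson_r(predicted, truth):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(truth) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_t = sum(truth) / n
    # Sums of centered cross-products and squared deviations.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, truth))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


# Hypothetical similarity scores for five sentence pairs.
model_scores = [4.1, 0.8, 2.9, 3.6, 1.2]
gold_scores = [4.5, 1.0, 3.0, 3.0, 0.5]
print(round(pearson_r(model_scores, gold_scores), 3))
```

Because the coefficient depends only on how the two score lists co-vary, it can look strong on a test set whose score distribution differs from the training set, which is the caveat the Conclusions raise.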
Affiliation(s)
- Klaus Kades
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Partner Site Heidelberg, German Cancer Consortium (DKTK), Heidelberg, Germany
- Jan Sellner
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
- Gregor Koehler
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
- Peter M Full
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Heidelberg University, Heidelberg, Germany
- T Y Emmy Lai
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Hochschule Mannheim, University of Applied Sciences, Mannheim, Germany
- Jens Kleesiek
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Partner Site Heidelberg, German Cancer Consortium (DKTK), Heidelberg, Germany
  - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
  - Institute for Artificial Intelligence in Medicine (IKIM), University Medicine Essen, Essen, Germany
- Klaus H Maier-Hein
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Partner Site Heidelberg, German Cancer Consortium (DKTK), Heidelberg, Germany
  - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
  - Heidelberg University, Heidelberg, Germany
155
Soltan AAS, Kouchaki S, Zhu T, Kiyasseh D, Taylor T, Hussain ZB, Peto T, Brent AJ, Eyre DW, Clifton DA. Rapid triage for COVID-19 using routine clinical data for patients attending hospital: development and prospective validation of an artificial intelligence screening test. Lancet Digit Health 2021; 3:e78-e87. [PMID: 33509388 PMCID: PMC7831998 DOI: 10.1016/s2589-7500(20)30274-0] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Received: 09/24/2020] [Revised: 10/20/2020] [Accepted: 11/10/2020] [Indexed: 01/19/2023]
Abstract
BACKGROUND The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure. METHODS We trained linear and non-linear machine learning classifiers to distinguish patients with COVID-19 from pre-pandemic controls, using electronic health record data for patients presenting to the emergency department and admitted across a group of four teaching hospitals in Oxfordshire, UK (Oxford University Hospitals). Data extracted included presentation blood tests, blood gas testing, vital signs, and results of PCR testing for respiratory viruses. Adult patients (>18 years) presenting to hospital before Dec 1, 2019 (before the first COVID-19 outbreak), were included in the COVID-19-negative cohort; those presenting to hospital between Dec 1, 2019, and April 19, 2020, with PCR-confirmed severe acute respiratory syndrome coronavirus 2 infection were included in the COVID-19-positive cohort. Patients who were subsequently admitted to hospital were included in their respective COVID-19-negative or COVID-19-positive admissions cohorts. Models were calibrated to sensitivities of 70%, 80%, and 90% during training, and performance was initially assessed on a held-out test set generated by an 80:20 split stratified by patients with COVID-19 and balanced equally with pre-pandemic controls. 
To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values for our models. We prospectively validated our 80% sensitivity models for all patients presenting or admitted to the Oxford University Hospitals between April 20 and May 6, 2020, comparing model predictions with PCR test results. FINDINGS We assessed 155 689 adult patients presenting to hospital between Dec 1, 2017, and April 19, 2020. 114 957 patients were included in the COVID-negative cohort and 437 in the COVID-positive cohort, for a full study population of 115 394 patients, with 72 310 admitted to hospital. With a sensitive configuration of 80%, our emergency department (ED) model achieved 77·4% sensitivity and 95·7% specificity (area under the receiver operating characteristic curve [AUROC] 0·939) for COVID-19 among all patients attending hospital, and the admissions model achieved 77·4% sensitivity and 94·8% specificity (AUROC 0·940) for the subset of patients admitted to hospital. Both models achieved high negative predictive values (NPV; >98·5%) across a range of prevalences (≤5%). We prospectively validated our models for all patients presenting and admitted to Oxford University Hospitals in a 2-week test period. The ED model (3326 patients) achieved 92·3% accuracy (NPV 97·6%, AUROC 0·881), and the admissions model (1715 patients) achieved 92·5% accuracy (97·7%, 0·871) in comparison with PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improved apparent accuracy (ED model 95·1%, admissions model 94·1%) and NPV (ED model 99·0%, admissions model 98·5%). INTERPRETATION Our models performed effectively as a screening test for COVID-19, excluding the illness with high-confidence by use of clinical data routinely available within 1 h of presentation to hospital. 
Our approach is rapidly scalable, fitting within the existing laboratory testing infrastructure and standard of care of hospitals in high-income and middle-income countries. FUNDING Wellcome Trust, University of Oxford, Engineering and Physical Sciences Research Council, National Institute for Health Research Oxford Biomedical Research Centre.
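The dependence of predictive values on prevalence, which the Methods simulate with test sets of varying COVID-19 prevalence, can also be illustrated analytically with Bayes' rule. The sketch below is not the authors' procedure: it simply plugs the reported ED-model sensitivity (77·4%) and specificity (95·7%) into the textbook formulas, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    # Expected fractions of the screened population in each outcome cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported ED-model operating point; the prevalences are illustrative.
for prevalence in (0.01, 0.02, 0.05):
    ppv, npv = predictive_values(0.774, 0.957, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At 5% prevalence this gives an NPV of roughly 98·8%, in line with the reported NPV above 98·5% for prevalences up to 5%. The much lower PPV at the same prevalence shows why the models are framed as a rule-out screening test.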
Affiliation(s)
- Andrew A S Soltan
  - John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
  - Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Samaneh Kouchaki
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
  - Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
- Tingting Zhu
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Dani Kiyasseh
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Thomas Taylor
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Zaamin B Hussain
  - Harvard Graduate School of Education and Harvard T H Chan School of Public Health, Harvard University, Boston MA, USA
- Tim Peto
  - John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford and Public Health England, Oxford, UK
- Andrew J Brent
  - John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
- David W Eyre
  - John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
  - Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK
  - NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford and Public Health England, Oxford, UK
- David A Clifton
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
156
Insight on the Genetics of Atrial Fibrillation in Puerto Rican Hispanics. Stroke Res Treat 2021; 2021:8819896. [PMID: 33505650 PMCID: PMC7810540 DOI: 10.1155/2021/8819896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2020] [Accepted: 11/25/2020] [Indexed: 11/17/2022] Open
Abstract
Non-Hispanic whites present with higher atrial fibrillation (AF) prevalence than racial minorities living in the mainland USA. In two hospital-based studies, Puerto Rican Hispanics had a lower prevalence of atrial fibrillation (2.5%) than non-Hispanic whites (5.7%). These data are particularly controversial because Hispanics have a higher prevalence of traditional risk factors for developing AF yet a lower AF prevalence. This phenomenon is known as the atrial fibrillation paradox. Despite recent advancements in understanding AF, its pathogenesis remains unclear. In this study, we compared a genetic dataset of Puerto Rican Hispanics against 111 SNPs known to be associated with AF in a large European cohort and determined whether they are associated with AF susceptibility in our cohort. To achieve this aim, we performed a secondary analysis of existing data from two studies, (1) The Pharmacogenetics of Warfarin in Puerto Ricans and (2) A Genomic Approach for Clopidogrel in Caribbean Hispanics, and assessed for the presence of European SNPs associated with AF, as reported in a genome-wide association study of 1 million people that identified 111 loci for atrial fibrillation. We used data from 555 Puerto Rican Hispanic cardiovascular patients, consisting of 486 controls and 69 cases. We found that the following SNPs showed significant association with AF in Puerto Rican Hispanics (PHR): rs2834618, rs6462079, rs7508, rs2040862, and rs10458660. Some of these SNPs are linked to proteins involved in lysosomal activities responsible for breaking down ceramides into sphingosines and in collagen deposition around atrial cardiomyocytes. Furthermore, we performed a machine learning analysis and determined that Native American admixture and heart failure were strongly predictive of AF in PHR. For the first time, this study provides some genetic insight into AF's mechanisms in a Puerto Rican Hispanic cohort.
157
MutagenPred-GCNNs: A Graph Convolutional Neural Network-Based Classification Model for Mutagenicity Prediction with Data-Driven Molecular Fingerprints. Interdiscip Sci 2021; 13:25-33. [PMID: 33506363 DOI: 10.1007/s12539-020-00407-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/06/2020] [Revised: 11/24/2020] [Accepted: 12/03/2020] [Indexed: 10/22/2022]
Abstract
An important task in the early stage of drug discovery is the identification of mutagenic compounds. Mutagenicity prediction models that can interpret relationships between toxicological endpoints and compound structures are especially favorable. In this research, we used an advanced graph convolutional neural network (GCNN) architecture to learn molecular representations and developed predictive models based on these representations. The predictive model based on features extracted by GCNNs can not only predict the mutagenicity of compounds but also identify the structural alerts in compounds. In fivefold cross-validation and external validation, the highest area under the curve was 0.8782 and 0.8382, respectively; the highest accuracy (Q) was 80.98% and 76.63%, respectively; the highest sensitivity was 83.27% and 78.92%, respectively; and the highest specificity was 78.83% and 76.32%, respectively. Additionally, our model identified some toxicophores, such as aromatic nitro groups, three-membered heterocycles, quinones, and nitrogen and sulfur mustards. These results indicate that GCNNs could learn the features of mutagens effectively. In summary, we developed a mutagenicity classification model with high predictive performance and interpretability based on a data-driven molecular representation trained through GCNNs.
158
Wang H, Cui W, Guo Y, Du Y, Zhou Y. Machine Learning Prediction of Foodborne Disease Pathogens: Algorithm Development and Validation Study. JMIR Med Inform 2021; 9:e24924. [PMID: 33496675 PMCID: PMC7872834 DOI: 10.2196/24924] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/11/2020] [Revised: 12/18/2020] [Accepted: 12/28/2020] [Indexed: 01/18/2023] Open
Abstract
Background Foodborne diseases have a high global incidence and place a heavy burden on public health and the social economy. Foodborne pathogens, as the main cause of foodborne diseases, are central to their treatment and prevention; however, foodborne diseases caused by different pathogens lack specific clinical features, and in practice the pathogen is actually identified in only a small proportion of clinical cases. Objective We aimed to analyze foodborne disease case data, select appropriate features based on the analysis results, and use machine learning methods to classify foodborne disease pathogens in order to predict the pathogens in cases that have not been tested. Methods We extracted features such as space, time, and exposed food from foodborne disease case data and analyzed the relationship between these features and the foodborne disease pathogens, using a variety of machine learning methods to classify the pathogens. We compared the results of 4 models to obtain the pathogen prediction model with the highest accuracy. Results The gradient boosting decision tree model obtained the highest accuracy, approaching 69% in identifying 4 pathogens including Salmonella, Norovirus, Escherichia coli, and Vibrio parahaemolyticus. By evaluating the importance of features such as time of illness, geographical longitude and latitude, and diarrhea frequency, we found that they play important roles in classifying the foodborne disease pathogens. Conclusions Data analysis can reflect the distribution of some features of foodborne diseases and the relationships among the features. The classification of pathogens based on the analysis results and machine learning methods can provide beneficial support for clinical auxiliary diagnosis and treatment of foodborne diseases.
Affiliation(s)
- Hanxue Wang
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
  - Chinese Academy of Sciences University, Beijing, China
- Wenjuan Cui
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Yunchang Guo
  - China National Center for Food Safety Risk Assessment, Beijing, China
- Yi Du
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
  - Chinese Academy of Sciences University, Beijing, China
- Yuanchun Zhou
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
  - Chinese Academy of Sciences University, Beijing, China
159
Margulis E, Dagan-Wiener A, Ives RS, Jaffari S, Siems K, Niv MY. Intense bitterness of molecules: Machine learning for expediting drug discovery. Comput Struct Biotechnol J 2020; 19:568-576. [PMID: 33510862 PMCID: PMC7807207 DOI: 10.1016/j.csbj.2020.12.030] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/30/2020] [Revised: 12/17/2020] [Accepted: 12/20/2020] [Indexed: 12/16/2022] Open
Abstract
Drug development is a long, expensive and multistage process geared to achieving safe drugs with high efficacy. A crucial prerequisite for completing the medication regimen for oral drugs, particularly for pediatric and geriatric populations, is achieving taste that does not hinder compliance. Currently, the aversive taste of drugs is tested in late stages of clinical trials. This can result in the need to reformulate, potentially resulting in the use of more animals for additional toxicity trials, increased financial costs and a delay in release to the market. Here we present BitterIntense, a machine learning tool that classifies molecules into "very bitter" or "not very bitter", based on their chemical structure. The model, trained on chemically diverse compounds, has above 80% accuracy on several test sets. Our results suggest that about 25% of drugs are predicted to be very bitter, with even higher prevalence (~40%) in COVID-19 drug candidates and in microbial natural products. Only ~10% of toxic molecules are predicted to be intensely bitter, and it is also suggested that intense bitterness does not correlate with hepatotoxicity of drugs. However, very bitter compounds may be more cardiotoxic than not very bitter compounds, possessing significantly lower QPlogHERG values. BitterIntense allows quick and easy prediction of strong bitterness of compounds of interest for the food, pharma and biotechnology industries. We estimate that implementation of BitterIntense or similar tools early in the drug discovery process may lead to a reduction in delays, in animal use and in overall financial burden.
Affiliation(s)
- Eitan Margulis
  - The Institute of Biochemistry, Food Science and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Ayana Dagan-Wiener
  - The Institute of Biochemistry, Food Science and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Robert S. Ives
  - Comparative & Translational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage SG1 2NY, United Kingdom
- Sara Jaffari
  - Product Development & Supply, GlaxoSmithKline, Park Road, Ware, SG12 0DP, United Kingdom
- Masha Y. Niv
  - The Institute of Biochemistry, Food Science and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
Collapse
|
160
|
Predicting the need for intubation in the first 24 h after critical care admission using machine learning approaches. Sci Rep 2020; 10:20931. [PMID: 33262391 PMCID: PMC7708470 DOI: 10.1038/s41598-020-77893-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 02/26/2020] [Accepted: 11/09/2020] [Indexed: 12/20/2022] Open
Abstract
Early and accurate prediction of the need for intubation may provide more time for preparation and increase safety margins by avoiding high risk late intubation. This study evaluates whether machine learning can predict the need for intubation within 24 h using commonly available bedside and laboratory parameters taken at critical care admission. We extracted data from 2 large critical care databases (MIMIC-III and eICU-CRD). Missing variables were imputed using an autoencoder. Machine learning classifiers using logistic regression and random forest were trained using 60% of the data and tested using the remaining 40% of the data. We compared the performance of logistic regression and random forest models to predict intubation in critically ill patients. After excluding patients with limitations of therapy and missing data, we included 17,616 critically ill patients in this retrospective cohort. Within 24 h of admission, 2,292 patients required intubation, whilst 15,324 patients were not intubated. Blood gas parameters (PaO2, PaCO2, HCO3−), Glasgow Coma Score, respiratory variables (respiratory rate, SpO2), temperature, age, and oxygen therapy were used to predict intubation. Random forest had AUC 0.86 (95% CI 0.85–0.87) and logistic regression had AUC 0.77 (95% CI 0.76–0.78) for intubation prediction performance. The random forest model had a sensitivity of 0.88 (95% CI 0.86–0.90) and a specificity of 0.66 (95% CI 0.63–0.69), with good calibration throughout the range of intubation risks. The results showed that machine learning could predict the need for intubation in critically ill patients using commonly collected bedside clinical parameters and laboratory results. It may be used in real-time to help clinicians predict the need for intubation within 24 h of intensive care unit admission.
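The reported sensitivity and specificity can be combined with the cohort counts to see what a positive or negative prediction would imply. The following is a minimal sketch, not part of the study: it assumes the test set has the same intubation prevalence as the full cohort, and the variable names are illustrative.

```python
# Counts and operating point as reported in the abstract.
intubated = 2292
not_intubated = 15324
total = intubated + not_intubated  # 17,616 patients

sensitivity = 0.88  # random forest, as reported
specificity = 0.66  # random forest, as reported

# Assumption (not stated in the abstract): test-set prevalence equals
# the prevalence in the full cohort.
prevalence = intubated / total

# Expected fraction of all patients in each cell of the confusion matrix.
true_pos = sensitivity * prevalence
false_neg = (1 - sensitivity) * prevalence
true_neg = specificity * (1 - prevalence)
false_pos = (1 - specificity) * (1 - prevalence)

# Predictive values follow from Bayes' rule.
ppv = true_pos / (true_pos + false_pos)
npv = true_neg / (true_neg + false_neg)

print(f"prevalence = {prevalence:.3f}")
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumptions the positive predictive value comes out well below the sensitivity, because only about 13% of the cohort was intubated, while the negative predictive value is high.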
|
161
|
A Machine Learning Approach to Model Interdependencies between Dynamic Response and Crack Propagation. SENSORS 2020; 20:s20236847. [PMID: 33266048 PMCID: PMC7730809 DOI: 10.3390/s20236847] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/13/2020] [Revised: 11/22/2020] [Accepted: 11/26/2020] [Indexed: 11/26/2022]
Abstract
Accurate damage detection in engineering structures is a critical part of structural health monitoring. A variety of non-destructive inspection methods has been employed to detect the presence and severity of the damage. In this research, machine learning (ML) algorithms are used to assess the dynamic response of the system. The resulting model can predict the damage severity, damage location, and fundamental behaviour of the system. Fatigue damage data of aluminium and ABS under coupled mechanical loads at different temperatures are used to train the model. The model shows that natural frequency and temperature appear to be the most important predictive features for aluminium. For ABS, the prediction appears to be dominated by natural frequency and tip amplitude. The results also show that the position of the crack along the specimen appears to be of little importance for either material, allowing simultaneous prediction of location and damage severity.
|
162
|
Jeong SH, Lee TR, Kang JB, Choi MT. Analysis of Health Insurance Big Data for Early Detection of Disabilities: Algorithm Development and Validation. JMIR Med Inform 2020; 8:e19679. [PMID: 33226352 PMCID: PMC7721549 DOI: 10.2196/19679] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/30/2020] [Revised: 07/27/2020] [Accepted: 10/30/2020] [Indexed: 11/25/2022] Open
Abstract
Background Early detection of childhood developmental delays is very important for the treatment of disabilities. Objective To investigate the possibility of detecting childhood developmental delays leading to disabilities before clinical registration by analyzing big data from a health insurance database. Methods In this study, the data from children, individuals aged up to 13 years (n=2412), from the Sample Cohort 2.0 DB of the Korea National Health Insurance Service were organized by age range. Using 6 categories (having no disability, having a physical disability, having a brain lesion, having a visual impairment, having a hearing impairment, and having other conditions), features were selected in the order of importance with a tree-based model. We used multiple classification algorithms to find the best model for each age range. The earliest age range with clinically significant performance showed the age at which conditions can be detected early. Results The disability detection model showed that it was possible to detect disabilities with significant accuracy even at the age of 4 years, about a year earlier than the mean diagnostic age of 4.99 years. Conclusions Using big data analysis, we discovered the possibility of detecting disabilities earlier than clinical diagnoses, which would allow us to take appropriate action to prevent disabilities.
Affiliation(s)
- Tae Rim Lee
- Sungkyunkwan University, Suwon, Republic of Korea
- Jung Bae Kang
- Korea Disabled People's Development Institute, Seoul, Republic of Korea
|
163
|
Application of Gated Recurrent Unit (GRU) Neural Network for Smart Batch Production Prediction. ENERGIES 2020. [DOI: 10.3390/en13226121] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/17/2022]
Abstract
Production prediction plays an important role in decision making, development planning, and economic evaluation during the exploration and development period. However, applying traditional methods for production forecasting of newly developed wells in the conglomerate reservoir is restricted by limited historical data, complex fracture propagation, and frequent operational changes. This study proposed a Gated Recurrent Unit (GRU) neural network-based model to achieve batch production forecasting in the M conglomerate reservoir of China, which tackles the limitations of traditional decline curve analysis and conventional time-series prediction methods. The model is trained on four features, namely production rate, tubing pressure (TP), choke size (CS), and shut-in period (SI), from 70 multistage hydraulic fractured horizontal wells. Firstly, comprehensive data preprocessing is implemented, including excluding unfit wells, data screening, feature selection, data set partitioning, z-score normalization, and format conversion. Then, the four-feature model is compared with the model considering production only, and it is found that under frequent changes in oilfield operations, the four-feature model could accurately capture the complex variance pattern of production rate. Further, Random Forest (RF) is employed to optimize the prediction results of GRU. For a fair evaluation, the performance of the proposed model is compared with that of simple Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) neural network. The results show that the proposed approach outperforms the others in prediction accuracy and generalization ability. It is worth mentioning that under the guidance of continuous learning, the GRU model can be updated as soon as more wells become available.
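The z-score normalization step named in the preprocessing list can be sketched as follows. This is an illustrative sketch only: the abstract does not say whether the population or sample standard deviation was used (the population form is assumed here), and the function name and sample readings are hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical tubing-pressure readings (illustrative, not from the study).
tubing_pressure = [12.0, 14.0, 16.0, 18.0, 20.0]
normalized = z_score(tubing_pressure)
print(normalized)
```

Applying the same scaling to each feature puts rate, pressure, choke size, and shut-in period on a comparable scale before they are fed to a recurrent network.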
|
164
|
Homan CM, Schrading JN, Ptucha RW, Cerulli C, Ovesdotter Alm C. Quantitative Methods for Analyzing Intimate Partner Violence in Microblogs: Observational Study. J Med Internet Res 2020; 22:e15347. [PMID: 33211021 PMCID: PMC7714648 DOI: 10.2196/15347] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/02/2019] [Revised: 12/20/2019] [Accepted: 07/26/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Social media is a rich, virtually untapped source of data on the dynamics of intimate partner violence, one that is both global in scale and intimate in detail. OBJECTIVE The aim of this study is to use machine learning and other computational methods to analyze social media data for the reasons victims give for staying in or leaving abusive relationships. METHODS Human annotation, part-of-speech tagging, and machine learning predictive models, including support vector machines, were used on a Twitter data set of 8767 #WhyIStayed and #WhyILeft tweets each. RESULTS Our methods explored whether we can analyze micronarratives that include details about victims, abusers, and other stakeholders, the actions that constitute abuse, and how the stakeholders respond. CONCLUSIONS Our findings are consistent across various machine learning methods, which correspond to observations in the clinical literature, and affirm the relevance of natural language processing and machine learning for exploring issues of societal importance in social media.
Affiliation(s)
- Catherine Cerulli
- University of Rochester Medical Center, Rochester, NY, United States
|
165
|
DiSilvestro KJ, Veeramani A, McDonald CL, Zhang AS, Kuris EO, Durand WM, Cohen EM, Daniels AH. Predicting Postoperative Mortality After Metastatic Intraspinal Neoplasm Excision: Development of a Machine-Learning Approach. World Neurosurg 2020; 146:e917-e924. [PMID: 33212282 DOI: 10.1016/j.wneu.2020.11.037] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/28/2020] [Revised: 11/06/2020] [Accepted: 11/07/2020] [Indexed: 02/07/2023]
Abstract
OBJECTIVE Mortality following surgical resection of spinal tumors is a devastating outcome. Naïve Bayes machine learning algorithms may be leveraged in surgical planning to predict mortality. In this investigation, we use a Naïve Bayes classification algorithm to predict mortality within 30 days of spinal tumor excision. METHODS Patients who underwent laminectomies between 2006 and 2018 for excisions of intraspinal neoplasms were selected from the National Surgical Quality Initiative Program. Naïve Bayes classifier analysis was conducted in Python. The area under the receiver operating characteristic curve (AUC) was calculated to evaluate the classifier's ability to predict mortality within 30 days of surgery. Multivariable logistic regression analysis was performed in R to identify risk factors for 30-day postoperative mortality. RESULTS In total, 2094 spine tumor surgery patients were included in the study. The 30-day mortality rate was 5.16%. The classifier yielded an AUC of 0.898, which exceeds the AUC of 0.722 achieved by the National Surgical Quality Initiative Program mortality probability calculator (P < 0.0001). The multivariable regression indicated that smoking history, chronic obstructive pulmonary disease, disseminated cancer, bleeding disorder history, dyspnea, and low albumin levels were strongly associated with 30-day mortality. CONCLUSIONS The Naïve Bayes classifier may be used to predict 30-day mortality for patients undergoing spine tumor excisions, and its accuracy may increase as the model continues to learn from input patient data. Patient outcomes can be improved by identifying high-risk populations early using the algorithm and applying that data to inform preoperative decision making, as well as patient selection and education.
Affiliation(s)
- Kevin J DiSilvestro
- Department of Orthopedic Surgery, Warren Alpert Medical School of Brown University, Rhode Island Hospital, Providence, Rhode Island, USA
- Ashwin Veeramani
- Division of Applied Mathematics, Brown University, Providence, Rhode Island, USA
- Christopher L McDonald
- Department of Orthopedic Surgery, Warren Alpert Medical School of Brown University, Rhode Island Hospital, Providence, Rhode Island, USA
- Andrew S Zhang
- Department of Orthopedic Surgery, Warren Alpert Medical School of Brown University, Rhode Island Hospital, Providence, Rhode Island, USA
- Eren O Kuris
- Department of Orthopedic Surgery, Warren Alpert Medical School of Brown University, Rhode Island Hospital, Providence, Rhode Island, USA
- Wesley M Durand
- Department of Orthopedic Surgery, Warren Alpert Medical School of Brown University, Rhode Island Hospital, Providence, Rhode Island, USA
- Eric M Cohen
- Department of Orthopedic Surgery, Warren Alpert Medical School of Brown University, Rhode Island Hospital, Providence, Rhode Island, USA
- Alan H Daniels
- Department of Orthopedic Surgery, Warren Alpert Medical School of Brown University, Rhode Island Hospital, Providence, Rhode Island, USA
|
166
|
Shehzad A, Rockwood K, Stanley J, Dunn T, Howlett SE. Use of Patient-Reported Symptoms from an Online Symptom Tracking Tool for Dementia Severity Staging: Development and Validation of a Machine Learning Approach. J Med Internet Res 2020; 22:e20840. [PMID: 33174853 PMCID: PMC7688393 DOI: 10.2196/20840] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/01/2020] [Revised: 08/17/2020] [Accepted: 10/24/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND SymptomGuide Dementia (DGI Clinical Inc) is a publicly available online symptom tracking tool to support caregivers of persons living with dementia. The value of such data is enhanced when the specific dementia stage is identified. OBJECTIVE We aimed to develop a supervised machine learning algorithm to classify dementia stages based on tracked symptoms. METHODS We employed clinical data from 717 people from 3 sources: (1) a memory clinic; (2) long-term care; and (3) an open-label trial of donepezil in vascular and mixed dementia (VASPECT). Symptoms were captured with SymptomGuide Dementia. A clinician classified participants into 4 groups using either the Functional Assessment Staging Test or the Global Deterioration Scale as mild cognitive impairment, mild dementia, moderate dementia, or severe dementia. Individualized symptom profiles from the pooled data were used to train machine learning models to predict dementia severity. Models trained with 6 different machine learning algorithms were compared using nested cross-validation to identify the best performing model. Model performance was assessed using measures of balanced accuracy, precision, recall, Cohen κ, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The best performing algorithm was used to train a model optimized for balanced accuracy. RESULTS The study population was mostly female (424/717, 59.1%), older adults (mean 77.3 years, SD 10.6, range 40-100) with mild to moderate dementia (332/717, 46.3%). Age, duration of symptoms, 37 unique dementia symptoms, and 10 symptom-derived variables were used to distinguish dementia stages. A model trained with a support vector machine learning algorithm using a one-versus-rest approach showed the best performance. The correct dementia stage was identified with 83% balanced accuracy (Cohen κ=0.81, AUPRC 0.91, AUROC 0.96).
The best performance was seen when classifying severe dementia (AUROC 0.99). CONCLUSIONS A supervised machine learning algorithm exhibited excellent performance in identifying dementia stages based on dementia symptoms reported in an online environment. This novel dementia staging algorithm can be used to describe dementia stage based on user-reported symptoms. This type of symptom recording offers real-world data that reflect important symptoms in people with dementia.
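Two of the metrics named above, balanced accuracy and Cohen κ, can be computed directly from a confusion matrix. The sketch below uses the standard definitions; the function names and the 2-class matrix are hypothetical and do not reproduce the study's data, although the same functions accept a 4-class matrix such as one over the dementia stages.

```python
def balanced_accuracy(cm):
    """Mean per-class recall; cm[i][j] counts true class i predicted as j."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]
    return sum(recalls) / len(recalls)


def cohen_kappa(cm):
    """Agreement between true and predicted labels, corrected for chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix (illustrative counts only).
cm = [
    [40, 10],  # true class 0: 40 correct, 10 misclassified
    [5, 45],   # true class 1: 5 misclassified, 45 correct
]
print(balanced_accuracy(cm), cohen_kappa(cm))
```

Balanced accuracy weights every class equally regardless of its size, which matters when stages are unevenly represented, while κ discounts the agreement expected by chance alone.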
Affiliation(s)
- Kenneth Rockwood
- DGI Clinical Inc, Halifax, NS, Canada; Geriatric Medicine Research Unit, Nova Scotia Health Authority, Halifax, NS, Canada; Division of Geriatric Medicine, Dalhousie University, Halifax, NS, Canada
- Susan E Howlett
- DGI Clinical Inc, Halifax, NS, Canada; Division of Geriatric Medicine, Dalhousie University, Halifax, NS, Canada; Department of Pharmacology, Dalhousie University, Halifax, NS, Canada
|
167
|
Multi-Temporal Predictive Modelling of Sorghum Biomass Using UAV-Based Hyperspectral and LiDAR Data. REMOTE SENSING 2020. [DOI: 10.3390/rs12213587] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/17/2022]
Abstract
High-throughput phenotyping using high spatial, spectral, and temporal resolution remote sensing (RS) data has become a critical part of the plant breeding chain focused on reducing the time and cost of the selection process for the “best” genotypes with respect to the trait(s) of interest. In this paper, the potential of accurate and reliable sorghum biomass prediction using visible and near infrared (VNIR) and short-wave infrared (SWIR) hyperspectral data as well as light detection and ranging (LiDAR) data acquired by sensors mounted on UAV platforms is investigated. Predictive models are developed using classical regression-based machine learning methods for nine experiments conducted during the 2017 and 2018 growing seasons at the Agronomy Center for Research and Education (ACRE) at Purdue University, Indiana, USA. The impact of the regression method, data source, timing of RS and field-based biomass reference data acquisition, and the number of samples on the prediction results are investigated. R2 values for end-of-season biomass ranged from 0.64 to 0.89 for different experiments when features from all the data sources were included. Geometry-based features derived from the LiDAR point cloud to characterize plant structure and chemistry-based features extracted from hyperspectral data provided the most accurate predictions. Evaluation of the impact of the time of data acquisition during the growing season on the prediction results indicated that although the most accurate and reliable predictions of final biomass were achieved using remotely sensed data from mid-season to end-of-season, predictions in mid-season provided adequate results to differentiate between promising varieties for selection. 
The analysis of variance (ANOVA) of the accuracies of the predictive models showed that both the data source and regression method are important factors for a reliable prediction; however, the data source was more important with 69% significance, versus 28% significance for the regression method.
|
168
|
Ghensi P, Manghi P, Zolfo M, Armanini F, Pasolli E, Bolzan M, Bertelle A, Dell'Acqua F, Dellasega E, Waldner R, Tessarolo F, Tomasi C, Segata N. Strong oral plaque microbiome signatures for dental implant diseases identified by strain-resolution metagenomics. NPJ Biofilms Microbiomes 2020; 6:47. [PMID: 33127901 PMCID: PMC7603341 DOI: 10.1038/s41522-020-00155-7] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 03/01/2020] [Accepted: 10/02/2020] [Indexed: 12/11/2022] Open
Abstract
Dental implants are installed in an increasing number of patients. Mucositis and peri-implantitis are common microbial-biofilm-associated diseases affecting the tissues that surround the dental implant and are a major medical and socioeconomic burden. By metagenomic sequencing of the plaque microbiome in different peri-implant health and disease conditions (113 samples from 72 individuals), we found microbial signatures for peri-implantitis and mucositis and defined the peri-implantitis-related complex (PiRC) composed of the 7 most discriminative bacteria. The peri-implantitis microbiome is site specific, as contralateral healthy sites more closely resembled the microbiome of healthy implants, while mucositis was specifically enriched for Fusobacterium nucleatum acting as a keystone colonizer. Microbiome-based machine learning showed high diagnostic and prognostic power for peri-implant diseases, and strain-level profiling identified a previously uncharacterized subspecies of F. nucleatum to be particularly associated with disease. Altogether, we associated the plaque microbiome with peri-implant diseases and identified microbial signatures of disease severity.
Affiliation(s)
- Paolo Ghensi
- Department CIBIO, University of Trento, Trento, Italy
- Paolo Manghi
- Department CIBIO, University of Trento, Trento, Italy
- Moreno Zolfo
- Department CIBIO, University of Trento, Trento, Italy
- Mattia Bolzan
- Department CIBIO, University of Trento, Trento, Italy; PreBiomics S.r.l., Trento, Italy
- Francesco Tessarolo
- Department of Industrial Engineering, University of Trento, Trento, Italy; Healthcare Research and Innovation Program (IRCS-FBK-PAT), Bruno Kessler Foundation, Trento, Italy
- Cristiano Tomasi
- Department of Periodontology, Institute of Odontology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy
|
169
|
High-Resolution Soybean Yield Mapping Across the US Midwest Using Subfield Harvester Data. REMOTE SENSING 2020. [DOI: 10.3390/rs12213471] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
Abstract
Cloud computing and freely available, high-resolution satellite data have enabled recent progress in crop yield mapping at fine scales. However, extensive validation data at a matching resolution remain uncommon or infeasible due to data availability. This has limited the ability to evaluate different yield estimation models and improve understanding of key features useful for yield estimation in both data-rich and data-poor contexts. Here, we assess machine learning models’ capacity for soybean yield prediction using a unique ground-truth dataset of high-resolution (5 m) yield maps generated from combine harvester yield monitor data for over a million field-year observations across the Midwestern United States from 2008 to 2018. First, we compare random forest (RF) implementations, testing a range of feature engineering approaches using Sentinel-2 and Landsat spectral data for 20- and 30-m scale yield prediction. We find that Sentinel-2-based models can explain up to 45% of out-of-sample yield variability from 2017 to 2018 (r2 = 0.45), while Landsat models explain up to 43% across the longer 2008–2018 period. Using discrete Fourier transforms, or harmonic regressions, to capture soybean phenology improved the Landsat-based model considerably. Second, we compare RF models trained using this ground-truth data to models trained on available county-level statistics. We find that county-level models rely more heavily on just a few predictors, namely August weather covariates (vapor pressure deficit, rainfall, temperature) and July and August near-infrared observations. As a result, county-scale models perform relatively poorly on field-scale validation (r2 = 0.32), especially for high-yielding fields, but perform similarly to field-scale models when evaluated at the county scale (r2 = 0.82). Finally, we test whether our findings on variable importance can inform a simple, generalizable framework for regions or time periods beyond ground data availability. 
To do so, we test improvements to a Scalable Crop Yield Mapper (SCYM) approach that uses crop simulations to train statistical models for yield estimation. Based on findings from our RF models, we employ harmonic regressions to estimate peak vegetation index (VI) and a VI observation 30 days later, with August rainfall as the sole weather covariate in our new SCYM model. Modifications improved SCYM’s explained variance (r2 = 0.27 at the 30 m scale) and provide a new, parsimonious model.
|
170
|
Góralska M, Bińkowski J, Lenarczyk N, Bienias A, Grądzielewska A, Czyczyło-Mysza I, Kapłoniak K, Stojałowski S, Myśków B. How Machine Learning Methods Helped Find Putative Rye Wax Genes Among GBS Data. Int J Mol Sci 2020; 21:E7501. [PMID: 33053706 PMCID: PMC7593958 DOI: 10.3390/ijms21207501] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/28/2020] [Revised: 09/23/2020] [Accepted: 10/07/2020] [Indexed: 11/17/2022] Open
Abstract
The standard approach to genetic mapping was supplemented by machine learning (ML) to establish the location of the rye gene associated with epicuticular wax formation (glaucous phenotype). Over 180 plants of the biparental F2 population were genotyped with DArTseq (sequencing-based diversity array technology). A maximum likelihood (MLH) algorithm (JoinMap 5.0) and three ML algorithms, logistic regression (LR), random forest, and extreme gradient boosted trees (XGBoost), were used to select markers closely linked to the gene responsible for the wax layer. The allele conditioning the nonglaucous appearance of plants, derived from the cultivar Karlikovaja Zelenostebelnaja, was mapped to chromosome 2R, which is the first report of this localization. The DNA sequence of DArT-Silico 3585843, closely linked to wax segregation as detected by ML methods, was indicated as one of the candidates controlling the studied trait. The putative gene encodes the ABCG11 transporter.
Affiliation(s)
- Magdalena Góralska
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland
- Jan Bińkowski
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland
- Natalia Lenarczyk
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland
- Anna Bienias
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland
- Agnieszka Grądzielewska
- Institute of Plant Genetics, Breeding and Biotechnology, University of Life Sciences in Lublin, ul. Akademicka, 20–950 Lublin, Poland
- Ilona Czyczyło-Mysza
- Polish Academy of Sciences, The Franciszek Górski Institute of Plant Physiology, Niezapominajek 21, 30–239 Kraków, Poland
- Kamila Kapłoniak
- Polish Academy of Sciences, The Franciszek Górski Institute of Plant Physiology, Niezapominajek 21, 30–239 Kraków, Poland
- Stefan Stojałowski
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland
- Beata Myśków
- Department of Plant Genetics, Breeding and Biotechnology, West-Pomeranian University of Technology, Szczecin, ul. Słowackiego 17, 71–434 Szczecin, Poland
|
171
|
Schwab P, DuMont Schütte A, Dietz B, Bauer S. Clinical Predictive Models for COVID-19: Systematic Study. J Med Internet Res 2020; 22:e21439. [PMID: 32976111 PMCID: PMC7541040 DOI: 10.2196/21439] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 06/15/2020] [Revised: 08/30/2020] [Accepted: 09/14/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND COVID-19 is a rapidly emerging respiratory disease caused by SARS-CoV-2. Due to the rapid human-to-human transmission of SARS-CoV-2, many health care systems are at risk of exceeding their health care capacities, in particular in terms of SARS-CoV-2 tests, hospital and intensive care unit (ICU) beds, and mechanical ventilators. Predictive algorithms could potentially ease the strain on health care systems by identifying those who are most likely to receive a positive SARS-CoV-2 test, be hospitalized, or admitted to the ICU. OBJECTIVE The aim of this study is to develop, study, and evaluate clinical predictive models that estimate, using machine learning and based on routinely collected clinical data, which patients are likely to receive a positive SARS-CoV-2 test or require hospitalization or intensive care. METHODS Using a systematic approach to model development and optimization, we trained and compared various types of machine learning models, including logistic regression, neural networks, support vector machines, random forests, and gradient boosting. To evaluate the developed models, we performed a retrospective evaluation on demographic, clinical, and blood analysis data from a cohort of 5644 patients. In addition, we determined which clinical features were predictive to what degree for each of the aforementioned clinical tasks using causal explanations. RESULTS Our experimental results indicate that our predictive models identified patients that test positive for SARS-CoV-2 a priori at a sensitivity of 75% (95% CI 67%-81%) and a specificity of 49% (95% CI 46%-51%), patients who are SARS-CoV-2 positive that require hospitalization with 0.92 area under the receiver operator characteristic curve (AUC; 95% CI 0.81-0.98), and patients who are SARS-CoV-2 positive that require critical care with 0.98 AUC (95% CI 0.95-1.00). 
CONCLUSIONS Our results indicate that predictive models trained on routinely collected clinical data could be used to predict clinical pathways for COVID-19 and, therefore, help inform care and prioritize resources.
Affiliation(s)
- Benedikt Dietz
- Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
- Stefan Bauer
- Max Planck Institute for Intelligent Systems, Tübingen, Germany
|
172
|
Assessing the Fractional Abundance of Highly Mixed Salt-Marsh Vegetation Using Random Forest Soft Classification. REMOTE SENSING 2020. [DOI: 10.3390/rs12193224] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Coastal salt marshes are valuable and critical components of tidal landscapes, currently threatened by increasing rates of sea level rise, wave-induced lateral erosion, decreasing sediment supply, and human pressure. Halophytic vegetation plays an important role in salt-marsh erosional and depositional patterns and marsh survival. Mapping salt-marsh halophytic vegetation species and their fractional abundance within plant associations can provide important information on marsh vulnerability and coastal management. Remote sensing has often provided valuable methods for salt-marsh vegetation mapping; however, it has seldom been used to assess the fractional abundance of halophytes. In this study, we developed and tested a novel approach to estimate fractional abundance of halophytic species and bare soil that is based on Random Forest (RF) soft classification. This approach can fully use the information contained in the frequency of decision tree “votes” to estimate fractional abundance of each species. Such a method was applied to WorldView-2 (WV-2) data acquired for the Venice lagoon (Italy), where marshes are characterized by a high diversity of vegetation species. The proposed method was successfully tested against field observations derived from ancillary field surveys. Our results show that the new approach allows one to obtain high accuracy (6.7% < root-mean-square error (RMSE) < 18.7% and 0.65 < R2 < 0.96) in estimating the sub-pixel fractional abundance of marsh-vegetation species. Comparing results obtained with the new RF soft-classification approach with those obtained using the traditional RF regression method for fractional abundance estimation, we find a superior performance of the novel RF soft-classification approach with respect to the existing RF regression methods. 
The distribution of the dominant species obtained from the RF soft classification was compared to the one obtained from an RF hard classification, showing that numerous mixed areas are wrongly labeled as populated by specific species by the hard classifier. As for the effectiveness of using WV-2 for salt-marsh vegetation mapping, feature importance analyses suggest that Yellow (584–632 nm), NIR 1 (near-infrared 1, 765–901 nm) and NIR 2 (near-infrared 2, 856–1043 nm) bands are critical in RF soft classification. Our results bear important consequences for mapping and monitoring vegetation-species fractional abundance within plant associations and their dynamics, which are key aspects in biogeomorphic analyses of salt-marsh landscapes.
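The idea of reading fractional abundance from decision-tree "votes" can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class names, the 10-tree forest, and the helper `vote_fractions` are all hypothetical.

```python
from collections import Counter


def vote_fractions(tree_votes, classes):
    """Convert one pixel's per-tree class votes into fractional abundances."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {c: counts.get(c, 0) / n_trees for c in classes}


# Hypothetical class names and votes from a 10-tree forest for a single pixel.
classes = ["species_a", "species_b", "bare_soil"]
votes = ["species_a"] * 6 + ["species_b"] * 3 + ["bare_soil"] * 1

fractions = vote_fractions(votes, classes)      # soft output: share of votes per class
hard_label = max(fractions, key=fractions.get)  # hard output: majority class only
print(fractions, hard_label)
```

The hard label keeps only the majority class, which is how a mixed pixel can end up labeled as a single species.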
|
173
|
QRS Differentiation to Improve ECG Biometrics under Different Physical Scenarios Using Multilayer Perceptron. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10196896] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/17/2022]
Abstract
Currently, machine learning techniques are successfully applied in biometrics, and in electrocardiogram (ECG) biometrics specifically. However, few works deal with different physiological states of the user, which can cause significant heart rate variations, a key issue when working with ECG biometrics. Machine learning techniques simplify the feature extraction process, which can sometimes be reduced to a fixed segmentation. The database used includes visits taken on two different days and under three different conditions (sitting down, standing up after exercise), which is not common in current public databases. These characteristics allow studying differences among users under different scenarios, which may affect the pattern in the acquired data. A Multilayer Perceptron (MLP) is used as a classifier to form a baseline, as it has a simple structure that has provided good results in the state of the art. This work studies its behavior in ECG verification using QRS complexes, finding its best hyperparameter configuration through tuning. The final performance is calculated using different visits for enrolment and verification. Differentiation of the QRS complexes is also tested, as it is already required for detection, showing that applying a simple first differentiation gives a good result in comparison to similar state-of-the-art works. Moreover, it also reduces the computational cost by avoiding complex transformations and using only one type of signal. When applying different numbers of complexes, the best results are obtained with 100 and 187 complexes in enrolment, yielding Equal Error Rates (EER) of 2.79–4.95% and 2.69–4.71%, respectively.
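Two quantities from this abstract lend themselves to a short sketch: the first differentiation of a signal segment and the Equal Error Rate. The code below is illustrative only; the function names, the threshold sweep, and the toy scores are assumptions, and it treats higher scores as "more likely genuine".

```python
def first_difference(signal):
    """Discrete first derivative: difference between consecutive samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


def equal_error_rate(genuine, impostor):
    """Sweep observed scores as thresholds; return (EER, threshold) where
    the false acceptance and false rejection rates are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostor accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy verification scores (hypothetical, not from the paper).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(first_difference([1, 3, 6, 10]), eer, threshold)
```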
|
174
|
Mixture Optimization of Recycled Aggregate Concrete Using Hybrid Machine Learning Model. MATERIALS 2020; 13:ma13194331. [PMID: 33003383 PMCID: PMC7579239 DOI: 10.3390/ma13194331] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/26/2020] [Revised: 09/18/2020] [Accepted: 09/24/2020] [Indexed: 11/17/2022]
Abstract
Recycled aggregate concrete (RAC) contributes to mitigating the depletion of natural aggregates, alleviating the carbon footprint of concrete construction, and averting the landfilling of colossal amounts of construction and demolition waste. However, complexities in the mixture optimization of RAC due to the variability of recycled aggregates and lack of accuracy in estimating its compressive strength require novel and sophisticated techniques. This paper aims at developing state-of-the-art machine learning models to predict the RAC compressive strength and optimize its mixture design. Results show that the developed models, including Gaussian processes, deep learning, and gradient boosting regression, achieved robust predictive performance, with the gradient boosting regression trees yielding the highest prediction accuracy. Furthermore, a particle swarm optimization algorithm coupled with a gradient boosting regression trees model was developed to optimize the mixture design of RAC for various compressive strength classes. The hybrid model achieved cost-saving RAC mixture designs with a lower environmental footprint for different target compressive strength classes. The model could be further harnessed to achieve sustainable concrete with optimal recycled aggregate content, least cost, and least environmental footprint.
|
175
|
Hyperspectral and Thermal Sensing of Stomatal Conductance, Transpiration, and Photosynthesis for Soybean and Maize under Drought. REMOTE SENSING 2020. [DOI: 10.3390/rs12193182] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/09/2023]
Abstract
During water stress, crops undertake adjustments in functional, structural, and biochemical traits. Hyperspectral data and machine learning techniques (PLS-R) can be used to assess water stress responses in plant physiology. In this study, we investigated the potential of hyperspectral optical (VNIR) measurements supplemented with thermal remote sensing and canopy height (hc) to detect changes in leaf physiology of soybean (C3) and maize (C4) plants under three levels of soil moisture in controlled environmental conditions. We measured canopy evapotranspiration (ET), leaf transpiration (Tr), leaf stomatal conductance (gs), leaf photosynthesis (A), leaf chlorophyll content and morphological properties (hc and LAI), as well as vegetation cover reflectance and radiometric temperature (TL,Rad). Our results showed that water stress caused significant ET decreases in both crops. This reduction was linked to tighter stomatal control for soybean plants, whereas LAI changes were the primary control on maize ET. Spectral vegetation indices (VIs) and TL,Rad were able to track these different responses to drought, but only after controlling for confounding changes in phenology. PLS-R modeling of gs, Tr, and A using hyperspectral data was more accurate when pooling data from both crops together rather than individually. Nonetheless, separated PLS-R crop models are useful to identify the most relevant variables in each crop such as TL,Rad for soybean and hc for maize under our experimental conditions. Interestingly, the most important spectral bands sensitive to drought, derived from PLS-R analysis, were not exactly centered at the same wavelengths of the studied VIs sensitive to drought, highlighting the benefit of having contiguous narrow spectral bands to predict leaf physiology and suggesting different wavelength combinations based on crop type. Our results are only a first but a promising step towards larger scale remote sensing applications (e.g., airborne and satellite). 
PLS-R estimates of leaf physiology could help to parameterize canopy level GPP or ET models and to identify different photosynthetic paths or the degree of stomatal closure in response to drought.
|
176
|
CBRL and CBRC: Novel Algorithms for Improving Missing Value Imputation Accuracy Based on Bayesian Ridge Regression. Symmetry (Basel) 2020. [DOI: 10.3390/sym12101594] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]
Abstract
In most scientific studies involving data analysis, missing data are a critical problem, and selecting an appropriate approach to handle them is a challenge. In this paper, the authors perform a fair comparative study of some practical imputation methods used for handling missing values against two proposed imputation algorithms. The proposed algorithms depend on the Bayesian Ridge technique under two different feature selection conditions. They differ from the existing approaches in that they accumulate the imputed features: each imputed feature is incorporated into the Bayesian Ridge equation used to predict the missing values in the next selected incomplete feature. The authors applied the proposed algorithms to eight datasets with different amounts of missing values generated under different missingness mechanisms. The performance was measured in terms of imputation time, root-mean-square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE). The results showed that the performance varies depending on the percentage of missing values, the size of the dataset, and the missingness mechanism. In addition, the performance of the proposed methods is slightly better than that of the compared imputation methods.
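The three accuracy metrics named in this abstract follow standard definitions, which can be written out as a small sketch. The function names and the toy values are hypothetical; `observed` stands for the true values that were masked and `imputed` for the values an imputation method filled in.

```python
import math


def rmse(observed, imputed):
    """Root-mean-square error between true and imputed values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, imputed)) / len(observed))


def mae(observed, imputed):
    """Mean absolute error between true and imputed values."""
    return sum(abs(o - p) for o, p in zip(observed, imputed)) / len(observed)


def r_squared(observed, imputed):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, imputed))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy example (hypothetical values): three exact imputations and one miss.
observed = [1.0, 2.0, 3.0, 4.0]
imputed = [1.0, 2.0, 3.0, 6.0]
print(rmse(observed, imputed), mae(observed, imputed), r_squared(observed, imputed))
```

RMSE penalizes the single large miss more heavily than MAE does, which is one reason the two are usually reported together.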
|
177
|
Roney CH, Beach ML, Mehta AM, Sim I, Corrado C, Bendikas R, Solis-Lemus JA, Razeghi O, Whitaker J, O’Neill L, Plank G, Vigmond E, Williams SE, O’Neill MD, Niederer SA. In silico Comparison of Left Atrial Ablation Techniques That Target the Anatomical, Structural, and Electrical Substrates of Atrial Fibrillation. Front Physiol 2020; 11:1145. [PMID: 33041850 PMCID: PMC7526475 DOI: 10.3389/fphys.2020.572874] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 06/15/2020] [Accepted: 08/18/2020] [Indexed: 12/17/2022]
Abstract
Catheter ablation therapy for persistent atrial fibrillation (AF) typically includes pulmonary vein isolation (PVI) and may include additional ablation lesions that target patient-specific anatomical, electrical, or structural features. Clinical centers employ different ablation strategies, which use imaging data together with electroanatomic mapping data, depending on data availability. The aim of this study was to compare ablation techniques across a virtual cohort of AF patients. We constructed 20 paroxysmal and 30 persistent AF patient-specific left atrial (LA) bilayer models incorporating fibrotic remodeling from late-gadolinium enhancement (LGE) MRI scans. AF was simulated and post-processed using phase mapping to determine electrical driver locations over 15 s. Six different ablation approaches were tested: (i) PVI alone, modeled as wide-area encirclement of the pulmonary veins; PVI together with: (ii) roof and inferior lines to model posterior wall box isolation; (iii) isolating the largest fibrotic area (identified by LGE-MRI); (iv) isolating all fibrotic areas; (v) isolating the largest driver hotspot region [identified as high simulated phase singularity (PS) density]; and (vi) isolating all driver hotspot regions. Ablation efficacy was assessed to predict optimal ablation therapies for individual patients. We subsequently trained a random forest classifier to predict ablation response using (a) imaging metrics alone, (b) imaging and electrical metrics, or (c) imaging, electrical, and ablation lesion metrics. The optimal ablation approach resulting in termination, or if not possible atrial tachycardia (AT), varied among the virtual patient cohort: (i) 20% PVI alone, (ii) 6% box ablation, (iii) 2% largest fibrosis area, (iv) 4% all fibrosis areas, (v) 2% largest driver hotspot, and (vi) 46% all driver hotspots. Around 20% of cases remained in AF for all ablation strategies. 
The addition of patient-specific and ablation pattern specific lesion metrics to the trained random forest classifier improved predictive capability from an accuracy of 0.73 to 0.83. The trained classifier results demonstrate that the surface areas of pre-ablation driver regions and of fibrotic tissue not isolated by the proposed ablation strategy are both important for predicting ablation outcome. Overall, our study demonstrates the need to select the optimal ablation strategy for each patient. It suggests that both patient-specific fibrosis properties and driver locations are important for planning ablation approaches, and the distribution of lesions is important for predicting an acute response.
Affiliation(s)
- Caroline H. Roney
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Marianne L. Beach
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Arihant M. Mehta
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Iain Sim
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Cesare Corrado
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Rokas Bendikas
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Jose A. Solis-Lemus
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Orod Razeghi
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- John Whitaker
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Louisa O’Neill
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Gernot Plank
  - Department of Biophysics, Medical University of Graz, Graz, Austria
- Edward Vigmond
  - IHU Liryc, Electrophysiology and Heart Modeling Institute, Fondation Bordeaux Université, Bordeaux, France
- Steven E. Williams
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Mark D. O’Neill
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
- Steven A. Niederer
  - School of Biomedical Engineering and Imaging Sciences, King’s College London, London, United Kingdom
|
178
|
Plant Counting of Cotton from UAS Imagery Using Deep Learning-Based Object Detection Framework. REMOTE SENSING 2020. [DOI: 10.3390/rs12182981] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/26/2022]
Abstract
Assessing the plant population of cotton is important for making replanting decisions in low plant density areas, which are prone to yield penalties. Since measuring plant population in the field is labor intensive and subject to error, this study proposes a new approach to image-based plant counting using unmanned aircraft systems (UAS; DJI Mavic 2 Pro, Shenzhen, China) data. Previously developed image-based techniques required a priori information on the geometry or statistical characteristics of plant canopy features, which limited the versatility of the methods in variable field conditions. In this regard, a deep learning-based plant counting algorithm was proposed to reduce the number of input variables and to remove the requirement for acquiring geometric or statistical information. The object detection model named You Only Look Once version 3 (YOLOv3) and photogrammetry were utilized to separate, locate, and count cotton plants in the seedling stage. The proposed algorithm was tested with four different UAS datasets, containing variability in plant size, overall illumination, and background brightness. Root mean square error (RMSE) and R2 values of the optimal plant count results ranged from 0.50 to 0.60 plants per linear meter of row (number of plants within 1 m distance along the planting row direction) and 0.96 to 0.97, respectively. The object detection algorithm, trained with variable plant size, ground wetness, and lighting conditions, generally resulted in a lower detection error, unless an observable difference in the developmental stages of cotton existed. The proposed plant counting algorithm performed well with 0–14 plants per linear meter of row, when cotton plants are generally separable in the seedling stage. This study is expected to provide an automated methodology for in situ evaluation of plant emergence using UAS data.
|
179
|
Fan R, Zhang N, Yang L, Ke J, Zhao D, Cui Q. AI-based prediction for the risk of coronary heart disease among patients with type 2 diabetes mellitus. Sci Rep 2020; 10:14457. [PMID: 32879331 PMCID: PMC7467935 DOI: 10.1038/s41598-020-71321-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/23/2020] [Accepted: 07/27/2020] [Indexed: 11/09/2022]
Abstract
Type 2 diabetes mellitus (T2DM) is a common chronic disease caused by insulin secretion disorder that often leads to severe outcomes and even death due to complications, among which coronary heart disease (CHD) is the most common and severe. Given the huge number of T2DM patients, it is increasingly important to identify those at high risk of CHD complications, but a quantitative method is still not available. Here, we first curated a dataset of 1,273 T2DM patients, including 304 with and 969 without CHD. We then trained an artificial intelligence (AI) model using a randomly selected 4/5 of the dataset and used the remaining data to validate the performance of the model. The model achieved an AUC of 0.77 (fivefold cross-validation) on the training dataset and 0.80 on the testing dataset. To further confirm the performance of the presented model, we recruited 1,253 new T2DM patients as a fully independent testing dataset, including 200 with and 1,053 without CHD. On this dataset, the model achieved an AUC of 0.71. In addition, we implemented a model to quantitatively evaluate the risk contribution of each feature, which can thus provide personalized guidance for specific individuals. Finally, an online web server for the model was built. This study presented an AI model to determine the risk that T2DM patients develop CHD, which has potential value in providing early-warning, personalized guidance on CHD risk for both T2DM patients and clinicians.
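The evaluation protocol described here, a random 4/5 versus 1/5 split scored by AUC, can be sketched with the standard library alone. Everything below is illustrative: `holdout_split`, `roc_auc`, the seed, and the toy labels and scores are assumptions, and the AUC is computed with the rank-based definition (the probability that a random positive case outscores a random negative one, counting ties as half).

```python
import random


def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and hold out a fraction for testing."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = CHD, 0 = no CHD, with model risk scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
auc = roc_auc(labels, scores)

train_ids, test_ids = holdout_split(list(range(10)), test_fraction=0.2, seed=0)
print(auc, len(train_ids), len(test_ids))
```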
Affiliation(s)
- Rui Fan
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
- Ning Zhang
  - Beijing Key Laboratory of Diabetes Research and Care, Center for Endocrine Metabolism and Immune Diseases, Lu He Hospital Capital Medical University, Beijing, 101149, China
- Longyan Yang
  - Beijing Key Laboratory of Diabetes Research and Care, Center for Endocrine Metabolism and Immune Diseases, Lu He Hospital Capital Medical University, Beijing, 101149, China
- Jing Ke
  - Beijing Key Laboratory of Diabetes Research and Care, Center for Endocrine Metabolism and Immune Diseases, Lu He Hospital Capital Medical University, Beijing, 101149, China
- Dong Zhao
  - Beijing Key Laboratory of Diabetes Research and Care, Center for Endocrine Metabolism and Immune Diseases, Lu He Hospital Capital Medical University, Beijing, 101149, China
- Qinghua Cui
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
|
180
|
Birnbaum ML, Kulkarni PP, Van Meter A, Chen V, Rizvi AF, Arenare E, De Choudhury M, Kane JM. Utilizing Machine Learning on Internet Search Activity to Support the Diagnostic Process and Relapse Detection in Young Individuals With Early Psychosis: Feasibility Study. JMIR Ment Health 2020; 7:e19348. [PMID: 32870161 PMCID: PMC7492982 DOI: 10.2196/19348] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/15/2020] [Revised: 07/20/2020] [Accepted: 07/23/2020] [Indexed: 12/15/2022]
Abstract
BACKGROUND Psychiatry is nearly entirely reliant on patient self-reporting, and there are few objective and reliable tests or sources of collateral information available to help diagnostic and assessment procedures. Technology offers opportunities to collect objective digital data to complement patient experience and facilitate more informed treatment decisions. OBJECTIVE We aimed to develop computational algorithms based on internet search activity designed to support diagnostic procedures and relapse identification in individuals with schizophrenia spectrum disorders. METHODS We extracted 32,733 time-stamped search queries across 42 participants with schizophrenia spectrum disorders and 74 healthy volunteers between the ages of 15 and 35 (mean 24.4 years, 44.0% male), and built machine-learning diagnostic and relapse classifiers utilizing the timing, frequency, and content of online search activity. RESULTS Classifiers predicted a diagnosis of schizophrenia spectrum disorders with an area under the curve value of 0.74 and predicted a psychotic relapse in individuals with schizophrenia spectrum disorders with an area under the curve of 0.71. Compared with healthy participants, those with schizophrenia spectrum disorders made fewer searches and their searches consisted of fewer words. Prior to a relapse hospitalization, participants with schizophrenia spectrum disorders were more likely to use words related to hearing, perception, and anger, and were less likely to use words related to health. CONCLUSIONS Online search activity holds promise for gathering objective and easily accessed indicators of psychiatric symptoms. Utilizing search activity as collateral behavioral health information would represent a major advancement in efforts to capitalize on objective digital data to improve mental health monitoring.
Affiliation(s)
- Michael Leo Birnbaum
  - The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
  - The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, United States
  - Hofstra Northwell School of Medicine, Hempstead, NY, United States
- Anna Van Meter
  - The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
  - The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, United States
  - Hofstra Northwell School of Medicine, Hempstead, NY, United States
- Victor Chen
  - Georgia Institute of Technology, Atlanta, GA, United States
- Asra F Rizvi
  - The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
  - The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, United States
- Elizabeth Arenare
  - The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
  - The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, United States
- John M Kane
  - The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States
  - The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, United States
  - Hofstra Northwell School of Medicine, Hempstead, NY, United States
|
181
|
Adler DA, Ben-Zeev D, Tseng VWS, Kane JM, Brian R, Campbell AT, Hauser M, Scherer EA, Choudhury T. Predicting Early Warning Signs of Psychotic Relapse From Passive Sensing Data: An Approach Using Encoder-Decoder Neural Networks. JMIR Mhealth Uhealth 2020; 8:e19962. [PMID: 32865506 PMCID: PMC7490673 DOI: 10.2196/19962] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 05/07/2020] [Revised: 07/01/2020] [Accepted: 07/24/2020] [Indexed: 01/16/2023]
Abstract
Background Schizophrenia spectrum disorders (SSDs) are chronic conditions, but the severity of symptomatic experiences and functional impairments vacillate over the course of illness. Developing unobtrusive remote monitoring systems to detect early warning signs of impending symptomatic relapses would allow clinicians to intervene before the patient’s condition worsens. Objective In this study, we aim to create the first models, exclusively using passive sensing data from a smartphone, to predict behavioral anomalies that could indicate early warning signs of a psychotic relapse. Methods Data used to train and test the models were collected during the CrossCheck study. Hourly features derived from smartphone passive sensing data were extracted from 60 patients with SSDs (42 nonrelapse and 18 relapse >1 time throughout the study) and used to train models and test performance. We trained 2 types of encoder-decoder neural network models and a clustering-based local outlier factor model to predict behavioral anomalies that occurred within the 30-day period before a participant's date of relapse (the near relapse period). Models were trained to recreate participant behavior on days of relative health (DRH, outside of the near relapse period), following which a threshold to the recreation error was applied to predict anomalies. The neural network model architecture and the percentage of relapse participant data used to train all models were varied. Results A total of 20,137 days of collected data were analyzed, with 726 days of data (0.037%) within any 30-day near relapse period. The best performing model used a fully connected neural network autoencoder architecture and achieved a median sensitivity of 0.25 (IQR 0.15-1.00) and specificity of 0.88 (IQR 0.14-0.96; a median 108% increase in behavioral anomalies near relapse). 
We conducted a post hoc analysis using the best performing model to identify behavioral features that had a medium-to-large effect (Cohen d>0.5) in distinguishing anomalies near relapse from DRH among 4 participants who relapsed multiple times throughout the study. Qualitative validation using clinical notes collected during the original CrossCheck study showed that the identified features from our analysis were presented to clinicians during relapse events. Conclusions Our proposed method predicted a higher rate of anomalies in patients with SSDs within the 30-day near relapse period and can be used to uncover individual-level behaviors that change before relapse. This approach will enable technologists and clinicians to build unobtrusive digital mental health tools that can predict incipient relapse in SSDs.
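The anomaly rule described in the Methods, applying a threshold to the recreation (reconstruction) error and then scoring the flags against the near relapse period, can be sketched as follows. This is a simplified illustration: the per-day errors, the threshold value, and the function names are hypothetical, and the autoencoder that would produce the errors is not modeled here.

```python
def flag_anomalies(errors, threshold):
    """Mark a day as anomalous when its reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


def sensitivity_specificity(flags, near_relapse):
    """Compare anomaly flags with ground truth (True = day in near relapse period)."""
    tp = sum(f and r for f, r in zip(flags, near_relapse))
    fn = sum((not f) and r for f, r in zip(flags, near_relapse))
    tn = sum((not f) and (not r) for f, r in zip(flags, near_relapse))
    fp = sum(f and (not r) for f, r in zip(flags, near_relapse))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical per-day reconstruction errors and near-relapse labels.
errors = [0.10, 0.20, 0.90, 0.80, 0.15, 0.70]
near_relapse = [False, False, True, True, False, False]

flags = flag_anomalies(errors, threshold=0.5)
sens, spec = sensitivity_specificity(flags, near_relapse)
print(flags, sens, spec)
```

Raising the threshold trades sensitivity for specificity, so the chosen value determines where a model sits between the two.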
Affiliation(s)
- Dror Ben-Zeev
  - BRiTE Center, Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, United States
- John M Kane
  - Department of Psychiatry, The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, United States
- Rachel Brian
  - BRiTE Center, Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, United States
- Marta Hauser
  - Vanguard Research Group, Glen Oaks, NY, United States
- Emily A Scherer
  - Biomedical Data Science Department, Dartmouth Geisel School of Medicine, Hanover, NH, United States
|
182
|
Visweswaran S, Colditz JB, O'Halloran P, Han NR, Taneja SB, Welling J, Chu KH, Sidani JE, Primack BA. Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study. J Med Internet Res 2020; 22:e17478. [PMID: 32784184 PMCID: PMC7450367 DOI: 10.2196/17478] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/16/2019] [Revised: 06/05/2020] [Accepted: 06/11/2020] [Indexed: 01/20/2023]
Abstract
BACKGROUND Twitter presents a valuable and relevant social media platform to study the prevalence of information and sentiment on vaping that may be useful for public health surveillance. Machine learning classifiers that identify vaping-relevant tweets and characterize sentiments in them can underpin a Twitter-based vaping surveillance system. Compared with traditional machine learning classifiers that are reliant on annotations that are expensive to obtain, deep learning classifiers offer the advantage of requiring fewer annotated tweets by leveraging the large numbers of readily available unannotated tweets. OBJECTIVE This study aims to derive and evaluate traditional and deep learning classifiers that can identify tweets relevant to vaping, tweets of a commercial nature, and tweets with provape sentiments. METHODS We continuously collected tweets that matched vaping-related keywords over 2 months from August 2018 to October 2018. From this data set of tweets, a set of 4000 tweets was selected, and each tweet was manually annotated for relevance (vape relevant or not), commercial nature (commercial or not), and sentiment (provape or not). Using the annotated data, we derived traditional classifiers that included logistic regression, random forest, linear support vector machine, and multinomial naive Bayes. In addition, using the annotated data set and a larger unannotated data set of tweets, we derived deep learning classifiers that included a convolutional neural network (CNN), long short-term memory (LSTM) network, LSTM-CNN network, and bidirectional LSTM (BiLSTM) network. The unannotated tweet data were used to derive word vectors that deep learning classifiers can leverage to improve performance. 
RESULTS For relevance, LSTM-CNN performed best, with the highest area under the receiver operating characteristic curve (AUC) of 0.96 (95% CI 0.93-0.98). For distinguishing commercial from noncommercial tweets, all deep learning classifiers, including LSTM-CNN, performed better than the traditional classifiers, with an AUC of 0.99 (95% CI 0.98-0.99). For provape sentiment, BiLSTM performed best, with an AUC of 0.83 (95% CI 0.78-0.89). Overall, LSTM-CNN performed the best across all 3 classification tasks. CONCLUSIONS We derived and evaluated traditional machine learning and deep learning classifiers to identify vaping-related relevant, commercial, and provape tweets. Overall, deep learning classifiers such as LSTM-CNN had superior performance and had the added advantage of requiring no preprocessing. The performance of these classifiers supports the development of a vaping surveillance system.
Affiliation(s)
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States
- Jason B Colditz
  - School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Patrick O'Halloran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
- Na-Rae Han
  - Department of Linguistics, University of Pittsburgh, Pittsburgh, PA, United States
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States
- Joel Welling
  - Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, PA, United States
- Kar-Hai Chu
  - School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Jaime E Sidani
  - School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Brian A Primack
  - College of Education and Health Professions, University of Arkansas, Fayetteville, AR, United States
|
183
|
Extended Isolation Forests for Fault Detection in Small Hydroelectric Plants. SUSTAINABILITY 2020. [DOI: 10.3390/su12166421] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 01/01/2023]
Abstract
Maintenance in small hydroelectric plants is fundamental for guaranteeing the expansion of clean energy sources and supplying the energy estimated to be necessary for the coming years. Most fault diagnosis models proposed so far for hydroelectric generating units are based on the distance between the normal operating profile and newly observed values. The extended isolation forest is a binary-tree-based model that has been gaining prominence in anomaly detection applications. However, no study so far has reported the application of the algorithm in the context of hydroelectric power generation. We compared this model with the PCA and KICA-PCA models, using one year of operating data from a small hydroelectric plant and time-series anomaly detection metrics. The algorithm showed satisfactory results with less variance than the others; therefore, it is a suitable candidate for online fault detection applications in the sector.
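The abstract does not describe the algorithm itself. The following is a minimal sketch of an extended isolation forest as it is commonly described: each tree node splits its points with a random hyperplane (random normal vector, random intercept point inside the node's bounding box), and points that are isolated after few splits receive a high anomaly score. The function names, parameters, and toy data are hypothetical, and stopping at a leaf when a hyperplane fails to split the node is a simplification of this sketch.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def goes_left(x, normal, intercept):
    """True if x lies on the non-positive side of the hyperplane through `intercept`."""
    return sum((xi - pi) * ni for xi, pi, ni in zip(x, intercept, normal)) <= 0.0


def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = len(points[0])
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random slope
    intercept = [rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
                 for d in range(dim)]
    left = [p for p in points if goes_left(p, normal, intercept)]
    right = [p for p in points if not goes_left(p, normal, intercept)]
    if not left or not right:
        return {"size": len(points)}  # simplification: stop if the cut separates nothing
    return {"normal": normal, "intercept": intercept,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    if "normal" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if goes_left(x, node["normal"], node["intercept"]) else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1); values closer to 1 indicate easier isolation (more anomalous)."""
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(psi))


# Hypothetical toy data: a 2-D cluster of normal operation plus one distant reading.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
trees, psi = fit_forest(data)
outlier_score = anomaly_score((8.0, 8.0), trees, psi)
inlier_score = anomaly_score((0.0, 0.0), trees, psi)
```

For online fault detection, a threshold on the score would still have to be chosen from plant data; the abstract gives no value for it.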
|
184
|
Tanemura KA, Pei J, Merz KM. Refinement of pairwise potentials via logistic regression to score protein-protein interactions. Proteins 2020; 88:1559-1568. [PMID: 32729132 DOI: 10.1002/prot.25973] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/25/2020] [Revised: 05/17/2020] [Accepted: 06/14/2020] [Indexed: 12/20/2022]
Abstract
Protein-protein interactions (PPIs) are ubiquitous and functionally of great importance in biological systems. Hence, the accurate prediction of PPIs by protein-protein docking and scoring tools is highly desirable in order to characterize their structure and biological function. Ab initio docking protocols are divided into two stages: sampling docking poses to produce at least one near-native structure, and then evaluating the large set of candidate structures by scoring. Concurrent development in both sampling and scoring is crucial for the deployment of protein-protein docking software. In the present work, we apply a machine learning model to pairwise potentials to refine the detection of native protein quaternary structures among decoys. A decoy set was featurized using the Knowledge and Empirical Combined Scoring Algorithm 2 (KECSA2) pairwise potential. The highly unbalanced decoy set was then balanced using a comparison concept between native and decoy structures. The resultant comparison descriptors were used to train a logistic regression (LR) classifier. The LR model yielded the best performance for native detection among decoys compared with conventional scoring functions, while exhibiting weaker performance for the detection of low root mean square deviation decoy structures. Its deployment on an independent benchmark set confirms that the scoring function performs competitively relative to other scoring functions. The scripts used are available at https://github.com/TanemuraKiyoto/PPI-native-detection-via-LR.
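The abstract does not spell out the comparison concept. One plausible reading, shown here purely as an illustrative sketch and not as the authors' implementation, is to pair the native structure with each decoy and emit two difference vectors per pair: native minus decoy (label 1) and decoy minus native (label 0). This yields a balanced training set regardless of how many decoys exist. The feature values and function names are hypothetical, and the plain gradient-descent logistic regression stands in for whatever solver the published scripts use.

```python
import math


def comparison_descriptors(native, decoys):
    """Build a balanced set of difference vectors from one native and many decoys."""
    X, y = [], []
    for decoy in decoys:
        X.append([n - d for n, d in zip(native, decoy)])  # native compared to decoy
        y.append(1)
        X.append([d - n for n, d in zip(native, decoy)])  # decoy compared to native
        y.append(0)
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.1, epochs=200):
    """Batch gradient ascent on the log-likelihood.

    No intercept is fitted: the descriptors are antisymmetric by construction,
    so this sketch assumes a decision boundary through the origin.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj + learning_rate * g / len(X) for wj, g in zip(w, grad)]
    return w


def prefers_first(w, first, second):
    """Probability that `first` is the more native-like of the two structures."""
    return sigmoid(sum(wj * (a - b) for wj, a, b in zip(w, first, second)))


# Hypothetical pairwise-potential features (lower = more favourable).
native = [-5.0, -3.0]
decoys = [[-2.0, -1.0], [-4.0, -0.5], [-1.0, -2.5]]
X, y = comparison_descriptors(native, decoys)
weights = train_logistic_regression(X, y)
p_native = prefers_first(weights, native, decoys[0])
```

Scoring a new set of poses would then amount to comparing candidates pairwise and ranking them by how often they are preferred.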
Affiliation(s)
- Kiyoto A Tanemura
- Department of Chemistry, Michigan State University, East Lansing, Michigan, USA
| | - Jun Pei
- Department of Chemistry, Michigan State University, East Lansing, Michigan, USA
| | - Kenneth M Merz
- Department of Chemistry, Michigan State University, East Lansing, Michigan, USA
| |
|
185
|
Hswen Y, Zhang A, Sewalk KC, Tuli G, Brownstein JS, Hawkins JB. Investigation of Geographic and Macrolevel Variations in LGBTQ Patient Experiences: Longitudinal Social Media Analysis. J Med Internet Res 2020; 22:e17087. [PMID: 33137713 PMCID: PMC7428906 DOI: 10.2196/17087] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/16/2019] [Revised: 04/25/2020] [Accepted: 04/26/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Discrimination in the health care system contributes to worse health outcomes among lesbian, gay, bisexual, transgender, and queer (LGBTQ) patients. OBJECTIVE The aim of this study is to examine disparities in patient experience among LGBTQ persons using social media data. METHODS We collected patient experience data from Twitter from February 2013 to February 2017 in the United States. We compared the sentiment of patient experience tweets between Twitter users who self-identified as LGBTQ and non-LGBTQ. The effect of state-level partisan identity on patient experience sentiment and differences between LGBTQ users and non-LGBTQ users were analyzed. RESULTS We observed lower (more negative) patient experience sentiment among 13,689 LGBTQ users compared to 1,362,395 non-LGBTQ users. Increasing state-level liberal political identification was associated with higher patient experience sentiment among all users but had stronger effects for LGBTQ users. CONCLUSIONS Our findings highlight that social media data can yield insights about patient experience for LGBTQ persons and suggest that a state-level sociopolitical environment influences patient experience for this group. Efforts are needed to reduce disparities in patient care for LGBTQ persons while taking into context the effect of the political climate on these inequities.
Affiliation(s)
- Yulin Hswen
- Bakar Computational Health Sciences Institute, Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, United States
- Computational Epidemiology Lab, Harvard Medical School, Boston, MA, United States
| | - Amanda Zhang
- Innovation Program, Boston Children's Hospital, Boston, MA, United States
- Pritzker School of Medicine, The University of Chicago, Chicago, IL, United States
| | - Kara C Sewalk
- Innovation Program, Boston Children's Hospital, Boston, MA, United States
| | - Gaurav Tuli
- Innovation Program, Boston Children's Hospital, Boston, MA, United States
| | - John S Brownstein
- Computational Epidemiology Lab, Harvard Medical School, Boston, MA, United States
- Innovation Program, Boston Children's Hospital, Boston, MA, United States
| | - Jared B Hawkins
- Computational Epidemiology Lab, Harvard Medical School, Boston, MA, United States
- Innovation Program, Boston Children's Hospital, Boston, MA, United States
| |
|
186
|
Prediction of Function in ABCA4-Related Retinopathy Using Ensemble Machine Learning. J Clin Med 2020; 9:jcm9082428. [PMID: 32751377 PMCID: PMC7463567 DOI: 10.3390/jcm9082428] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/24/2020] [Revised: 07/19/2020] [Accepted: 07/28/2020] [Indexed: 12/14/2022] Open
Abstract
Full-field electroretinogram (ERG) and best corrected visual acuity (BCVA) measures have been shown to have prognostic value for recessive Stargardt disease (also called “ABCA4-related retinopathy”). These functional tests may serve as a performance-outcome-measure (PerfO) in emerging interventional clinical trials, but utility is limited by variability and patient burden. To address these limitations, an ensemble machine-learning-based approach was evaluated to differentiate patients from controls, and predict disease categories depending on ERG (‘inferred ERG’) and visual impairment (‘inferred visual impairment’) as well as BCVA values (‘inferred BCVA’) based on microstructural imaging (utilizing spectral-domain optical coherence tomography) and patient data. The accuracy for ‘inferred ERG’ and ‘inferred visual impairment’ was up to 99.53 ± 1.02%. Prediction of BCVA values (‘inferred BCVA’) achieved a precision of ±0.3 LogMAR in up to 85.31% of eyes. Analysis of the permutation importance revealed that foveal status was the most important feature for BCVA prediction, while the thickness of outer nuclear layer and photoreceptor inner and outer segments as well as age of onset highly ranked for all predictions. ‘Inferred ERG’, ‘inferred visual impairment’, and ‘inferred BCVA’, herein, represent accurate estimates of differential functional effects of retinal microstructure, and offer quasi-functional parameters with the potential for a refined patient assessment, and investigation of potential future treatment effects or disease progression.
|
187
|
IrrMapper: A Machine Learning Approach for High Resolution Mapping of Irrigated Agriculture Across the Western U.S. REMOTE SENSING 2020. [DOI: 10.3390/rs12142328] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/16/2022]
Abstract
High frequency and spatially explicit irrigated land maps are important for understanding the patterns and impacts of consumptive water use by agriculture. We built annual, 30 m resolution irrigation maps using Google Earth Engine for the years 1986–2018 for 11 western states within the conterminous U.S. Our map classifies lands into four classes: irrigated agriculture, dryland agriculture, uncultivated land, and wetlands. We built an extensive geospatial database of land cover from each class, including over 50,000 human-verified irrigated fields, 38,000 dryland fields, and over 500,000 km² of uncultivated lands. We used 60,000 point samples from 28 years to extract Landsat satellite imagery, as well as climate, meteorology, and terrain data to train a Random Forest classifier. Using a spatially independent validation dataset of 40,000 points, we found our classifier has an overall binary classification (irrigated vs. unirrigated) accuracy of 97.8%, and a four-class overall accuracy of 90.8%. We compared our results to Census of Agriculture irrigation estimates over the seven years of available data and found good overall agreement between the 2832 county-level estimates (r² = 0.90), and high agreement when estimates are aggregated to the state level (r² = 0.94). We analyzed trends over the 33-year study period, finding an increase of 15% (15,000 km²) in irrigated area in our study region. We found notable decreases in irrigated area in developing urban areas and in the southern Central Valley of California and increases in the plains of eastern Colorado, the Columbia River Basin, the Snake River Plain, and northern California.
|
188
|
Using Sensor Data to Detect Lameness and Mastitis Treatment Events in Dairy Cows: A Comparison of Classification Models. SENSORS 2020; 20:s20143863. [PMID: 32664417 PMCID: PMC7411665 DOI: 10.3390/s20143863] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/30/2020] [Revised: 07/01/2020] [Accepted: 07/09/2020] [Indexed: 11/17/2022]
Abstract
The aim of this study was to develop classification models, using modern machine learning methods, with mastitis and lameness treatments in Holstein dairy cows as the target variables, based on continuous data from herd management software. Data were collected over a period of 40 months from a total of 167 different cows with daily individual sensor information containing milking parameters, pedometer activity, feed and water intake, and body weight (in the form of differently aggregated data) as well as the entered treatment data. To identify the most important predictors for mastitis and lameness treatments, respectively, Random Forest feature importance, Pearson’s correlation and sequential forward feature selection were applied. With the selected predictors, various machine learning models such as Logistic Regression (LR), Support Vector Machine (SVM), K-nearest neighbors (KNN), Gaussian Naïve Bayes (GNB), Extra Trees Classifier (ET) and different ensemble methods such as Random Forest (RF) were trained. Their performance was compared using the receiver operating characteristic (ROC) area-under-curve (AUC), as well as sensitivity, block sensitivity and specificity. In addition, sampling methods were compared: Over- and undersampling as compensation for the expected unbalanced training data had a high impact on the ratio of sensitivity and specificity in the classification of the test data, but with regard to AUC, random oversampling and SMOTE (Synthetic Minority Over-sampling Technique) even showed significantly lower values than with non-sampled data. The best model, ET, obtained a mean AUC of 0.79 for mastitis and 0.71 for lameness on testing data from practical conditions, and we recommend it for this type of data; GNB, LR and RF were only marginally worse.
We recommend the use of these models as a benchmark for similar self-learning classification tasks. The classification models presented here retain their interpretability with the ability to present feature importances to the farmer in contrast to the “black box” models of Deep Learning methods.
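As a minimal sketch of two ingredients named in the abstract, the following shows random oversampling of the minority class and the computation of sensitivity and specificity from binary predictions. The function names and toy data are hypothetical; block sensitivity is not defined in the abstract and is therefore not sketched, and SMOTE (which synthesizes new minority samples rather than duplicating existing ones) is likewise left out.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equally frequent."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true (otherwise a denominator is zero).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cow-days: 1 = treatment event (rare), 0 = no treatment.
rows = [[31.2], [29.8], [30.5], [28.9], [30.1], [22.4]]
labels = [0, 0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)

sens, spec = sensitivity_specificity([0, 0, 1, 1], [0, 0, 0, 1])
```

Oversampling should be applied to the training split only, so that duplicated rows never leak into the test data used for AUC, sensitivity, and specificity.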
|
189
|
Wentzel A, Hanula P, van Dijk LV, Elgohari B, Mohamed ASR, Cardenas CE, Fuller CD, Vock DM, Canahuate G, Marai GE. Precision toxicity correlates of tumor spatial proximity to organs at risk in cancer patients receiving intensity-modulated radiotherapy. Radiother Oncol 2020; 148:245-251. [PMID: 32422303 PMCID: PMC7390671 DOI: 10.1016/j.radonc.2020.05.023] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/18/2020] [Revised: 05/12/2020] [Accepted: 05/12/2020] [Indexed: 10/24/2022]
Abstract
PURPOSE Using a cohort of 200 head and neck cancer (HNC) patients, we employ patient similarity based on tumor location, volume, and proximity to organs at risk to predict radiation-associated dysphagia (RAD) in a new patient receiving intensity modulated radiation therapy (IMRT). MATERIAL AND METHODS All patients were treated using curative-intent IMRT. Anatomical features were extracted from contrast-enhanced tomography scans acquired pre-treatment. Patient similarity was computed using a topological similarity measure, which allowed for the prediction of normal tissues' mean doses. We performed feature selection and clustering, and used the resulting groups of patients to forecast RAD. We used Logistic Regression (LG) cross-validation to assess the potential toxicity risk of these groupings. RESULTS Out of 200 patients, 34 patients were recorded as having RAD. Patient clusters were significantly correlated with RAD (p < .0001). Using pre-established baseline features, the area under the receiver operating characteristic curve (AUC) was 0.79; adding our cluster labels improved it to 0.84. CONCLUSION Our results show that spatial information available pre-treatment can be used to robustly identify groups of RAD high-risk patients. We identify feature sets that considerably improve toxicity risk prediction beyond what is possible using baseline features. Our results also suggest that similarity-based predicted mean doses to organs can be used as valid predictors of risk to organs.
Affiliation(s)
- Andrew Wentzel
- Department of Computer Science, The University of Illinois at Chicago, Chicago, USA
| | - Peter Hanula
- Department of Computer Science, The University of Illinois at Chicago, Chicago, USA
| | - Lisanne V van Dijk
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, USA
| | - Baher Elgohari
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, USA; Department of Clinical Oncology and Nuclear Medicine, Mansoura University, Egypt
| | - Abdallah S R Mohamed
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, USA
| | - Carlos E Cardenas
- Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, USA
| | - Clifton D Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, USA
| | - David M Vock
- Division of Biostatistics, University of Minnesota, Minneapolis, USA
| | - Guadalupe Canahuate
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, USA
| | - G E Marai
- Department of Computer Science, The University of Illinois at Chicago, Chicago, USA
| |
|
190
|
Arús-Pous J, Patronov A, Bjerrum EJ, Tyrchan C, Reymond JL, Chen H, Engkvist O. SMILES-based deep generative scaffold decorator for de-novo drug design. J Cheminform 2020; 12:38. [PMID: 33431013 PMCID: PMC7260788 DOI: 10.1186/s13321-020-00441-8] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 02/04/2020] [Accepted: 05/16/2020] [Indexed: 12/21/2022] Open
Abstract
Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. 
We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.
Affiliation(s)
- Josep Arús-Pous
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Atanas Patronov
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Esben Jannik Bjerrum
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Christian Tyrchan
- Medicinal Chemistry, Respiratory Inflammation, and Autoimmune (RIA), BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Hongming Chen
- Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health -Guangdong Laboratory, Guangzhou, China
| | - Ola Engkvist
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| |
|
191
|
Identifying Post-Fire Recovery Trajectories and Driving Factors Using Landsat Time Series in Fire-Prone Mediterranean Pine Forests. REMOTE SENSING 2020. [DOI: 10.3390/rs12091499] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/17/2022]
Abstract
Wildfires constitute the most important natural disturbance of Mediterranean forests, driving vegetation dynamics. Although Mediterranean species have developed ecological post-fire recovery strategies, the impacts of climate change and changes in fire regimes may endanger their resilience capacity. This study aims at assessing post-fire recovery dynamics at different stages in two large fires that occurred in Mediterranean pine forests (Spain) using temporal segmentation of the Landsat time series (1994–2018). Landsat-based detection of Trends in Disturbance and Recovery (LandTrendr) was used to derive trajectory metrics from Tasseled Cap Wetness (TCW), sensitive to canopy moisture and structure, and Tasseled Cap Angle (TCA), related to vegetation cover gradients. Different groups of post-fire trajectories were identified through K-means clustering of the Recovery Ratios (RR) from fitted trajectories: continuous recovery, continuous recovery with slope changes, continuous recovery stabilized and non-continuous recovery. The influence of pre-fire conditions, fire severity, topographic variables and post-fire climate on recovery rates for each recovery category at successional stages was analyzed through Geographically Weighted Regression (GWR). The modeling results indicated that pine forest recovery rates were highly sensitive to post-fire climate in the mid and long-term and to fire severity in the short-term, but less influenced by topographic conditions (adjusted R-squared ranged from 0.58 to 0.88 and from 0.54 to 0.93 for TCA and TCW, respectively). Recovery estimation was assessed through orthophotos, showing a high accuracy (Dice Coefficient ranged from 0.81 to 0.97 and from 0.74 to 0.96 for TCA and TCW, respectively). This study provides new insights into the post-fire recovery dynamics at successional stages and driving factors. 
The proposed method could be an approach to model the recovery for the Mediterranean areas and help managers in determining which areas may not be able to recover naturally.
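The accuracy assessment above is reported as a Dice coefficient, which for two binary maps A and B is 2|A ∩ B| / (|A| + |B|). A minimal sketch follows; the function name and the toy pixel masks are hypothetical, and returning 1.0 when both masks are empty is a convention chosen for this sketch.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice coefficient between two equally sized binary masks (flattened pixel lists)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of pixels")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Convention for this sketch: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical example: 1 = pixel classified as recovered.
from_landsat = [1, 1, 0, 0]
from_orthophoto = [1, 0, 1, 0]
agreement = dice_coefficient(from_landsat, from_orthophoto)
```

A value of 1 means the Landsat-derived and orthophoto-derived recovery maps coincide exactly, and 0 means they share no recovered pixels.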
|
192
|
Jarchi D, Andreu-Perez J, Kiani M, Vysata O, Kuchynka J, Prochazka A, Sanei S. Recognition of Patient Groups with Sleep Related Disorders using Bio-signal Processing and Deep Learning. SENSORS 2020; 20:s20092594. [PMID: 32370185 PMCID: PMC7248846 DOI: 10.3390/s20092594] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/20/2020] [Revised: 04/20/2020] [Accepted: 04/28/2020] [Indexed: 11/16/2022]
Abstract
Accurately diagnosing sleep disorders is essential for clinical assessments and treatments. Polysomnography (PSG) has long been used for detection of various sleep disorders. In this research, electrocardiography (ECG) and electromyography (EMG) have been used for recognition of breathing and movement-related sleep disorders. Bio-signal processing has been performed by extracting EMG features exploiting entropy and statistical moments, in addition to developing an iterative pulse peak detection algorithm using synchrosqueezed wavelet transform (SSWT) for reliable extraction of heart rate and breathing-related features from ECG. A deep learning framework has been designed to incorporate EMG and ECG features. The framework has been used to classify four groups: healthy subjects, patients with obstructive sleep apnea (OSA), patients with restless leg syndrome (RLS) and patients with both OSA and RLS. The proposed deep learning framework produced a mean accuracy of 72% and weighted F1 score of 0.57 across subjects for our formulated four-class problem.
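Accuracy and weighted F1 can diverge on a multi-class problem, as the two figures above show. A minimal sketch of both metrics follows, with weighted F1 taken as the support-weighted mean of per-class F1 scores (the usual definition, assumed here because the abstract does not define it). The labels and predictions are hypothetical toy values, not study data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, cls):
    """F1 for one class: 2*TP / (2*TP + FP + FN); 0.0 if the class never appears."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's share of the true labels."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(count / n * per_class_f1(y_true, y_pred, cls)
               for cls, count in support.items())


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical subjects from the four groups named in the abstract.
y_true = ["healthy", "healthy", "OSA", "OSA", "RLS", "OSA+RLS"]
y_pred = ["healthy", "OSA", "OSA", "OSA", "RLS", "RLS"]
acc = accuracy(y_true, y_pred)
f1w = weighted_f1(y_true, y_pred)
```

In this toy case the OSA+RLS class is never predicted, so its F1 of zero pulls the weighted F1 below the accuracy.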
Affiliation(s)
- Delaram Jarchi
- Smart Health Technologies Group, School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK
- Embedded and Intelligent Systems Laboratory, School of Computer Science and Electronics, University of Essex, Colchester CO4 3SQ, UK
| | - Javier Andreu-Perez
- Smart Health Technologies Group, School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK
- Embedded and Intelligent Systems Laboratory, School of Computer Science and Electronics, University of Essex, Colchester CO4 3SQ, UK
| | - Mehrin Kiani
- Smart Health Technologies Group, School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK
| | - Oldrich Vysata
- Department of Computing and Control Engineering, University of Chemistry and Technology in Prague, 166 28 Prague 6, Czech Republic
- Department of Neurology, Faculty of Medicine in Hradec Králové, Charles University, 500 05 Hradec Králové, Czech Republic
| | - Jiri Kuchynka
- Department of Neurology, Faculty of Medicine in Hradec Králové, Charles University, 500 05 Hradec Králové, Czech Republic
| | - Ales Prochazka
- Department of Computing and Control Engineering, University of Chemistry and Technology in Prague, 166 28 Prague 6, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, 160 00 Prague 6, Czech Republic
| | - Saeid Sanei
- School of Science and Technology, Nottingham Trent University, Nottingham NG11 8NS, UK
| |
|
193
|
Karydas C, Iatrou M, Kouretas D, Patouna A, Iatrou G, Lazos N, Gewehr S, Tseni X, Tekos F, Zartaloudis Z, Mainos E, Mourelatos S. Prediction of Antioxidant Activity of Cherry Fruits from UAS Multispectral Imagery Using Machine Learning. Antioxidants (Basel) 2020; 9:E156. [PMID: 32075036 PMCID: PMC7070805 DOI: 10.3390/antiox9020156] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/14/2020] [Revised: 02/06/2020] [Accepted: 02/11/2020] [Indexed: 12/26/2022] Open
Abstract
In this research, a model for the estimation of antioxidant content in cherry fruits from multispectral imagery acquired from drones was developed, based on machine learning methods. For two consecutive cultivation years, the trees were sampled on different dates and then analysed for their fruits' radical scavenging activity (DPPH) and Folin-Ciocalteu (FCR) reducing capacity. Multispectral images from unmanned aerial vehicles were acquired on the same dates with fruit sampling. Soil samples were collected throughout the study fields at the end of the season. Topographic, hydrographic and weather data also were included in modelling. First-year data were used for model-fitting, whereas second-year data for testing. Spatial autocorrelation tests indicated unbiased sampling and, moreover, allowed restriction of modelling input parameters to a smaller group. The optimum model employs 24 input variables resulting in a 6.74 root mean square error. Provided that soil profiles and other ancillary data are known in advance of the cultivation season, capturing drone images in critical growth phases, together with contemporary weather data, can support site- and time-specific harvesting. It could also support site-specific treatments (precision farming) for improving fruit quality in the long-term, with analogous marketing perspectives.
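The evaluation design above (fit on the first year, test on the second, report root mean square error) can be sketched as follows. A one-variable least-squares line stands in for the authors' 24-variable machine learning model, which the abstract does not specify; the function names, the predictor, and all numbers are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical data: a vegetation index per tree (x) and measured antioxidant activity (y).
year1_x, year1_y = [0.2, 0.4, 0.6, 0.8], [12.0, 18.5, 24.0, 31.5]
year2_x, year2_y = [0.3, 0.5, 0.7], [16.0, 20.0, 29.0]

slope, intercept = fit_line(year1_x, year1_y)                # fit on first-year data only
year2_pred = [slope * xi + intercept for xi in year2_x]
test_rmse = rmse(year2_y, year2_pred)                        # evaluate on second-year data
```

Keeping the second year entirely out of fitting gives an error estimate for a season the model has not seen, which is the situation a grower would face.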
Affiliation(s)
- Christos Karydas
- Ecodevelopment S.A., Environmental Applications, 57010 Thessaloniki, Greece
| | - Miltiadis Iatrou
- Agroecosystem L.P., Research and Trade of Agricultural Products, 63200 Nea Moudania, Greece
| | - Dimitrios Kouretas
- Laboratory of Animal Physiology, Dept. of Biochemistry and Biotechnology, University of Thessaly, 41500 Larissa, Greece
| | - Anastasia Patouna
- Laboratory of Animal Physiology, Dept. of Biochemistry and Biotechnology, University of Thessaly, 41500 Larissa, Greece
| | - George Iatrou
- Ecodevelopment S.A., Environmental Applications, 57010 Thessaloniki, Greece
| | - Nikolaos Lazos
- Ecodevelopment S.A., Environmental Applications, 57010 Thessaloniki, Greece
| | - Sandra Gewehr
- Ecodevelopment S.A., Environmental Applications, 57010 Thessaloniki, Greece
| | - Xanthi Tseni
- Ecodevelopment S.A., Environmental Applications, 57010 Thessaloniki, Greece
| | - Fotis Tekos
- Laboratory of Animal Physiology, Dept. of Biochemistry and Biotechnology, University of Thessaly, 41500 Larissa, Greece
| | - Zois Zartaloudis
- Agroecosystem L.P., Research and Trade of Agricultural Products, 63200 Nea Moudania, Greece
| | | | - Spiros Mourelatos
- Ecodevelopment S.A., Environmental Applications, 57010 Thessaloniki, Greece
| |
|
194
|
Doing More with Less: A Comparison of 16S Hypervariable Regions in Search of Defining the Shrimp Microbiota. Microorganisms 2020; 8:microorganisms8010134. [PMID: 31963525 PMCID: PMC7022540 DOI: 10.3390/microorganisms8010134] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 12/02/2019] [Revised: 01/12/2020] [Accepted: 01/15/2020] [Indexed: 12/12/2022] Open
Abstract
The shrimp has become the most valuable traded marine product in the world, and its microbiota plays an essential role in its development and overall health status. Massive high-throughput sequencing techniques using several hypervariable regions of the 16S rRNA gene are broadly applied in shrimp microbiota studies. However, it is essential to consider that the use of different hypervariable regions can influence the obtained data and the interpretation of the results. The present study compares the shrimp microbiota structure and composition obtained by three types of amplicons: one spanning both the V3 and V4 hypervariable regions (V3V4), one for the V3 region only (V3), and one for the V4 region only (V4) using the same experimental and bioinformatics protocols. Twenty-four samples from hepatopancreas and intestine were sequenced and evaluated using the GreenGenes and SILVA reference databases for clustering and taxonomic classification. In general, the V3V4 regions resulted in higher richness and diversity, followed by V3 and V4. All three regions establish an apparent clustering effect that discriminates between the two analyzed organs and describe a higher richness for the intestine and a higher diversity for the hepatopancreas samples. Proteobacteria was the most abundant phylum overall, and Cyanobacteria was more common in the intestine, whereas Firmicutes and Actinobacteria were more prevalent in hepatopancreas samples. Also, the genus Vibrio was significantly abundant in the intestine, as were Acinetobacter and Pseudomonas in the hepatopancreas, suggesting these taxa as markers for their respective organs independently of the sequenced region. 
The use of a single hypervariable region such as V3 may be a low-cost alternative that enables an adequate description of the shrimp microbiota, allowing for the development of strategies to continually monitor the microbial communities and detect changes that could indicate susceptibility to pathogens under real aquaculture conditions, while the use of the full V3V4 regions can contribute to a more in-depth characterization of the microbial composition.
|
195
|
Occam’s Razor for Big Data? On Detecting Quality in Large Unstructured Datasets. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9153065] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 02/05/2023]
Abstract
Detecting quality in large unstructured datasets requires capacities far beyond the limits of human perception and communicability and, as a result, there is an emerging trend towards increasingly complex analytic solutions in data science to cope with this problem. This trend towards analytic complexity represents a severe challenge for the principle of parsimony (Occam’s razor) in science. This review article combines insight from various domains such as physics, computational science, data engineering, and cognitive science to review the specific properties of big data. Problems in detecting data quality without abandoning the principle of parsimony are then highlighted on the basis of specific examples. Computational building block approaches for data clustering can help to deal with large unstructured datasets in minimized computation time, and meaning can be extracted rapidly and parsimoniously from large sets of unstructured image or video data through relatively simple unsupervised machine learning algorithms. The review then examines why expertise is still largely lacking for exploiting big data wisely to extract relevant information for specific tasks, recognize patterns and generate new information, or simply store and further process large amounts of sensor data, and presents examples illustrating why subjective views and pragmatic methods are needed to analyze big data contents. The review concludes with a discussion of how cultural differences between East and West are likely to affect the course of big data analytics, and the development of increasingly autonomous artificial intelligence (AI) aimed at coping with the big data deluge in the near future.
|
196
|
Bai F, Hong D, Lu Y, Liu H, Xu C, Yao X. Prediction of the Antioxidant Response Elements' Response of Compound by Deep Learning. Front Chem 2019; 7:385. [PMID: 31214568 PMCID: PMC6554289 DOI: 10.3389/fchem.2019.00385] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/14/2018] [Accepted: 05/14/2019] [Indexed: 11/13/2022] Open
Abstract
The antioxidant response elements (AREs) play a significant role in the occurrence of oxidative stress and may cause multiple toxic effects in the pathogenesis of a variety of diseases. Determining whether a compound can activate AREs is crucial for assessing its potential risk. Here, a series of predictive models applying multiple deep learning algorithms, including deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and highway networks (HN), was constructed and validated on the Tox21 challenge dataset and applied to predict whether compounds are activators or inactivators of AREs. The models were evaluated using various statistical parameters, such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and the receiver operating characteristic (ROC) curve. The DNN prediction model based on fingerprint features had the best predictive ability, with accuracies of 0.992, 0.914, and 0.917 for the training set, test set, and validation set, respectively. Consequently, these robust models can be adopted to predict the ARE response of molecules quickly and accurately, which is of great significance for evaluating the safety of compounds in the process of drug discovery and development.
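The evaluation metrics named in the abstract above all derive from a binary confusion matrix. The following is a minimal sketch of the standard definitions of sensitivity, specificity, accuracy, and MCC; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical imbalanced example: 50 actives among 1000 compounds.
metrics = binary_metrics(tp=40, fp=10, tn=940, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes such as this hypothetical set, accuracy can remain high while MCC is noticeably lower, which is one reason to report the metrics together.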
Affiliation(s)
- Fang Bai
  - School of Pharmacy, Lanzhou University, Lanzhou, China
- Ding Hong
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- Yingying Lu
  - State Key Laboratory of Applied Organic Chemistry, Department of Chemistry, Lanzhou University, Lanzhou, China
- Huanxiang Liu
  - School of Pharmacy, Lanzhou University, Lanzhou, China
- Cunlu Xu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- Xiaojun Yao
  - State Key Laboratory of Applied Organic Chemistry, Department of Chemistry, Lanzhou University, Lanzhou, China
|
197
|
Amabilino S, Bratholm LA, Bennie SJ, Vaucher AC, Reiher M, Glowacki DR. Training Neural Nets To Learn Reactive Potential Energy Surfaces Using Interactive Quantum Chemistry in Virtual Reality. J Phys Chem A 2019; 123:4486-4499. [DOI: 10.1021/acs.jpca.9b01006] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 12/19/2022]
Affiliation(s)
- Silvia Amabilino
  - School of Chemistry, University of Bristol, Bristol BS8 1TS, U.K.
- Lars A. Bratholm
  - School of Chemistry, University of Bristol, Bristol BS8 1TS, U.K.
- Simon J. Bennie
  - School of Chemistry, University of Bristol, Bristol BS8 1TS, U.K.
- Alain C. Vaucher
  - Laboratory of Physical Chemistry, ETH Zurich, Zurich, Switzerland
- Markus Reiher
  - Laboratory of Physical Chemistry, ETH Zurich, Zurich, Switzerland
|
198
|
A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data. Nat Genet 2018; 50:1735-1743. [PMID: 30397337 DOI: 10.1038/s41588-018-0257-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 03/26/2018] [Accepted: 09/14/2018] [Indexed: 12/18/2022]
Abstract
Cancer genomic analysis requires accurate identification of somatic variants in sequencing data. Manual review to refine somatic variant calls is required as a final step after automated processing. However, manual variant refinement is time-consuming, costly, poorly standardized, and non-reproducible. Here, we systematized and standardized somatic variant refinement using a machine learning approach. The final model incorporates 41,000 variants from 440 sequencing cases. This model accurately recapitulated manual refinement labels for three independent testing sets (13,579 variants) and accurately predicted somatic variants confirmed by orthogonal validation sequencing data (212,158 variants). The model improves on manual somatic refinement by reducing bias on calls otherwise subject to high inter-reviewer variability.
|
199
|
|
200
|
Iebba V, Guerrieri F, Di Gregorio V, Levrero M, Gagliardi A, Santangelo F, Sobolev AP, Circi S, Giannelli V, Mannina L, Schippa S, Merli M. Combining amplicon sequencing and metabolomics in cirrhotic patients highlights distinctive microbiota features involved in bacterial translocation, systemic inflammation and hepatic encephalopathy. Sci Rep 2018; 8:8210. [PMID: 29844325 PMCID: PMC5974022 DOI: 10.1038/s41598-018-26509-y] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 01/02/2018] [Accepted: 05/09/2018] [Indexed: 12/13/2022] Open
Abstract
In liver cirrhosis (LC), impaired intestinal functions lead to dysbiosis and possible bacterial translocation (BT). Bacteria or their byproducts within the bloodstream can thus play a role in systemic inflammation and hepatic encephalopathy (HE). We combined 16S sequencing, NMR metabolomics and network analysis to describe the interrelationships of members of the microbiota in LC biopsies, faeces, peripheral/portal blood and faecal metabolites with clinical parameters. LC faeces and biopsies showed marked dysbiosis with a heightened proportion of Enterobacteriaceae. Our approach showed impaired faecal bacterial metabolism of short-chain fatty acids (SCFAs) and carbon/methane sources in LC, along with an enhanced stress-related response. Sixteen species, mainly belonging to the Proteobacteria phylum, were shared between LC peripheral and portal blood and were functionally linked to iron metabolism. Faecal Enterobacteriaceae and trimethylamine were positively correlated with blood proinflammatory cytokines, while Ruminococcaceae and SCFAs played a protective role. Within the peripheral blood and faeces, certain species (Stenotrophomonas pavanii, Methylobacterium extorquens) and metabolites (methanol, threonine) were positively related to HE. Cirrhotic patients thus harbour a 'functional dysbiosis' in the faeces and peripheral/portal blood, with specific keystone species and metabolites related to clinical markers of systemic inflammation and HE.
Affiliation(s)
- Valerio Iebba
  - Istituto Pasteur Cenci Bolognetti Foundation, Public Health and Infectious Diseases Department, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Francesca Guerrieri
  - Center for Life NanoScience@Sapienza, Istituto Italiano di Tecnologia, Rome, Italy
- Vincenza Di Gregorio
  - Gastroenterology, Department of Clinical Medicine, Sapienza University of Rome, Viale dell'Università 37, 00185, Rome, Italy
- Massimo Levrero
  - Center for Life NanoScience@Sapienza, Istituto Italiano di Tecnologia, Rome, Italy
  - INSERM, U1052, Cancer Research Center of Lyon (CRCL), Université de Lyon (UCBL1), Centre Léon Bérard, Lyon, France
- Antonella Gagliardi
  - Public Health and Infectious Diseases Department, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Floriana Santangelo
  - Public Health and Infectious Diseases Department, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Anatoly P Sobolev
  - Department of Drug Chemistry and Technologies, Sapienza University of Rome, Piazzale Aldo Moro 5, I-00185, Rome, Italy
  - Magnetic Resonance Laboratory "Annalaura Segre", Institute of Chemical Methodologies, CNR, via Salaria km 29.300, 00015, Monterotondo, (RM), Italy
- Simone Circi
  - Department of Drug Chemistry and Technologies, Sapienza University of Rome, Piazzale Aldo Moro 5, I-00185, Rome, Italy
- Valerio Giannelli
  - Gastroenterology, Department of Clinical Medicine, Sapienza University of Rome, Viale dell'Università 37, 00185, Rome, Italy
- Luisa Mannina
  - Department of Drug Chemistry and Technologies, Sapienza University of Rome, Piazzale Aldo Moro 5, I-00185, Rome, Italy
  - Magnetic Resonance Laboratory "Annalaura Segre", Institute of Chemical Methodologies, CNR, via Salaria km 29.300, 00015, Monterotondo, (RM), Italy
- Serena Schippa
  - Public Health and Infectious Diseases Department, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Manuela Merli
  - Gastroenterology, Department of Clinical Medicine, Sapienza University of Rome, Viale dell'Università 37, 00185, Rome, Italy
|