26
|
Elhadd T, Mall R, Bashir M, Palotti J, Fernandez-Luque L, Farooq F, Mohanadi DA, Dabbous Z, Malik RA, Abou-Samra AB. Artificial Intelligence (AI) based machine learning models predict glucose variability and hypoglycaemia risk in patients with type 2 diabetes on a multiple drug regimen who fast during ramadan (The PROFAST - IT Ramadan study). Diabetes Res Clin Pract 2020; 169:108388. [PMID: 32858096 DOI: 10.1016/j.diabres.2020.108388] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/03/2020] [Accepted: 08/19/2020] [Indexed: 10/23/2022]
Abstract
OBJECTIVE To develop a machine-based algorithm from clinical and demographic data, physical activity and glucose variability to predict hyperglycaemic and hypoglycaemic excursions in patients with type 2 diabetes on multiple glucose-lowering therapies who fast during Ramadan. PATIENTS AND METHODS Thirteen patients (10 males and three females) with type 2 diabetes on 3 or more anti-diabetic medications were studied with a Fitbit-2 pedometer device and Freestyle Libre (Abbott Diagnostics) 2 weeks before and 2 weeks during Ramadan. Several machine learning techniques were trained to predict blood glucose levels in a regression framework utilising physical activity and contemporaneous blood glucose levels, comparing Ramadan to non-Ramadan days. RESULTS The median age of participants was 51 years (IQR 49-52); median BMI was 33.2 kg/m² (IQR 33.0-35.9) and median HbA1c was 7.3% (IQR 6.7-7.8). The optimal model using physical activity achieved an R² of 0.548 and a mean absolute error (MAE) of 30.30. The addition of electronic health record (EHR) information increased R² to 0.636 and reduced MAE to 26.89, and the time-of-day feature further increased R² to 0.768 and reduced MAE to 20.55. Combining all the features together resulted in an optimal XGBoost model with an R² of 0.836 and MAE of 17.47. This model accurately estimated normal glucose levels in 2584/2715 (95.2%) readings and hyperglycaemic events in 852/1031 (82.6%) readings, but fewer hypoglycaemic events (48/172 (27.9%)). The optimal XGBoost model prioritized age, gender, BMI and HbA1c followed by glucose levels and physical activity. Interestingly, the blood glucose level prediction by our model was influenced by use of SGLT2i. CONCLUSION XGBoost, a machine learning AI algorithm, achieves high predictive performance for normal and hyperglycaemic excursions, but has limited predictive value for hypoglycaemia in patients on multiple therapies who fast during Ramadan.
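The per-class percentages quoted in this abstract follow directly from the reported counts. As a minimal sketch of that arithmetic (the helper name `detection_rate` and the dictionary layout are illustrative assumptions, not part of the study's code), the figures can be recomputed as follows:

```python
def detection_rate(correct: int, total: int) -> float:
    """Percentage of readings in a class that were estimated correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts as reported in the abstract: (correctly estimated, total readings).
reported = {
    "normal": (2584, 2715),
    "hyperglycaemic": (852, 1031),
    "hypoglycaemic": (48, 172),
}

# Round to one decimal place, matching the precision used in the abstract.
rates = {
    label: round(detection_rate(correct, total), 1)
    for label, (correct, total) in reported.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate}%")
```

The large gap between the hypoglycaemic class and the other two is what the conclusion refers to as limited predictive value for hypoglycaemia.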
|
27
|
Elbasir A, Mall R, Kunji K, Rawi R, Islam Z, Chuang GY, Kolatkar PR, Bensmail H. BCrystal: an interpretable sequence-based protein crystallization predictor. Bioinformatics 2020; 36:1429-1438. [PMID: 31603511 DOI: 10.1093/bioinformatics/btz762] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/01/2019] [Revised: 09/19/2019] [Accepted: 10/08/2019] [Indexed: 02/01/2023] Open
Abstract
MOTIVATION X-ray crystallography has facilitated the majority of protein structures determined to date. Sequence-based predictors that can accurately estimate protein crystallization propensities would be highly beneficial to overcome the high expenditure and large attrition rate, and to reduce the trial-and-error settings required for crystallization. RESULTS In this study, we present a novel model, BCrystal, which uses an optimized gradient boosting machine (XGBoost) on sequence, structural and physicochemical features extracted from the proteins of interest. BCrystal also provides explanations, highlighting the most important features for the predicted crystallization propensity of an individual protein using the SHAP algorithm. On three independent test sets, BCrystal outperforms state-of-the-art sequence-based methods by more than 12.5% in accuracy, 18% in recall and 0.253 in Matthews correlation coefficient, with an average accuracy of 93.7%, recall of 96.63% and Matthews correlation coefficient of 0.868. For relative solvent accessibility of exposed residues, we observed higher values to associate positively with protein crystallizability, while the number of disordered regions, fraction of coils and tripeptide stretches that contain multiple histidines associate negatively with crystallizability. The higher accuracy of BCrystal enables it to accurately screen for sequence variants with enhanced crystallizability. AVAILABILITY AND IMPLEMENTATION Our BCrystal webserver is at https://machinelearning-protein.qcri.org/ and source code is available at https://github.com/raghvendra5688/BCrystal. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
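Several abstracts in this list report accuracy, recall and the Matthews correlation coefficient (MCC) for binary predictors. For readers unfamiliar with how these relate, the following is a minimal sketch of the standard definitions computed from a confusion matrix. The counts used are hypothetical and chosen only for illustration; they are not taken from any of the listed studies.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that are predicted positive."""
    return tp / (tp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC: ranges from -1 (total disagreement) to +1 (perfect)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: MCC is taken as 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts (illustrative only).
tp, tn, fp, fn = 90, 80, 20, 10

acc = accuracy(tp, tn, fp, fn)
rec = recall(tp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy={acc:.3f} recall={rec:.3f} MCC={mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy on imbalanced test sets.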
|
28
|
Roelands J, Hendrickx W, Zoppoli G, Mall R, Saad M, Halliwill K, Curigliano G, Rinchai D, Decock J, Delogu LG, Turan T, Samayoa J, Chouchane L, Ballestrero A, Wang E, Finetti P, Bertucci F, Miller LD, Galon J, Marincola FM, Kuppen PJK, Ceccarelli M, Bedognetti D. Oncogenic states dictate the prognostic and predictive connotations of intratumoral immune response. J Immunother Cancer 2020; 8:e000617. [PMID: 32376723 PMCID: PMC7223637 DOI: 10.1136/jitc-2020-000617] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Accepted: 01/24/2020] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND An immune active cancer phenotype typified by a T helper 1 (Th-1) immune response has been associated with increased responsiveness to immunotherapy and favorable prognosis in some but not all cancer types. The reason for this differential prognostic connotation remains unknown. METHODS To explore the contextual prognostic value of cancer immune phenotypes, we applied a multimodal pan-cancer analysis among 31 different histologies (9282 patients), encompassing immune and oncogenic transcriptomic analysis, mutational and neoantigen load and copy number variations. RESULTS We demonstrated that the favorable prognostic connotation conferred by the presence of a Th-1 immune response was abolished in tumors displaying specific tumor-cell intrinsic attributes such as high transforming growth factor-beta (TGF-β) signaling and low proliferation capacity. This observation was independent of mutation rate. We validated this observation in the context of immune checkpoint inhibition. WNT-β catenin, barrier molecules, Notch, hedgehog, mismatch repair, telomerase activity and AMPK signaling were the pathways most coherently associated with an immune silent phenotype together with mutations of driver genes including IDH1/2, FOXA2, HDAC3, PSIP1, MAP3K1, KRAS, NRAS, EGFR, FGFR3, WNT5A and IRF7. CONCLUSIONS This is the first systematic study demonstrating that the prognostic and predictive role of a bona fide favorable intratumoral immune response is dependent on the disposition of specific oncogenic pathways. This information could be used to refine stratification algorithms and prioritize hierarchically relevant targets for combination therapies.
|
29
|
Perez-Pozuelo I, Zhai B, Palotti J, Mall R, Aupetit M, Garcia-Gomez JM, Taheri S, Guan Y, Fernandez-Luque L. The future of sleep health: a data-driven revolution in sleep science and medicine. NPJ Digit Med 2020; 3:42. [PMID: 32219183 PMCID: PMC7089984 DOI: 10.1038/s41746-020-0244-4] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Received: 09/30/2019] [Accepted: 02/18/2020] [Indexed: 01/04/2023] Open
Abstract
In recent years, there has been a significant expansion in the development and use of multi-modal sensors and technologies to monitor physical activity, sleep and circadian rhythms. These developments make accurate sleep monitoring at scale a possibility for the first time. Vast amounts of multi-sensor data are being generated with potential applications ranging from large-scale epidemiological research linking sleep patterns to disease, to wellness applications, including the sleep coaching of individuals with chronic conditions. However, in order to realise the full potential of these technologies for individuals, medicine and research, several significant challenges must be overcome. There are important outstanding questions regarding performance evaluation, as well as data storage, curation, processing, integration, modelling and interpretation. Here, we leverage expertise across neuroscience, clinical medicine, bioengineering, electrical engineering, epidemiology, computer science, mHealth and human-computer interaction to discuss the digitisation of sleep from an interdisciplinary perspective. We introduce the state of the art in sleep-monitoring technologies, and discuss the opportunities and challenges from data acquisition to the eventual application of insights in clinical and consumer settings. Further, we explore the strengths and limitations of current and emerging sensing methods with a particular focus on novel data-driven technologies, such as Artificial Intelligence.
|
30
|
Islam Z, Ali MH, Popelka A, Mall R, Ullah E, Ponraj J, Kolatkar PR. Probing the fibrillation of lysozyme by nanoscale-infrared spectroscopy. J Biomol Struct Dyn 2020; 39:1481-1490. [PMID: 32131712 DOI: 10.1080/07391102.2020.1734091] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
Amyloid fibrillation is the root cause of several neurological as well as non-neurological disorders. Understanding the molecular basis of amyloid aggregate formation is crucial for deciphering various neurodegenerative diseases. In our study, we have examined the lysozyme fibrillation process using nano-infrared spectroscopy (nanoIR). NanoIR enabled us to investigate both structural and chemical characteristics of lysozyme fibrillar species concurrently. The spectroscopic results indicate that lysozyme transformed into a fibrillar structure having mainly parallel β-sheets, with almost no antiparallel β-sheets. Features such as protein stiffness have a good correlation with obtained secondary structural information showing the state of the protein within the fibrillation state. The structural and chemical details were compared with transmission electron microscopy (TEM) and circular dichroism (CD). We have utilized nanoIR and measured infrared spectra to characterize lysozyme amyloid fibril structures in terms of morphology, molecular structure, secondary structure content, stability, and size of the cross-β core. We have shown that the use of nanoIR can complement other biophysical studies to analyze the aggregation process and is particularly useful for studying proteins involved in aggregation to help in designing molecules against amyloid aggregation. Specifically, the nanoIR spectra afford higher resolution information and a characteristic fingerprint for determining states of aggregation. Communicated by Ramaswamy H. Sarma.
|
31
|
Ali MHM, Toor SM, Rakib F, Mall R, Ullah E, Mroue K, Kolatkar PR, Al-Saad K, Elkord E. Investigation of the Effect of PD-L1 Blockade on Triple Negative Breast Cancer Cells Using Fourier Transform Infrared Spectroscopy. Vaccines (Basel) 2019; 7:vaccines7030109. [PMID: 31505846 PMCID: PMC6789440 DOI: 10.3390/vaccines7030109] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/09/2019] [Revised: 08/21/2019] [Accepted: 09/03/2019] [Indexed: 12/24/2022] Open
Abstract
Interactions between programmed death-1 (PD-1) and its ligand PD-L1 on tumor cells can antagonize T cell responses. Inhibiting these interactions using immune checkpoint inhibitors has shown promise in cancer immunotherapy. MDA-MB-231 is a triple negative breast cancer cell line that expresses PD-L1. In this study, we investigated the biochemical changes in MDA-MB-231 cells following treatment with atezolizumab, a specific PD-L1 blocker. Our readouts were Fourier Transform Infrared (FTIR) spectroscopy and flow cytometric analyses. Chemometric analysis, such as principal component analysis (PCA), was applied to delineate the spectral differences. We were able to identify the chemical alterations in both protein and lipid structure of the treated cells. We found that there was a shift from random coil and α-helical structure to β-sheet conformation of PD-L1 on tumor cells due to atezolizumab treatment, which could hinder binding with its receptors on immune cells, ensuring sustained T cell activation for potent immune responses. This work provides novel information about the effects of atezolizumab at molecular and cellular levels. FTIR bio-spectroscopy, in combination with chemometric analyses, may expedite research and offer new approaches for cancer immunology.
|
32
|
Khurana S, Rawi R, Kunji K, Chuang GY, Bensmail H, Mall R. DeepSol: a deep learning framework for sequence-based protein solubility prediction. Bioinformatics 2019; 34:2605-2613. [PMID: 29554211 DOI: 10.1093/bioinformatics/bty166] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Received: 11/14/2017] [Accepted: 03/13/2018] [Indexed: 01/09/2023] Open
Abstract
Motivation Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel Deep Learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. Results DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins. Availability and implementation DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018). Supplementary information Supplementary data are available at Bioinformatics online.
|
33
|
Park H, Mall R, Alharbi FH, Sanvito S, Tabet N, Bensmail H, El-Mellouhi F. Learn-and-Match Molecular Cations for Perovskites. J Phys Chem A 2019; 123:7323-7334. [DOI: 10.1021/acs.jpca.9b06208] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 01/08/2023]
|
34
|
Palotti J, Mall R, Aupetit M, Rueschman M, Singh M, Sathyanarayana A, Taheri S, Fernandez-Luque L. Benchmark on a large cohort for sleep-wake classification with machine learning techniques. NPJ Digit Med 2019; 2:50. [PMID: 31304396 PMCID: PMC6555808 DOI: 10.1038/s41746-019-0126-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 01/17/2019] [Accepted: 05/06/2019] [Indexed: 11/17/2022] Open
Abstract
Accurately measuring sleep and its quality with polysomnography (PSG) is an expensive task. Actigraphy, an alternative, has been proven cheap and relatively accurate. However, the largest experiments conducted to date have had only hundreds of participants. In this work, we processed the data of the recently published Multi-Ethnic Study of Atherosclerosis (MESA) Sleep study to have both PSG and actigraphy data synchronized. We propose the adoption of this publicly available large dataset, which is at least one order of magnitude larger than any other dataset, to systematically compare existing methods for the detection of sleep-wake stages, thus fostering the creation of new algorithms. We also implemented and compared state-of-the-art methods to score sleep-wake stages, which range from the widely used traditional algorithms to recent machine learning approaches. We identified, among the traditional algorithms, two approaches that perform better than the algorithm implemented by the actigraphy device used in the MESA Sleep experiments. The performance of the machine learning algorithms, with regard to accuracy and F1 score, was also superior to the device's native algorithm and comparable to human annotation. Future research in developing new sleep-wake scoring algorithms, in particular machine learning approaches, will be highly facilitated by the cohort used here. We exemplify this potential by showing that two particular deep-learning architectures, CNN and LSTM, among the many recently created, can achieve accuracy scores significantly higher than other methods for the same tasks.
|
35
|
Park H, Mall R, Alharbi FH, Sanvito S, Tabet N, Bensmail H, El-Mellouhi F. Correction: Exploring new approaches towards the formability of mixed-ion perovskites by DFT and machine learning. Phys Chem Chem Phys 2019; 21:2821. [PMID: 30657154 DOI: 10.1039/c9cp90013f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Correction for 'Exploring new approaches towards the formability of mixed-ion perovskites by DFT and machine learning' by Heesoo Park et al., Phys. Chem. Chem. Phys., 2019, DOI: 10.1039/c8cp06528d.
|
36
|
Rawi R, Mall R, Kunji K, Shen CH, Kwong PD, Chuang GY. PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine. Bioinformatics 2019; 34:1092-1098. [PMID: 29069295 DOI: 10.1093/bioinformatics/btx662] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 08/01/2017] [Accepted: 10/17/2017] [Indexed: 11/13/2022] Open
Abstract
Motivation Protein solubility can be a decisive factor in both research and production efficiency, and in silico sequence-based predictors that can accurately estimate solubility outcomes are highly sought. Results In this study, we present a novel approach termed PRotein SolubIlity Predictor (PaRSnIP), which uses a gradient boosting machine algorithm as well as an approximation of sequence and structural features of the protein of interest. Based on an independent test set, PaRSnIP outperformed other state-of-the-art sequence-based methods by more than 9% in accuracy and 0.17 in Matthews correlation coefficient, with an overall accuracy of 74% and Matthews correlation coefficient of 0.48. Additionally, PaRSnIP provides importance scores for all features used in training. We observed higher fractions of exposed residues to associate positively with protein solubility and tripeptide stretches with multiple histidines to associate negatively with solubility. The improved prediction accuracy of PaRSnIP should enable it to predict protein solubility with greater reliability and to screen for sequence variants with enhanced manufacturability. Availability and implementation PaRSnIP software is available for download from GitHub (https://github.com/RedaRawi/PaRSnIP). Contact gwo-yu.chuang@nih.gov. Supplementary information Supplementary data are available at Bioinformatics online.
|
37
|
Park H, Mall R, Alharbi FH, Sanvito S, Tabet N, Bensmail H, El-Mellouhi F. Exploring new approaches towards the formability of mixed-ion perovskites by DFT and machine learning. Phys Chem Chem Phys 2019; 21:1078-1088. [DOI: 10.1039/c8cp06528d] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 12/16/2022]
Abstract
Recent years have witnessed a growing effort in engineering and tuning the properties of hybrid halide perovskites as light absorbers.
|
38
|
Ullah E, Mall R, Abbas MM, Kunji K, Nato AQ, Bensmail H, Wijsman EM, Saad M. Comparison and assessment of family- and population-based genotype imputation methods in large pedigrees. Genome Res 2018; 29:125-134. [PMID: 30514702 PMCID: PMC6314157 DOI: 10.1101/gr.236315.118] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/26/2018] [Accepted: 11/30/2018] [Indexed: 01/19/2023]
Abstract
Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score and Pearson's correlation R² for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R² criterion; and (5) R² is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.
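The Pearson R² metric mentioned in this abstract is commonly computed per variant as the squared correlation between true genotypes and imputed dosages. The following is a minimal sketch of that general definition; the function name `squared_pearson` and the example values are hypothetical, and the study's exact computation may differ.

```python
import math


def squared_pearson(truth: list[float], imputed: list[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(truth)
    if n != len(imputed) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (d - mean_i) for t, d in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_i = sum((d - mean_i) ** 2 for d in imputed)
    if var_t == 0 or var_i == 0:
        # Correlation is undefined for a monomorphic (constant) variant.
        return math.nan
    return (cov * cov) / (var_t * var_i)


# Hypothetical data for one variant: true allele counts (0, 1 or 2)
# and the corresponding imputed dosages for six subjects.
true_genotypes = [0, 1, 2, 1, 0, 2]
imputed_dosages = [0.1, 0.9, 1.8, 1.2, 0.0, 1.9]

r2 = squared_pearson(true_genotypes, imputed_dosages)
print(f"imputation R^2 = {r2:.3f}")
```

The undefined case matters in practice for rare variants, where a small sample may contain no carriers of the minor allele.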
|
39
|
Elbasir A, Moovarkumudalvan B, Kunji K, Kolatkar PR, Mall R, Bensmail H. DeepCrystal: a deep learning framework for sequence-based protein crystallization prediction. Bioinformatics 2018; 35:2216-2225. [DOI: 10.1093/bioinformatics/bty953] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 08/01/2018] [Revised: 10/31/2018] [Accepted: 11/17/2018] [Indexed: 12/11/2022] Open
Abstract
Motivation
Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate and series of trial-and-error settings, many in silico methods have been developed to predict crystallization propensities of proteins based on their sequences. However, the majority of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins which can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from sequence. Our model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers from the protein sequences to distinguish proteins that will result in diffraction-quality crystals from those that will not.
Results
Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and Matthews correlation coefficient (MCC) on three independent test sets. DeepCrystal achieves average improvements in recall of 1.4% and 12.1% over its closest competitors, Crysalis II and Crysf, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC over Crysalis II and Crysf, respectively, on independent test sets.
Availability and implementation
The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web-server is also available at https://deeplearning-protein.qcri.org.
Supplementary information
Supplementary data are available at Bioinformatics online.
|
40
|
Ullah E, Mall R, Rawi R, Moustaid-Moussa N, Butt AA, Bensmail H. Correction to: Harnessing Qatar Biobank to understand type 2 diabetes and obesity in adult Qataris from the First Qatar Biobank Project. J Transl Med 2018; 16:283. [PMID: 30322395 PMCID: PMC6190666 DOI: 10.1186/s12967-018-1648-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/27/2018] [Accepted: 09/27/2018] [Indexed: 11/24/2022] Open
|
41
|
Ali MHM, Rakib F, Abdelalim EM, Limbeck A, Mall R, Ullah E, Mesaeli N, McNaughton D, Ahmed T, Al-Saad K. Fourier-Transform Infrared Imaging Spectroscopy and Laser Ablation -ICPMS New Vistas for Biochemical Analyses of Ischemic Stroke in Rat Brain. Front Neurosci 2018; 12:647. [PMID: 30283295 PMCID: PMC6157330 DOI: 10.3389/fnins.2018.00647] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 05/28/2018] [Accepted: 08/30/2018] [Indexed: 12/13/2022] Open
Abstract
Objective: Stroke is the main cause of adult disability in the world, leaving more than half of the patients dependent on daily assistance. Understanding the post-stroke biochemical and molecular changes is critical for patient survival and stroke management. The aim of this work was to investigate photo-thrombotic ischemic stroke in male rats with particular focus on biochemical and elemental changes in the primary stroke lesion in the somatosensory cortex and surrounding areas, including the corpus callosum. Materials and Methods: Stroke brain samples were examined using FT-IR imaging spectroscopy and LA-ICPMS techniques, and the results were compared with standard immunohistochemistry studies. Results: The FTIR results revealed that in the lesioned gray matter the relative distribution of lipid, lipid acyl and protein contents decreased significantly. Also at this locus, there was a significant increase in aggregated protein as detected by high levels of Aβ1-42. Areas close to the stroke focus experienced a decrease in the lipid and lipid acyl contents associated with an increase in lipid ester, olefin, and methyl bio-contents, with a novel finding of Aβ1-42 in the PL-GM and L-WM. Elemental analyses revealed major changes in the different brain structures that may underscore functionality. Conclusion: FTIR bio-spectroscopy is a non-destructive, rapid, and refined technique to characterize oxidative stress markers associated with lipid degradation and protein denaturation not characterized by routine approaches. This technique may expedite research into stroke and offer new approaches for neurodegenerative disorders. The results suggest that a good therapeutic strategy should include a mechanism that provides a protective effect against brain swelling (edema) and neurotoxicity by scavenging the lipid peroxidation end products.
|
42
|
Mall R, Cerulo L, Garofano L, Frattini V, Kunji K, Bensmail H, Sabedot TS, Noushmehr H, Lasorella A, Iavarone A, Ceccarelli M. RGBM: regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes. Nucleic Acids Res 2018; 46:e39. [PMID: 29361062 PMCID: PMC6283452 DOI: 10.1093/nar/gky015] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/03/2017] [Accepted: 01/06/2018] [Indexed: 01/05/2023] Open
Abstract
We propose a generic framework for gene regulatory network (GRN) inference approached as a feature selection problem. GRNs obtained using Machine Learning techniques are often dense, whereas real GRNs are rather sparse. We use a Tikhonov regularization inspired optimal L-curve criterion that utilizes the edge weight distribution for a given target gene to determine the optimal set of TFs associated with it. Our proposed framework allows the incorporation of a mechanistic active binding network based on cis-regulatory motif analysis. We evaluate our regularization framework in conjunction with two non-linear ML techniques, namely gradient boosting machines (GBM) and random-forests (GENIE), resulting in regularized feature-selection-based methods called RGBM and RGENIE, respectively. RGBM has been used to identify the main transcription factors that are causally involved as master regulators of the gene expression signature activated in the FGFR3-TACC3-positive glioblastoma. Here, we illustrate that RGBM identifies the main regulators of the molecular subtypes of brain tumors. Our analysis reveals the identity and corresponding biological activities of the master regulators characterizing the difference between G-CIMP-high and G-CIMP-low subtypes and between PA-like and LGm6-GBM, thus providing a clue to the yet undetermined nature of the transcriptional events among these subtypes.
|
43
|
Ullah E, Mall R, Rawi R, Moustaid-Moussa N, Butt AA, Bensmail H. Harnessing Qatar Biobank to understand type 2 diabetes and obesity in adult Qataris from the First Qatar Biobank Project. J Transl Med 2018; 16:99. [PMID: 29650030 PMCID: PMC5898076 DOI: 10.1186/s12967-018-1472-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/06/2018] [Accepted: 04/04/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Human tissues are invaluable resources for researchers worldwide. Biobanks are repositories of such human tissues and can have a strategic importance for genetic research, clinical care, and future discoveries and treatments. One of the aims of Qatar Biobank is to improve the understanding and treatment of common diseases afflicting the Qatari population, such as obesity and diabetes. METHODS In this study we apply a panorama of state-of-the-art statistical methods and machine learning algorithms to investigate associations and risk factors for diabetes and obesity on a sample of 1000 individuals from the Qatari population. RESULTS Regarding diabetes, we identified pronounced associations and risk factors in the Qatari population including magnesium, chloride, c-peptide of insulin, insulin, and uric acid. Similarly, for obesity, significant associations and risk factors include insulin, c-peptide of insulin, albumin, and uric acid. Moreover, our study has revealed interactions of hypomagnesemia with HDL-C, triglycerides, and free thyroxine. CONCLUSIONS Our study strongly confirms known associations and risk factors associated with diabetes and obesity in the Qatari population as previously found in other population studies in different parts of the world. Moreover, interactions of hypomagnesemia with other associations and risk factors merit further investigation.
|
44
|
Mall R, Cerulo L, Bensmail H, Iavarone A, Ceccarelli M. Detection of statistically significant network changes in complex biological networks. BMC Syst Biol 2017; 11:32. [PMID: 28259158 PMCID: PMC5336651 DOI: 10.1186/s12918-017-0412-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 08/23/2016] [Accepted: 02/22/2017] [Indexed: 01/10/2023]
Abstract
Background Biological networks contribute effectively to unveiling the complex structure of molecular interactions and to discovering driver genes, especially in the context of cancer. Gene mutations, for example during cancer progression, can cause the gene expression network to undergo some amount of localized re-wiring. The ability to detect statistically relevant changes in the interaction patterns induced by the progression of the disease can lead to the discovery of novel relevant signatures. Several procedures have recently been proposed to detect sub-network differences in pairwise labeled weighted networks. Methods In this paper, we propose an improvement over the state of the art based on the Generalized Hamming Distance, adopted for evaluating the topological difference between two networks and estimating its statistical significance. The proposed procedure exploits a more effective model selection criterion to generate p-values for statistical significance and is more efficient in terms of computational time and prediction accuracy than methods in the literature. Moreover, the structure of the proposed algorithm allows for a faster parallelized implementation. Results In the case of dense random geometric networks, the proposed approach is 10-15x faster and achieves 5-10% higher AUC, Precision/Recall, and Kappa value than the state of the art. We also report the application of the method to dissect the difference between the regulatory networks of IDH-mutant versus IDH-wild-type glioma. In this case our method is able to identify some recently reported master regulators as well as novel important candidates. Conclusions We show that our network differencing procedure can effectively and efficiently detect statistically significant network re-wirings in different conditions. When applied to detect the main differences between the networks of IDH-mutant and IDH-wild-type glioma tumors, it correctly selects sub-networks centered on important key regulators of these two different subtypes. In addition, its application highlights several novel candidates that cannot be detected by standard single network-based approaches. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0412-6) contains supplementary material, which is available to authorized users.
|
45
|
Rawi R, Mall R, Kunji K, El Anbari M, Aupetit M, Ullah E, Bensmail H. COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator. BMC Bioinformatics 2016; 17:533. [PMID: 27978812 PMCID: PMC5159955 DOI: 10.1186/s12859-016-1400-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/17/2016] [Accepted: 12/01/2016] [Indexed: 11/13/2022] Open
Abstract
Background The post-genomic era, with its wealth of sequences, gave rise to a broad range of methods for detecting protein residue-residue contacts. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, they do not completely overlap. Hence, new approaches and improvements of existing methods are needed to motivate further development and progress in the field. We present a new contact detection method, COUSCOus, which combines the best shrinkage approach, the empirical Bayes covariance estimator, with GLasso. Results Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short contacts and outperforms PSICOV. We also observed that COUSCOus outperforms PSICOV with respect to the Matthews correlation coefficient criterion on the full list of residue contacts. Furthermore, COUSCOus achieves on average 10% more gain in prediction accuracy compared to PSICOV on an independent test set composed of CASP11 protein targets. Finally, we showed that, when using a simple random forest meta-classifier that combines contact detection techniques and sequence-derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions. Conclusion We conclude that the consideration of superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure, including residue-residue contact prediction, presented here, as well as fields such as gene network reconstruction. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1400-3) contains supplementary material, which is available to authorized users.
|
46
|
Mall R, Suykens JAK. Very sparse LSSVM reductions for large-scale data. IEEE Trans Neural Netw Learn Syst 2015; 26:1086-1097. [PMID: 25751875 DOI: 10.1109/tnnls.2014.2333879] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 06/04/2023]
Abstract
Least squares support vector machines (LSSVMs) have been widely applied for classification and regression, with performance comparable to that of SVMs. The LSSVM model lacks sparsity and is unable to handle large-scale data due to computational and memory constraints. A primal fixed-size LSSVM (PFS-LSSVM) introduces sparsity using the Nyström approximation with a set of prototype vectors (PVs). The PFS-LSSVM model solves an overdetermined system of linear equations in the primal. However, this solution is not the sparsest. We investigate the sparsity-error tradeoff by introducing a second level of sparsity. This is done by means of L0-norm-based reductions that iteratively sparsify LSSVM and PFS-LSSVM models. The exact choice of the cardinality for the initial PV set is then not important, as the final model is highly sparse. The proposed method overcomes the problems of memory constraints and high computational costs, resulting in highly sparse reductions of LSSVM models. The approximations of the two models allow them to scale to large datasets. Experiments on real-world classification and regression data sets from the UCI repository illustrate that these approaches achieve sparse models without a significant tradeoff in errors.
|
47
|
Huber M, Falkenberg N, Gross E, Mall R, Braselmann H, Schmitt M, Aubele M. The impact of the uPAR system and its interaction partners as potential therapeutic targets in TNBC. Ann Oncol 2015. [DOI: 10.1093/annonc/mdv117.29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
|
48
|
Mehrkanoon S, Alzate C, Mall R, Langone R, Suykens JAK. Multiclass semisupervised learning based upon kernel spectral clustering. IEEE Trans Neural Netw Learn Syst 2015; 26:720-733. [PMID: 25794378 DOI: 10.1109/tnnls.2014.2322377] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 06/04/2023]
Abstract
This paper proposes a multiclass semisupervised learning algorithm that uses kernel spectral clustering (KSC) as a core model. A regularized KSC is formulated to estimate the class memberships of data points in a semisupervised setting using the one-versus-all strategy, with both labeled and unlabeled data points present in the learning process. The propagation of the labels to a large number of unlabeled data points is achieved by adding regularization terms to the cost function of the KSC formulation. In other words, imposing the regularization term enforces certain desired memberships. The model is then obtained by solving a linear system in the dual. Furthermore, the optimal embedding dimension is designed for semisupervised clustering. This plays a key role when one deals with a large number of clusters.
|
49
|
Mall R, Mehrkanoon S, Suykens JA. Identifying intervals for hierarchical clustering using the Gershgorin circle theorem. Pattern Recognit Lett 2015. [DOI: 10.1016/j.patrec.2014.12.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
|
50
|
Huber M, Falkenberg N, Schmitt M, Braselmann H, Mall R, Jakovac M, Walch A, Höfler H, Aubele M. 673: uPAR and its interaction partners: potential new therapy targets in triple negative breast cancer. Eur J Cancer 2014. [DOI: 10.1016/s0959-8049(14)50593-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|